Alfresco as Extranet

In a couple of projects I’ve worked on we’ve been using Alfresco as an extranet – that’s to say we’ve given external people access to our Alfresco instance so that we can collaborate by sharing documents and using the other site functions like discussion lists and wikis.

We’ve also had these Alfresco instances integrated into a wider single sign on system.

We want people to be able to self register into the SSO system, for a number of reasons.

This has led to a few problems.

Firstly we don’t want somebody to be able to self register and then log into Alfresco and collect our user list by doing a people search.
Secondly we’d like to be able to restrict who can log into Alfresco but give a helpful message if they’ve authenticated successfully.
Thirdly we want to restrict site creation.

Restricting site creation

I’ll cover this first because it’s quite straightforward and documented elsewhere.

There are two parts to the problem:
1) Blocking access to the API
2) Removing the menu option in the UI

Part 1 can be done by modifying the appropriate bean from public-services-security-context.xml
Part 2 will depend on your version of Alfresco and is adequately covered elsewhere.

Restricting access to the user list

This has come up a few times

It’s even in the To Do section on the wiki

Simple approach

The simplest approach is to change the permissions on /sys:system/sys:people
You can do this by finding the nodeRef using the Node Browser and going to: share/page/manage-permissions?nodeRef=xxxx

You’ll need to create a group of all your users and give them read permission, replacing the EVERYONE permission.

You could get carried away with this by changing the permissions on individual users but that’s not a great idea.

More complex approach

A more complex approach is to use ACLs, in a similar fashion to the approach used to block site creation. However, this does require some custom code and still isn’t perfect.

Some changes are required to make this work nicely, above and beyond creating the custom ACL code.

In org.alfresco.repo.jscript.People.getPeopleImpl, if getPeopleImplSearch is used then:
1) it’s not using PersonService.getPeople
2) if it’s using FTS and PersonService.getPerson afterwards throws an AccessDeniedException then it will cause an error (the exception falls through, which gives the desired result but not in a good way, as the more complex search capabilities will be lost)
This, I think, would be a relatively simple change, although I’m not sure whether to catch the exception or to use the getPersonOrNull method and ignore the null. I’m going with the latter:

// FTS
  List personRefs = getPeopleImplSearch(filter, pagingRequest, sortBy, sortAsc);

  if (personRefs != null) {
    persons = new ArrayList(personRefs.size());
    for (NodeRef personRef : personRefs) {
      // A null here means the person isn't readable, so just skip it
      Person p = personService.getPersonOrNull(personRef);
      if (p != null) {
        persons.add(p);
      }
    }
  }
The usernamePropertiesDecorator bean will throw an exception if access to the person is denied. This has a major impact, so we need to replace it with a custom implementation that swallows the exception and outputs something sensible instead.
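As a rough illustration of "swallow the exception and output something sensible", here is a minimal plain Java sketch. It is not the real Alfresco decorator: PersonLookup, AccessDenied and SafeUserDisplay are hypothetical stand-ins made up for this example. The only point is that a denied lookup falls back to the bare user name instead of propagating the exception.

```java
// Hypothetical wrapper: none of these types are Alfresco classes.
final class SafeUserDisplay {
    private final PersonLookup delegate;

    SafeUserDisplay(PersonLookup delegate) {
        this.delegate = delegate;
    }

    // Return the display name, or the user name when the person can't be read.
    String displayNameOrUserName(String userName) {
        try {
            String name = delegate.displayName(userName);
            return name != null ? name : userName;
        } catch (AccessDenied e) {
            // Swallow the denial and show something sensible instead.
            return userName;
        }
    }

    public static void main(String[] args) {
        PersonLookup lookup = userName -> {
            if ("hidden".equals(userName)) {
                throw new AccessDenied("no read access to " + userName);
            }
            return "Alice Example";
        };
        SafeUserDisplay display = new SafeUserDisplay(lookup);
        System.out.println(display.displayNameOrUserName("alice"));
        System.out.println(display.displayNameOrUserName("hidden"));
    }
}

// Hypothetical lookup: returns a display name or throws AccessDenied.
interface PersonLookup {
    String displayName(String userName);
}

// Stand-in for an access denied runtime exception.
class AccessDenied extends RuntimeException {
    AccessDenied(String message) {
        super(message);
    }
}
```

A real replacement would have to implement whatever interface the existing decorator bean implements, which I haven’t shown here.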

I’ve logged an issue to get these fixes made.

Oddities that don’t appear to break things

The user profile page /share/page/user/xxxx/profile will show your own profile if you try to access a profile that you don’t have access to. This is strange but relatively harmless.
The relevant exceptions are:

There are numerous places where the user name will be shown instead of the actual name if permission is denied to access the actual person record, i.e. places that aren’t using the usernamePropertiesDecorator; this appears to be done via Alfresco.util.userProfileLink. While far from ideal this isn’t too bad, as it will only happen when you have access to a node but not to its creator/modifier information, e.g. a shared document.

Other approaches

It looks like there are a few ways to go about doing this…

The forum posts listed discuss (sketchily!) modifying the client-side JavaScript and the webscripts.

At the lowest level you could modify the PersonService and change the way that the database is queried, but that seems too low level.

config/alfresco/ibatis/alfresco-SqlMapConfig.xml defines queries
config/alfresco/ibatis/org.hibernate.dialect.Dialect/query-people-common-SqlMap.xml defines alfresco.query.people
which is used in
which in turn is used by

Restricting access

As this seems to have come up a few times as well…

I’m trying to work out if it’s possible to disable some external users.

My scenario is that I have SSO and LDAP enabled but I only want users who are members of a site to be able to access Share. Ideally I’d like to be able to send other users to a static page where they would be shown some information. At the moment, if you attempt to access a page to which you don’t have access, e.g. share/page/console/admin-console, you will go to the Share log in page (which you wouldn’t otherwise see).

I still want all the users sync’ed so using a filter to restrict the LDAP sync isn’t an option.
I only want to restrict access to Alfresco so fully disabling the account isn’t an option.

It’s relatively easy to identify the users and apply the cm:personDisabled aspect but this doesn’t appear to do anything.
See this issue.

I think the reason that the aspect doesn’t work is that the isAuthenticationMutable will return false and therefore the aspect is not being checked.

I can see the idea of not changing the sync’ed users – otherwise a full resync will lose changes
I can also see not wanting to allow updates to LDAP although the case for that is perhaps weaker

However given that it’s possible to edit profiles under these circumstances, e.g. for telephone number, wouldn’t it make more sense for the cm:personDisabled to be treated along with the Alfresco specific attributes and therefore editable rather than with the LDAP specific attributes and therefore not editable?
Actually applicable is probably a better word rather than editable as it’s possible to apply the aspect programmatically – it just doesn’t do anything.

I did think about checking some field in LDAP but I don’t think that would work without getting into custom schemas (not a terrible idea but not a great one either)

So going back to my earlier requirement to show a page to users who don’t have the requisite permission I came up with the following approach:

  • Use a cron based action to add all site users to a group all_site_users
  • Use evaluators to check if user is a member of all_site_users and if not then:
    1. Hide the title bar and top menu
    2. Hide dashboard dashlets
    3. Show a page of text

Further adventures with CAS and Alfresco (and LDAP)

Like Alfresco in the cloud and myriad other systems we’ve decided to use the email address as the user name for logging in. This works fine until you want to allow the user to be able to change their email.

The problem here is that Alfresco doesn’t support changing user names (I believe that it can be done with some database hacking but it’s not recommended).

My solution here is to allow logging in via CAS using the mail attribute as the user name, but to pass the uid to Alfresco to use as the Alfresco user name. While this means that the Alfresco user name is not the same as the one they’ve used to log in, it does allow you to change the mail attribute, and as the user name isn’t often visible this works quite well. Actually it’s not too bad to set the uid to the mail address, especially if the rate of change is low, although there are some situations where this is potentially confusing.
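To make the idea concrete, here is a minimal plain Java sketch of the separation between the name typed at login (mail) and the stable name handed on to the application (uid). In reality CAS and LDAP do this lookup; the LoginResolver class, the uid values and the addresses are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory directory: uid is the stable key, mail can change.
final class LoginResolver {
    private final Map<String, String> mailByUid = new HashMap<>();

    void register(String uid, String mail) {
        mailByUid.put(uid, mail);
    }

    void changeMail(String uid, String newMail) {
        if (!mailByUid.containsKey(uid)) {
            throw new IllegalArgumentException("unknown uid: " + uid);
        }
        mailByUid.put(uid, newMail);
    }

    // Resolve what the user typed at login (mail) to the name passed downstream (uid).
    Optional<String> principalFor(String mail) {
        return mailByUid.entrySet().stream()
                .filter(e -> e.getValue().equalsIgnoreCase(mail))
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public static void main(String[] args) {
        LoginResolver directory = new LoginResolver();
        directory.register("u1001", "ann@example.org");
        System.out.println(directory.principalFor("ann@example.org").orElse("(none)"));

        // The mail address changes but the downstream user name does not.
        directory.changeMail("u1001", "ann@example.com");
        System.out.println(directory.principalFor("ann@example.com").orElse("(none)"));
        System.out.println(directory.principalFor("ann@example.org").orElse("(none)"));
    }
}
```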

So how to do it…

First configure CAS (I’m using 4.0_RC2 at the moment)

In your deployerConfigContext.xml find your registeredServices and add

 <property name="usernameAttribute" value="uid"/>

so you end up with something like this:

<bean class="" p:id="0"
	p:name="HTTP and IMAP" p:description="Allows HTTP(S) and IMAP(S) protocols"
	p:serviceId="^(https?|imaps?)://.*" p:evaluationOrder="10000001">
    <property name="usernameAttribute" value="uid"/>
</bean>

For 4.1 you’ll need:

Note that you need the allowedAttributes to contain the usernameAttribute otherwise the value of the usernameAttribute will be ignored.

<bean class="" p:id="0"
<property name="usernameAttributeProvider">
c:usernameAttribute="uid" />
<property name="attributeReleasePolicy">
<bean class="">
<property name="allowedAttributes">

Now to configure Share and Alfresco (see previous posts)

If you are using CAS 4.0_RC2 then make sure that you are using the CAS 2 protocol (or SAML, but I’d go with CAS 2), so if you are using the java client then in the web.xml your CAS Validation Filter will be:

   <filter-name>CAS Validation Filter</filter-name>

(This will work for CAS 1 in later versions)

Adding files to your amp

When you’re writing an Alfresco extension there’s a good chance that you’ll want to do some configuration or add some files along with your code.

One option is to, via a documented process, add everything by hand but it’s neater and more reliable if you can do it as part of your amp.

The trick here is to use acp files.

These files are created by exporting from the Alfresco client (see here). There is a good chance you’ll want to edit the acp files after you’ve created them, e.g. to remove system files.

If you want to include the acp file directly in your amp then you should include it as part of the bootstrap process.

This is a two part operation.

Copy the acp file to the amp e.g. in /src/main/resources/alfresco/module/org_wrighting_module_cms/bootstrap

Add the following bean definition to /src/main/resources/alfresco/module/org_wrighting_module_cms/context/bootstrap-context.xml

  <bean id="org_wrighting_module_cms_bootstrapSpaces" class="org.alfresco.repo.module.ImporterModuleComponent" parent="module.baseComponent">
        <property name="moduleId" value="org.wrighting.module.cms" />
        <property name="name" value="importScripts" />
        <property name="description" value="additional Data Dictionary scripts" />
        <property name="sinceVersion" value="1.0.0" />
        <property name="appliesFromVersion" value="1.0.0" />

        <property name="importer" ref="spacesBootstrap"/>
        <property name="bootstrapViews">
            <list>
                <props>
                     <prop key="path">/${spaces.company_home.childname}/${spaces.dictionary.childname}/app:scripts</prop>
                     <prop key="location">alfresco/module/org_wrighting_module_cms/bootstrap/wrighting_scripts.acp</prop>
                </props>
            </list>
        </property>
  </bean>

This will then import your scripts to the Data Dictionary ready for use.

The acp file itself is a zip file containing an XML file describing the enclosed files – it’s a good idea to use the export action to create this as there is a fair amount of meta information involved.
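Since the acp is just a zip, you can check what an export contains before editing it. The following is a minimal sketch using only java.util.zip; the AcpLister class and the entry names in the sample archive are made up for illustration, and a real export will have different names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class AcpLister {

    // Return the entry names found in a zip archive, in archive order.
    static List<String> listEntries(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Build a tiny in-memory zip so the example runs without a real export.
    // The entry names are invented, not what Alfresco actually writes.
    static byte[] sampleArchive() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("example.xml"));
            zip.write("<view/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("example/content0.js"));
            zip.write("// script".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        for (String name : listEntries(sampleArchive())) {
            System.out.println(name);
        }
    }
}
```

For a real export you would pass in the bytes of the acp file instead, e.g. read with java.nio.file.Files.readAllBytes.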

If you want to keep the acp expanded in your source tree and zip it into place as part of the build then the following Ant zip task in your pom.xml will do the job

                     <zip basedir="${basedir}/tools/export/wrighting/wrighting_scripts.acp"
                          destfile="${}/${project.artifactId}-${project.version}/config/alfresco/module/org_wrighting_module_cms/bootstrap/wrighting_scripts.acp" />

CAS for Alfresco 4.2 on Ubuntu

There’s lots of confusion around on this subject so I’m going to attempt to distill some wisdom into this post and tweak it for Ubuntu.

There are two good blogs, Nick’s using mod_auth_cas and Martin’s using the CAS client, as well as the Alfresco docs.

I’m not going to talk about setting up CAS here as this post is complex enough already – I’ll just say be careful if using self signed certs.

I’ve used Martin’s method before with Alfresco 3.4

It’s a tricky decision as to which approach to use:

  • the mod_auth_cas approach is the one supported by Alfresco but it introduces the Apache plug in, which isn’t as well supported by CAS, and you have problems with the mod_auth_cas cookie management, caching etc.
  • the java client is a bit more involved and intrusive but seems to work quite well in the end
  • I haven’t tried container managed auth but it looks promising

Using mod_auth_cas

For a more detailed explanation look at Nick’s blog – this entry is more about how rather than why and is specific to using apt-get packages on Ubuntu.

First set up your mod_auth_cas

Next tell Tomcat to trust the Apache authentication by setting the attribute tomcatAuthentication="false" on the AJP Connector (port 8009)

Now you need to set up the Apache Tomcat Connectors module – mod-jk

apt-get install libapache2-mod-jk

Edit the properties file defined in /etc/apache2/mods-enabled/jk.conf – /etc/libapache2-mod-jk/ – to set the following values


Add to your sites file e.g. /etc/apache2/sites-enabled/000-default

JkMount /alfresco ajp13_worker
JkMount /alfresco/* ajp13_worker
JkMount /share ajp13_worker
JkMount /share/* ajp13_worker

And don’t forget to tell Apache which URLs to check

<Location />
    AuthType CAS
    require valid-user
</Location>

A more complex example in the wiki here

Add the following to tomcat/shared/classes/


Finally add the following section to tomcat/shared/classes/alfresco/web-extension/share-config-custom.xml

Note that if you have customizations you may need this in the share-config-custom.xml in your jar

 	<config evaluator="string-compare" condition="Remote">
				<name>Alfresco - unauthenticated access</name>
				<description>Access to Alfresco Repository WebScripts that do not
					require authentication

				<name>Alfresco - user access</name>
				<description>Access to Alfresco Repository WebScripts that require
					user authentication

				<name>Alfresco Feed</name>
				<description>Alfresco Feed - supports basic HTTP authentication via
					the EndPointProxyServlet

				<name>Activiti Admin UI - user access</name>
				<description>Access to Activiti Admin UI, that requires user


This gets you logged in but you still need to log out! Share CAS logout.
One thing to be careful about with mod_auth_cas here is its caching. If you are not careful you’ll log out but mod_auth_cas will still think that you are logged in. There are some options here: set the cache timeout to be low (inefficient) or use single sign out (experimental).

Using CAS java client

Martin’s blog works for Alfresco 3.4 and here are some notes I made for 4.2.d

Note that it is not supported to make changes to the web.xml

Make the following jars available:

cas-client-core-3.2.1.jar, commons-logging-1.1.1.jar, commons-logging-api-1.1.1.jar

You can do this by including them in the wars or by copying them into <<alfresco home>>/tomcat/lib
N.B. If you place them into the endorsed directory then you will get error messages like this:
SEVERE: Exception starting filter CAS java.lang.NoClassDefFoundError: javax/servlet/Filter

You need to make the same changes to tomcat/shared/classes/ and share-config-custom.xml as for the mod_auth_cas method

Now add the following to share/WEB-INF/web.xml and alfresco/WEB-INF/web.xml

There’s some fine tuning to do on the url-pattern. Probably the best way is to copy the filter mappings for the existing authentication filter and add /page for share and /faces for alfresco.

Using the values below works but is a little crude (shown here to be concise)

    <filter-name>CAS Authentication Filter</filter-name>
    <filter-name>CAS Validation Filter</filter-name>
    <filter-name>CAS HttpServletRequest Wrapper Filter</filter-name>
    <filter-name>CAS Authentication Filter</filter-name>
    <filter-name>CAS Validation Filter</filter-name>
    <filter-name>CAS HttpServletRequest Wrapper Filter</filter-name>

Next add the following to the session-config section of the web.xml. This relates to this issue, which may be solved by removing the jsessionid from the url (this may cause problems with the flash uploader if you’re still using it, see here)


There’s also a case for using web-fragments to avoid changing the main web.xml

You will need to redirect the change password link in the header (how to depends on version)

Container managed auth

CAS Tomcat container auth looks quite interesting as it allows the use of the CAS java client within Tomcat, so it is closer to the mod_auth_cas approach but without needing to configure Apache.

This issue referenced above gives some details of how somebody tried it – I think it should work if the session tracking mode is set to COOKIE but haven’t tried it.

More complex configurations

This is beyond what I’m trying to do, but if you’ve got a load balanced configuration you may need to think about the session management. The easiest approach may be to use sticky sessions e.g.

ProxyRequests Off
ProxyPassReverse /share balancer://app
ProxyPass /share balancer://app stickysession=JSESSIONID|jsessionid nofailover=On

<Proxy balancer://app>
    BalancerMember ajp://localhost:8019/share route=tomcat3
    BalancerMember ajp://localhost:8024/share route=tomcat4
</Proxy>


mod_auth_cas for CAS 3.5.2 on Ubuntu

This is not as straightforward as it should be as mod_auth_cas has not yet been brought up to date with the latest SAML 1.1 schema and the XML parsing doesn’t support the changes. In addition the pull request for the changes in github is out of date with the main branch so that’s not much help either.

That being said if you don’t use the SAML validation for attribute release you can still go ahead.

apt-get install libapache2-mod-auth-cas
a2enmod auth_cas

Configure CAS, which you can do in /etc/apache2/mods-enabled/auth_cas.conf

CASCookiePath /var/cache/apache2/mod_auth_cas/
CASDebug Off
CASValidateServer On
CASVersion 2
#Only if using SAML
#CASValidateSAML Off
#CASAttributeDelimiter ;
#Experimental sign out
CASSSOEnabled On


Configure the protected directories probably somewhere in /etc/apache2/sites-enabled
N.B. You also need to ensure that the ServerName is set otherwise the service parameter on the call to CAS will contain as the hostname

    Authtype CAS
    CASAuthNHeader On
    require valid-user
    #Only works if you are using Attribute release which requires SAML validation
    #require cas-attribute memberOf:cn=helpDesk,ou=groups,dc=wrighting,dc=org
    Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
    Order allow,deny
    Allow from all

    Authtype CAS
    require valid-user


Don’t forget to reload Apache: service apache2 reload

Update: mod_auth_cas is now being maintained again, so you can build it from source

mkdir /var/cache/apache2/mod_auth_cas
chown www-data:www-data /var/cache/apache2/mod_auth_cas
apt-get install make apache2-prefork-dev libcurl4-gnutls-dev
git clone
make install


Editing a dojox DataGrid connected to a JsonRest store

A little bit of a problem with using a dojox.grid.DataGrid connected to a data store.

Note that dojox.grid is officially abandoned so it’s not really a very good idea to use it anyway. You should be using dgrid or GridX instead.

If you edit a cell value then timing issues mean that, although the change is held (and will be saved when you save), the displayed value reverts to the original.

One way around this is to make use of onApplyCellEdit.

You can also use this function to trigger a save if you want to save after every change.

There also appears to be a problem with adding rows not displaying as well.

 function saveCellEdit(inValue, inRowIndex, inAttrName) {
	// "this" is the grid: write the edited value back to the underlying item
	var data = this.getItem(inRowIndex);
	data[inAttrName] = inValue;
 }

  /*create a new grid*/
  var grid = new DataGrid({
			id : 'grid',
			store : store,
			structure : layout,
			rowSelector : '20px',
			onApplyCellEdit: saveCellEdit
  });

Some relevant links:

Using the LDAP Password Modify extended operation with Spring LDAP

If you want to change the password for a given user in an LDAP repository then you need to worry about the format in which it is being stored, otherwise you will end up with the password held in plain text (although base64 encoded).

Using the password modify extended operation (RFC 3062) allows the server, OpenLDAP in this case, to manage the hashing of the new password.

If you don’t use the extension then you have to hash the value yourself.
The “Don’t use this!” code below stores the new password as plain text and treats the password as if it were any other attribute.
You can implement hashing yourself, e.g. by prepending {MD5} to the base64 encoded md5 hash of the new password (see this forum entry).
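The {MD5} form mentioned above can be sketched with nothing but the JDK. This is only an illustration of the value format; the LdapMd5 class and method name are made up, and letting the server hash via the extended operation is still the better route.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class LdapMd5 {

    // Build "{MD5}" + base64(md5(password)), the value you would write to userPassword.
    static String md5UserPassword(String clearText) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(clearText.getBytes(StandardCharsets.UTF_8));
            return "{MD5}" + Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a standard JDK algorithm, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // prints {MD5}X03MO1qnZdYdgyfeuILPmQ==
        System.out.println(md5UserPassword("password"));
    }
}
```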

Don’t use this!

	// Treats userPassword like any other attribute, so the value is stored unhashed
	DistinguishedName dn = new DistinguishedName(dn_string);
	Attribute passwordAttribute = new BasicAttribute(passwordAttr, new_password);
	// Two modifications are made, so the array needs two slots
	ModificationItem[] modificationItems = new ModificationItem[2];
	modificationItems[0] = new ModificationItem(
			DirContext.REPLACE_ATTRIBUTE, passwordAttribute);
	Attribute userPasswordChangedAttribute = new BasicAttribute(
			LDAP_PASSWORD_CHANGE_DATE, format.format(convertToUtc(null)
					.getTime()) + "Z");
	ModificationItem newPasswordChanged = new ModificationItem(
			DirContext.REPLACE_ATTRIBUTE, userPasswordChangedAttribute);
	modificationItems[1] = newPasswordChanged;
	getLdapTemplate().modifyAttributes(dn, modificationItems);


The following example uses the extended operation, which means that the password will be stored according to the OpenLDAP settings, i.e. SSHA by default.

The ldap template here is an instance of org.springframework.ldap.core.LdapTemplate

 ldapTemplate.executeReadOnly(new ContextExecutor() {
   public Object executeWithContext(DirContext ctx) throws NamingException {
      // Extended operations are only available on an LdapContext
      if (!(ctx instanceof LdapContext)) {
            throw new IllegalArgumentException(
               "Extended operations require LDAPv3 - "
               + "Context must be of type LdapContext");
      }
      LdapContext ldapContext = (LdapContext) ctx;
      ExtendedRequest er = new ModifyPasswordRequest(dn_string, new_password);
      return ldapContext.extendedOperation(er);
   }
 });

This thread gives an idea of what is required; however, the ModifyPasswordRequest class available from here actually has all the right details implemented.

You will find that other LDAP libraries, e.g. ldapChai, use the same ModifyPasswordRequest class.

CAS, OpenLDAP and groups

This is actually fairly straightforward if you know what you’re doing. Unfortunately it takes a while, for me at least, to get to that level of understanding.

Probably the most important thing missing from the pages I’ve seen describing this is that you need to configure OpenLDAP first.


What you want is to enable the memberOf overlay

For Ubuntu 12.04 the steps are as follows:
Create the following two files, module.ldif and overlay.ldif:

dn: cn=module,cn=config
objectClass: olcModuleList
cn: module
olcModulePath: /usr/lib/ldap
olcModuleLoad: memberof


dn: olcOverlay=memberof,olcDatabase={1}hdb,cn=config
objectClass: olcMemberOf
objectClass: olcOverlayConfig
objectClass: olcConfig
objectClass: top
olcOverlay: memberof
olcMemberOfDangling: ignore
olcMemberOfRefInt: TRUE
olcMemberOfGroupOC: groupOfNames
olcMemberOfMemberAD: member
olcMemberOfMemberOfAD: memberOf

Then configure OpenLDAP as follows:

ldapadd -Y EXTERNAL -H ldapi:/// -f module.ldif
ldapadd -Y EXTERNAL -H ldapi:/// -f overlay.ldif

You should probably read up on this a bit more. In particular, note that retrospectively adding this won’t achieve what you want without extra steps to reload the groups.


The CAS documentation is actually reasonably good once you understand that you are after the memberOf attribute, but as an example I’ll show some config here


<bean id="attributeRepository" class="">
    <property name="contextSource" ref="contextSource" />
    <property name="baseDN" value="ou=people,dc=wrighting,dc=org" />
    <property name="requireAllQueryAttributes" value="true" />

    <!-- Attribute mapping between principal (key) and LDAP (value) names used 
        to perform the LDAP search. By default, multiple search criteria are ANDed 
        together. Set the queryType property to change to OR. -->
    <property name="queryAttributeMapping">
        <map>
            <entry key="username" value="uid" />
        </map>
    </property>

    <property name="resultAttributeMapping">
        <map>
            <!-- Mapping between LDAP entry attributes (key) and Principal's (value) -->
            <entry value="Name" key="cn" />
            <entry value="Telephone" key="telephoneNumber" />
            <entry value="Fax" key="facsimileTelephoneNumber" />
            <entry value="memberOf" key="memberOf" />
        </map>
    </property>
</bean>


After that you can setup your CAS to use SAML1.1 or modify view/jsp/protocol/2.0/casServiceValidationSuccess.jsp according to your preferences.

Don’t forget to allow the attributes for the registered services as well

<bean id="serviceRegistryDao" class="">
		<property name="registeredServices">
				<bean class="">
					<property name="id" value="0" />
					<property name="name" value="HTTP and IMAP" />
					<property name="description" value="Allows HTTP(S) and IMAP(S) protocols" />
					<property name="serviceId" value="^(https?|imaps?)://.*" />
					<property name="evaluationOrder" value="10000001" />
					<property name="allowedAttributes">


A first R project

To start with I’m using the ProjectTemplate library – this creates a nice project structure

I’m going to be attempting to analyze some census data so I’ll call the project ‘census’

I’m interested in Eynsham but it’s quite hard to work out which files to use – in the end I’ve stumbled across the parish of Eynsham at
this page and the ward of Eynsham from 2001 here which seem roughly comparable.

This blog entry is also quite interesting although I found it rather late in the process


Now we can copy some census data files into the data directory then load it all up.
(I’m not going to cover downloading the data files and creating an index of categories – it’s more painful than it should be but not that hard – I’ve used the category number as part of the file name in the downloaded files)




This doesn’t work so some experimentation is called for…

I don’t think we can munge the data until after it’s loaded so switch off the data_loading in global.dcf

So let’s create a cache of the data in a more generic fashion

With the parish data for 2011 load up the categories

datatypes = read.csv("data/datatypes.parish.2011.csv", sep="\t")
datadef = t(datatypes[,1])
colnames(datadef) <- t(datatypes[,2])

parishId = "11123312"

for (d in 1:length(datadef)){
  datasetName <- paste(parishId,datadef[d], sep = ".")
  filename <- paste("data/",datasetName,".csv", sep = "")
  # skip the two header lines and drop the trailing footer rows
  input.raw = read.csv(filename,header=TRUE,sep=",", skip=2)
  input.t <- t(input.raw[2:(nrow(input.raw)-4),])
  colnames(input.t) <- input.t[1,]
  input.clipped <- input.t[5:nrow(input.t),]
  input.num <- apply(input.clipped,c(1,2),as.numeric)
  # store the result in a variable named after the category, e.g. parish2011_<category>
  dsName <- paste("parish2011_",datadef[d], sep = "")
  assign(dsName, input.num)
}



Then do the same for 2001

Needless to say, the data from 2001 and 2011 is represented in different ways, so there’s a bit of data munging required to standardize and merge similar datasets. I’ve given an example here for population by age: in 2001 the value is given for each year of age whereas 2011 uses age ranges, so it’s necessary to merge columns using sum.


ages <- rbind(c(sum(ward2001_91[1,2:6]),sum(ward2001_91[1,7:9]),sum(ward2001_91[1,10:11]),sum(ward2001_91[1,12:16]),
ages <- ages[1:2,]
rownames(ages) <- c("Ward 2001", "Parish 2011")
colnames(ages) <- sub("Age ","",colnames(ages))
colnames(ages) <- sub(" to ","-",colnames(ages))
colnames(ages) <- sub(" and Over","+",colnames(ages))
barplot(ages, beside = TRUE, col = c("blue", "red"), legend=c("2001","2011"), main="Population", width=0.03, space=c(0,0.35), xlim=c(0,1), cex.names=0.6)


The result can be seen here

            0-4 5-7 8-9 10-14 15 16-17 18-19 20-24 25-29 30-44 45-59 60-64 65-74 75-84 85-89 90+
Ward 2001   226 160 119   335 51   112    85   202   250  1072  1003   266   427   241    61  29
Parish 2011 250 139  91   258 59   113   106   227   188   848  1005   314   584   342    80  44
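
The merging step behind that table, summing per-year counts into age bands, can be sketched outside R as well. This is a minimal Java illustration of the idea only: the AgeBands class is made up and the counts are invented numbers, not census figures.

```java
import java.util.Arrays;

final class AgeBands {

    // Sum per-year counts into bands. bandStarts holds the first age of each band
    // in ascending order; the last band runs to the end of the array.
    static int[] sumIntoBands(int[] countsByAge, int[] bandStarts) {
        int[] totals = new int[bandStarts.length];
        for (int b = 0; b < bandStarts.length; b++) {
            int from = bandStarts[b];
            int to = (b + 1 < bandStarts.length) ? bandStarts[b + 1] : countsByAge.length;
            for (int age = from; age < to; age++) {
                totals[b] += countsByAge[age];
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Invented counts for ages 0..9
        int[] countsByAge = {10, 12, 11, 9, 8, 7, 6, 5, 4, 3};
        int[] bandStarts = {0, 5, 8}; // bands 0-4, 5-7 and 8-9
        // prints [50, 18, 7]
        System.out.println(Arrays.toString(sumIntoBands(countsByAge, bandStarts)));
    }
}
```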

The most interesting thing looks like a big drop in the 25-44 age range (due to house prices, or lack of availability of housing as the population ages?), which is reflected in the rise in people of retirement age and in the numbers of children, although, given the shortage of pre-school places, the increase in the 0-4 range is also interesting.

Lots more analysis could be done but that’s outside the scope of this blog entry!

NoSQL – mongodb vs eXist

There are similarities between mongo and eXist in that they are both document orientated and work with collections of documents.
The main difference is that eXist uses XML documents and mongo uses JSON documents.

There’s plenty of documentation on both but from my point of view it’s quite interesting to compare them and will help my understanding!

(At this point I’m just starting out with mongo and have done more with eXist; see this project, which is a java war containing a REST based service that uses eXist as its back end)

eXist uses xquery as its query language.
With eXist, as you are running in an application server, you can write a .xql which directly returns your data, or even HTML. It’s even possible to wrap it with java etc., e.g. if you want to implement Spring Security, or you can just use it as a database server (albeit one running in a java app server).
eXist also has a REST interface, XML-RPC, the java XML:DB api etc.; see here for more details.

Mongo acts more like a traditional database server with a command shell, and acting as a web server requires an additional interface, e.g. using python; an example of a REST service is here

There’s a nice comparison between mongodb and sql here
In mongo you can show the collections using the command show collections

To list a collection you use db.collection_name.find(), this being like SELECT * FROM table in SQL, or collection('collection name') or xmldb:xcollection('collection name') in xquery

So for a collection ‘test’ containing atom entries in eXist you might do something like this (xquery):

 let $all-recs := xmldb:xcollection( 'test_collection' )/atom:entry (: not recursive :)
 let $test-recs := $all-recs[atom:title[. = 'Test']]
 let $author := for $rec in $test-recs
                   let $auth := $rec/atom:author
                   return $auth
 return $author

or in mongo

 db.test_collection.find({"title": "Test"})

so in eXist you are using XPath whereas in mongo you are writing a json document that matches the document you want to retrieve.
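
To show what "a json document that matches the document you want" means, here is a minimal Java sketch of matching by example. It is not the MongoDB driver API and covers only top-level equality (no operators, nested documents or arrays); the QueryByExample class and the sample data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class QueryByExample {

    // A document matches when every field in the query has an equal value in the document.
    static boolean matches(Map<String, ?> document, Map<String, ?> query) {
        for (Map.Entry<String, ?> wanted : query.entrySet()) {
            if (!document.containsKey(wanted.getKey())) {
                return false;
            }
            if (!Objects.equals(document.get(wanted.getKey()), wanted.getValue())) {
                return false;
            }
        }
        return true;
    }

    // Return the documents in the collection that match the query, in order.
    static List<Map<String, ?>> find(List<? extends Map<String, ?>> collection, Map<String, ?> query) {
        List<Map<String, ?>> hits = new ArrayList<>();
        for (Map<String, ?> document : collection) {
            if (matches(document, query)) {
                hits.add(document);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Object> first = Map.of("title", "Test", "author", "ann");
        Map<String, Object> second = Map.of("title", "Other", "author", "bob");
        // Same shape as find({"title": "Test"}) in the mongo shell
        System.out.println(find(List.of(first, second), Map.of("title", "Test")).size());
    }
}
```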

What’s the conclusion?

MongoDB is sexy at the moment and there’s a lot more information available and it looks quite easy to use.

eXist (and its big brother MarkLogic) is more niche but the XQuery/XPath language is probably more powerful than the direct mongo syntax. XForms, e.g. Orbeon, also provides a pretty simple way to edit the XML once you’ve got your head around it.

Most languages have pretty good support for either JSON or XML processing.

Overall I think it’s a case of horses for courses but there’s a strong argument for sticking with traditional SQL databases all of which can handle XML text fields with XPath queries and some can handle JSON – it’s even quite easy to serialize XML to a database using HyperJAXB3 which uses JAXB and JPA – see this project. These days it’s also really easy to knock up a simple CRUD application using something like grails.

I can certainly see a case for eXist where you are working with relatively unstructured XML documents – journal articles would be a good example here – XForms is a nice tie in too.

MongoDB is a good candidate if you are using JSON data, for example in an application which is heavily Ajax/REST based, and would cope well with a lot of sparsely populated data. It could be pretty good for prototyping where you want to change your data structures a lot, e.g. with angular js and a python service, and of course it’s "web scale" for when your app takes over the world!
(A humorous mongodb mysql comparison here which gets a bit out of control half way through)