NoSQL – mongodb vs eXist

There are similarities between mongo and eXist in that they are both document orientated and work with collections of documents.
The main difference is that eXist uses XML documents and mongo uses JSON documents.

There’s plenty of documentation on both but from my point of view it’s quite interesting to compare them and will help my understanding!

(At this point I’m just starting out with mongo and have done more with eXist – see this project, which is a Java WAR containing a REST-based service that uses eXist as its back end)

eXist uses XQuery as its query language.
Because eXist runs in an application server, you can write an .xql file which directly returns your data, or even HTML. It’s also possible to wrap it with Java, e.g. if you want to implement Spring Security, or you can just use it as a database server (albeit one running in a Java app server).
eXist also has a REST interface, XML-RPC, the Java XML:DB API etc. See here for more details.

Mongo acts more like a traditional database server with a command shell; to act as a web server it requires an additional interface, e.g. using Python – an example of a REST service is here

There’s a nice comparison between mongodb and sql here
In mongo you can show the collections using the command show collections

To list a collection you use db.collection_name.find(), which is like SELECT * FROM table in SQL, or collection('collection name') or xmldb:xcollection('collection name') in XQuery

So for a collection ‘test’ containing atom entries in eXist you might do something like this (xquery):

 let $all-recs := xmldb:xcollection( 'test_collection' )/atom:entry (: not recursive :)
 let $test-recs := $all-recs[atom:title[. = 'Test']]
 let $author := for $rec in $test-recs
                    let $auth := $rec/atom:author
                    return $auth
 return $author

or in mongo

 db.test_collection.find({"title": "Test"})

So in eXist you are using XPath, whereas in mongo you are writing a JSON document that matches the document you want to retrieve.
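To make the contrast between the two query styles concrete, here is a minimal sketch in Python using only the standard library rather than either database. The sample Atom feed, the dict records and the helpers `authors_by_path`, `matches` and `find` are all made up for illustration; `matches` only handles top-level equality and is not how MongoDB actually evaluates queries.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample data standing in for the 'test_collection' collection.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Test</title><author><name>Ann</name></author></entry>
  <entry><title>Other</title><author><name>Bob</name></author></entry>
  <entry><title>Test</title><author><name>Cy</name></author></entry>
</feed>"""

# The same records as plain dicts, roughly as they might look as JSON documents.
DOCS = [
    {"title": "Test", "author": "Ann"},
    {"title": "Other", "author": "Bob"},
    {"title": "Test", "author": "Cy"},
]


def authors_by_path(xml_text, title):
    """Path style: walk to each entry, filter on its title, then step to the author."""
    root = ET.fromstring(xml_text)
    names = []
    for entry in root.findall("atom:entry", ATOM):
        if entry.findtext("atom:title", namespaces=ATOM) == title:
            names.append(entry.findtext("atom:author/atom:name", namespaces=ATOM))
    return names


def matches(document, query):
    """Query-by-example style: every key in the query must equal the document's value."""
    return all(document.get(key) == value for key, value in query.items())


def find(collection, query):
    """Return the documents in the collection that match the example document."""
    return [doc for doc in collection if matches(doc, query)]


if __name__ == "__main__":
    print(authors_by_path(FEED, "Test"))
    print([doc["author"] for doc in find(DOCS, {"title": "Test"})])
```

In the first function you describe how to get to the data; in the second you describe what the data looks like and let the matcher do the walking.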

What’s the conclusion?

MongoDB is sexy at the moment and there’s a lot more information available and it looks quite easy to use.

eXist (and its big brother MarkLogic) is more niche but the XQuery/XPath language is probably more powerful than the direct mongo syntax. XForms, e.g. Orbeon, also provides a pretty simple way to edit the XML once you’ve got your head around it.

Most languages have pretty good support for either JSON or XML processing.

Overall I think it’s a case of horses for courses, but there’s a strong argument for sticking with traditional SQL databases, many of which can handle XML text fields with XPath queries and some of which can handle JSON. It’s even quite easy to serialize XML to a database using HyperJAXB3, which uses JAXB and JPA – see this project. These days it’s also really easy to knock up a simple CRUD application using something like Grails.

I can certainly see a case for eXist where you are working with relatively unstructured XML documents – journal articles would be a good example here – XForms is a nice tie in too.

MongoDB is a good candidate if you are using JSON data, for example in an application which is heavily Ajax/REST based, and it would cope well with a lot of sparsely populated data. It could be pretty good for prototyping where you want to change your data structures a lot, e.g. with AngularJS and a Python service, and of course it’s "web scale" for when your app takes over the world!
(A humorous MongoDB/MySQL comparison is here, which gets a bit out of control halfway through)

Rookie python

When running in a virtualenv don’t forget to source bin/activate

Install packages using pip e.g. pip install beautifulsoup4

Create a REQUIREMENTS file using pip freeze and load using pip install -r REQUIREMENTS

When creating a module you need an __init__.py file in the directory in order to import successfully, otherwise you’ll get AttributeError: 'module' object has no attribute 'xxxx'
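The import problem above usually comes down to package layout. Here is a minimal sketch, assuming a throwaway package called `mypkg` (a made-up name) written into a temporary directory; `make_package` and `import_from` are hypothetical helpers. On Python 3.3 and later a directory without `__init__.py` can still be imported as a namespace package, but any names you expected it to define will be missing, which gives an error much like the one above.

```python
import importlib
import os
import sys
import tempfile


def make_package(root, name, source):
    """Create root/name/__init__.py containing the given source and return its path."""
    package_dir = os.path.join(root, name)
    os.makedirs(package_dir)
    init_path = os.path.join(package_dir, "__init__.py")
    with open(init_path, "w") as handle:
        handle.write(source)
    return init_path


def import_from(root, name):
    """Import a package by name with root temporarily placed on sys.path."""
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(name)
    finally:
        sys.path.remove(root)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_package(tmp, "mypkg", "def greet():\n    return 'hello'\n")
        mypkg = import_from(tmp, "mypkg")
        print(mypkg.greet())
```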

grails – the very basics

A lightning fast summary of how to create a CRUD application using Grails.
In many ways this is similar to using Spring Roo (see this post) but with Grails/Groovy instead of Spring Roo.
There are a lot of similarities between the two approaches and, of course, if you want to you can mix Java and Groovy…

This is a very brief precis of this developerWorks series

Set up some classes with simple list, edit, delete

From the grails command prompt (Ctrl-Alt-Shift G in STS)

create-domain-class Trip


Add some fields (no need for ; )

	String name
	String city
	Date startDate
	Date endDate
	String purpose
	String notes



generate-all package.Trip


Change content of TripController.groovy to


def scaffold = Trip

Remove views (if you don’t want to customize them later) – you can always recreate them with

generate-views Airport

Do the same for Airline

Define many to one relationships

static hasMany = [trip:Trip]

If you want cascading deletes then add a belongsTo (otherwise just declare)

static belongsTo = [Airline]

Set some constraints

class Airline {
	static mapping = {
		table 'some_other_table_name'
		columns {
			name column:'airline_name'
			url column:'link'
			frequentFlyer column:'ff_id'
		}
	}

	static constraints = {
		name(blank:false, maxSize:100)
	}

	static hasMany = [trip:Trip]

	String name
	String url
	String frequentFlyer
	String notes

	String toString(){
		return name
	}
}

DB config

install-dependency mysql:mysql-connector-java:5.1.20

In grails-app/conf/BuildConfig.groovy uncomment the mysql dependency

In grails-app/conf/DataSource.groovy

  driverClassName = "com.mysql.jdbc.Driver"
  username = "grails"
  password = "server"
  url = "jdbc:mysql://localhost:3306/trip?autoReconnect=true"

Create a custom taglib

create-tag-lib Date

class DateTagLib {
  def thisYear = {
    out << Calendar.getInstance().get(Calendar.YEAR)
  }
}

Use it in a GSP

<div id="copyright">
&copy; 2002 - <g:thisYear />, FakeCo Inc. All Rights Reserved.
</div>

Test it

class DateTagLibTests extends GroovyTestCase {
  def dateTagLib

  void setUp(){
    dateTagLib = new DateTagLib()
  }

  void testThisYear() {
    String expected = Calendar.getInstance().get(Calendar.YEAR)
    assertEquals("the years don't match", expected, dateTagLib.thisYear())
  }
}

Partial templates are similar to .jspf fragments – prefix the file name with an _ e.g. _footer.gsp

<g:render template="/footer" />


To use your own templates instead of the defaults e.g. for default scaffold


Some maven thoughts

Some random observations about using maven

Checking dependencies

mvn versions:display-dependency-updates
mvn versions:display-plugin-updates


With github

Use at least Maven 3.0.4 and release plugin 2.3.2, otherwise the release hangs or takes a long time to return, apparently after doing the git push (it may still work)

When you want to release but don’t want to deploy to a repo

One shot

mvn -o release:prepare -DpreparationGoals='clean install -Dmaven.test.skip=true' -Dresume=false -DautoVersionSubmodules=true

Two shot

mvn release:prepare as normal

mvn release:perform -Dgoals=install

Working with a repository

1) In .m2/settings.xml

 <settings xmlns="" xmlns:xsi=""

<!-- for deployment -->





Using an Association to a data (contact) list

The aim of this post is to describe how to use associations in Alfresco to link to a data list.

The use case I’m addressing here is to be able to associate a folder with people held in a contact list.

The first thing to do is to create the association in the model.

Here I’m creating an aspect that allows for one primary contact and n other contacts.

You’ll notice that there are properties as well as associations – more about that later.

        <aspect name="ael:contacts">
                <property name="ael:primaryContact">
                    <title>Primary Contact</title>
                </property>
                <property name="ael:otherContact">
                    <title>Other Contact</title>
                </property>

                <association name="ael:mainContact">
                </association>
                <association name="ael:otherContacts">
                </association>
        </aspect>

You’ll need to create a data list to link to so let’s create a contact list called External Contacts.

Now it’s time to set up the edit form in the share-config-custom.xml so add the following to the appearance section of the relevant form

<field id="ael:mainContact" label-id="ael.metadata.mainContact">
  <control template="/org/alfresco/components/form/controls/association.ftl">
     <control-param name="startLocation">//cm:External_Contacts</control-param>
  </control>
</field>
<field id="ael:otherContacts" label-id="ael.metadata.otherContact">
  <control template="/org/alfresco/components/form/controls/association.ftl">
     <control-param name="startLocation">//cm:External_Contacts</control-param>
  </control>
</field>

Now this is where it starts to get interesting….

You’ll notice that the start location is an XPath string to //cm:External_Contacts whereas our contact list is called ‘External Contacts’. If you run this, the XPath won’t be found and you’ll be taken to the root. Navigate down to the site and into data lists and you’ll see that all the nodes are represented by their uid, which is not very helpful. So let’s go in and change the name of the contact list using a script (the excellent JavaScript console is useful here)
 = 'External_Contacts';;

Note that it’s not

(You won’t see the underscore and it makes the xpath easier)

So now if you run it you’ll be taken to the right place but any contacts will show up as uids again – so another script this time tied into the create and update actions on the folder.

 var first =['dl:contactFirstName'];
var last =['dl:contactLastName'];
var displayName = "";
if (last != null && last.length > 0) {
  displayName = last;
} else if (first != null && first.length > 0) {
  displayName = first;
}
if (first != null && first.length > 0 && last != null && last.length > 0) {
  displayName = last + ", " + first;
} = displayName;;

Now we’re at the point where we can edit the properties and usefully create an association with a member of the contact list.

Next, adding a custom advanced search. Simple: just use the association control in the search form and we’re away. NO – it looks like it works and you get the selection dialogs coming up, but no results are returned. Unfortunately the Alfresco search doesn’t work across associations, so now it’s time to get a bit creative and use those extra properties we set up earlier.

So on your folder with the aspect applied it’s time to add another action using the script below

for each (var assoc in document.assocs["ael:mainContact"]) {["ael:primaryContact"] =["cm:name"];

var contacts = new Array();
for each (var assoc in document.assocs["ael:otherContacts"]) {
}["ael:otherContact"] = contacts;;

Once you’ve done this then you can use the otherContact and primaryContact fields in your search form instead of the mainContact and otherContacts fields. This means that you’ll have text searches and won’t get the option to select your contacts but at least it works and does allow the use of wildcard searching.
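The workaround is a form of denormalisation: copy the names reached through the associations into plain text properties, then search those. A rough Python sketch of the idea follows, using plain dicts in place of Alfresco nodes. The dict layout and the helpers `copy_contact_names` and `search` are invented for illustration and are not the Alfresco API.

```python
from fnmatch import fnmatchcase


def copy_contact_names(folder):
    """Mirror associated contact names into plain text properties on the folder."""
    main = folder["assocs"].get("ael:mainContact", [])
    others = folder["assocs"].get("ael:otherContacts", [])
    props = folder["properties"]
    props["ael:primaryContact"] = main[0]["cm:name"] if main else None
    props["ael:otherContact"] = [contact["cm:name"] for contact in others]
    return folder


def search(folders, prop, pattern):
    """Wildcard search over a text property holding one value or a list of values."""
    hits = []
    for folder in folders:
        value = folder["properties"].get(prop)
        values = value if isinstance(value, list) else [value]
        if any(v is not None and fnmatchcase(v, pattern) for v in values):
            hits.append(folder["name"])
    return hits


if __name__ == "__main__":
    example = {
        "name": "Project A",
        "properties": {},
        "assocs": {
            "ael:mainContact": [{"cm:name": "Bloggs, Joe"}],
            "ael:otherContacts": [{"cm:name": "Smith, Ann"}],
        },
    }
    copy_contact_names(example)
    print(search([example], "ael:primaryContact", "Blog*"))
```

The cost is that the copied values go stale unless the copy is re-run whenever the associations change, which is why the script is attached to an action.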

Another advantage of having these fields available is that you can use them in DocumentList renderer.

Prior to having these fields I created a custom renderer to follow the associations and get the properties to display. This worked but was a bit overcomplex (it would have helped if the propertyName was available to the renderer function so that the same function could be used for different properties – I used a hack with the label to achieve the same effect)

I discovered a drawback to this approach: the cm:name has to be a valid filename, so something like Bloggs, Joe M.I. isn’t valid
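The display name rule from the contact script, plus one possible way round the filename restriction, can be sketched in Python. This is only an illustration: `display_name` and `safe_name` are hypothetical helpers, and it is an assumption on my part that trimming trailing dots and spaces is enough to make the value acceptable as a cm:name, so check it against the constraints in your own repository.

```python
def display_name(first, last):
    """Return 'Last, First', falling back to whichever part is present."""
    first = (first or "").strip()
    last = (last or "").strip()
    if first and last:
        return last + ", " + first
    return last or first


def safe_name(name):
    """Drop trailing dots and spaces (assumed to be what makes the name invalid)."""
    return name.rstrip(". ")


if __name__ == "__main__":
    print(safe_name(display_name("Joe M.I.", "Bloggs")))
```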


CMIS using perl

I’ve had a requirement to get some document details out of Alfresco using perl.

CMIS seems to be the obvious answer and a little bit of googling gives us the WebService::Cmis package.

This seems to do the job nicely but the documentation isn’t great (hint: use the pod) so here is a trivial example to recurse through a folder and print out the details of the contents.

#!/usr/bin/perl -w
use Data::Dumper;
use WebService::Cmis;
use Cache::FileCache;

print "Content-type:text/html\r\n\r\n";

my $client = WebService::Cmis::getClient(
url => "https://myhost/alfresco/cmisatom",
user => '',
password => "",
cache => new Cache::FileCache({
cache_root => "/tmp/cmis_client"
})
);

my $repo = $client->getRepository;
# print Dumper($repo);

my $projectFolder = "/Sites/mySite/documentLibrary/myFolder";

my $folder = $repo->getObjectByPath($projectFolder);
# print Dumper($folder);

showFolderDetails($folder);
showFolderContents($folder);

# Print the folder title as a heading
sub showFolderDetails {
 my $fold = shift;
 print "<h2>".$fold->getTitle()."</h2>\n";
# my $props = $fold->getProperties;
#print Dumper($props);
}

# Print the details of the latest version of a document
sub showDocumentDetails {
 my $doc = shift;
# print Dumper($doc);
 my $props = $doc->getProperties;
 if ($props->{'cmis:isLatestVersion'}->getValue eq 1) {
  #Show mail messages
  if (defined $props->{'imap:messageFrom'}) {
   print $props->{'imap:messageFrom'}->getValue;
   print $props->{'imap:messageTo'}->getValue;
   print $props->{'imap:messageSubject'}->getValue;
   print $props->{'imap:flagAnswered'}->getValue;
  } else {
   print $doc->getTitle()."\n";
  }
 }
# print Dumper($props);
}

# Recurse through a folder, printing each child
sub showFolderContents {
 my $fold = shift;

 my $projects = $fold->getChildren();
 while (my $entry = $projects->getNext()) {

   if ($entry->isa("WebService::Cmis::Folder")) {
     showFolderDetails($entry);
     showFolderContents($entry);
   } else {
     showDocumentDetails($entry);
   }
 }
}

How to tell if my DHCP ipaddress has changed

A line of script to send you an email if your DHCP allocated IP address has changed.

Either run it from a cron or put it in /etc/rc.local

ls -rt /var/lib/dhcp/dhclient*eth0* | xargs grep fixed-address | tail -2 | awk '{print $3}' | xargs echo -n | sed -e 's/;//g' | awk '{if ($1 != $2) { print $2}}' | mail -E'set nonullbody' -s "My new IP"
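The comparison at the heart of that pipeline (take the last two fixed-address values and report the newest only if it differs) can be sketched in Python. This assumes the lease text is in the ISC dhclient format with the oldest lease first; the sample lease text and the `changed_address` helper are made up for illustration, and sending the email is left out.

```python
import re

# Matches lines such as "  fixed-address 192.168.1.10;" and captures the address.
FIXED_ADDRESS = re.compile(r"^\s*fixed-address\s+([0-9.]+);", re.MULTILINE)

# Hypothetical lease file contents, oldest lease first.
SAMPLE = """lease {
  interface "eth0";
  fixed-address 192.168.1.10;
}
lease {
  interface "eth0";
  fixed-address 192.168.1.23;
}
"""


def changed_address(lease_text):
    """Return the newest fixed-address if it differs from the previous one, else None."""
    addresses = FIXED_ADDRESS.findall(lease_text)
    if len(addresses) < 2:
        return None
    previous, latest = addresses[-2], addresses[-1]
    return latest if latest != previous else None


if __name__ == "__main__":
    print(changed_address(SAMPLE))
```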

Spring Roo ConverterNotFound

In webmvc-config.xml look for

<bean class="org.wwarn.cb.web.ApplicationConversionServiceFactoryBean" id="applicationConversionService"/>

This will give you the name of the class to edit so go and find that class and add the following code:

Note that the installFormatters method is deprecated in 3.1 but as that is what Roo is currently generating I’m leaving it alone

	public Converter<ChassisStudies, String> getStudyConverter() {
		return new Converter<ChassisStudies, String>() {
			public String convert(ChassisStudies source) {
				return source.getStudyId();
			}
		};
	}

	protected void installFormatters(FormatterRegistry registry) {
		// Register application converters and formatters
		registry.addConverter(getStudyConverter());
	}

MySQL performance problems

I wrote a stored procedure on MySQL that is using a cursor to update a table of about 45,000 rows.

The details of what it is doing are unimportant but the logic is very simple.

The procedure was running extremely slowly (of the order of a tenth of a second per operation) – the steps below describe what I did to make the procedure run reasonably quickly.

Taking a look in mysql.log I saw the message:

120329 10:34:48 InnoDB: ERROR: the age of the last checkpoint is 9433926,
InnoDB: which exceeds the log group capacity 9433498.
InnoDB: If you are using big BLOB or TEXT rows, you must set the
InnoDB: combined size of log files at least 10 times bigger than the
InnoDB: largest such row.

This led me to change my.ini to add (or change) the value of innodb_log_file_size
(Should really read the manual….)

innodb_log_file_size = 64M

You then need to shut down the server and (re)move the existing log files (ib_logfile0 and ib_logfile1) before starting again with the new values
Official docs
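The numbers in the log message are worth a quick sanity check. Here is a small sketch of the arithmetic in Python, assuming two log files in the group (matching ib_logfile0 and ib_logfile1); `combined_log_size` is a hypothetical helper. As far as I understand it, the usable capacity InnoDB reports is somewhat less than the combined file size, so treat this as a rough guide only.

```python
MIB = 1024 * 1024

# Values copied from the InnoDB error message.
CHECKPOINT_AGE = 9433926
LOG_GROUP_CAPACITY = 9433498


def combined_log_size(file_size_bytes, files_in_group=2):
    """Total bytes across all log files in the group (two files assumed)."""
    return file_size_bytes * files_in_group


if __name__ == "__main__":
    overshoot = CHECKPOINT_AGE - LOG_GROUP_CAPACITY
    new_total = combined_log_size(64 * MIB)
    print("checkpoint age exceeded capacity by", overshoot, "bytes")
    print("combined size with innodb_log_file_size = 64M:", new_total, "bytes")
```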
Another value that looks to be recommended to change is innodb_buffer_pool_size and while I was there I changed some other values:

innodb_open_files = 512
innodb_buffer_pool_size = 512M
#innodb_buffer_pool_size = 4G
innodb_additional_mem_pool_size = 512M
innodb_log_file_size = 64M
innodb_log_buffer_size = 8M
innodb_thread_concurrency = 8
innodb_concurrency_tickets = 500
#Not windows
#innodb_flush_method = O_DIRECT
innodb_autoinc_lock_mode= 2
#innodb_io_capacity = 10000
#innodb_adaptive_checkpoint = 1
#innodb_write_io_threads = 8
#innodb_read_io_threads = 8


Other tips

innotop is useful to see what is happening

If there are (slow) temporary tables being created (EXPLAIN Using where; Using temporary; Using filesort ) then try:

mount a tmpfs system on an empty directory (you should also add this to fstab):
mount tmpfs /tmpfs -t tmpfs
and edit my.cnf to make MySQL use that directory as a temporary directory:
tmpdir = /tmpfs