NoSQL – mongodb vs eXist

There are similarities between mongo and eXist in that they are both document-oriented and work with collections of documents.
The main difference is that eXist uses XML documents and mongo uses JSON documents.

There’s plenty of documentation on both, but from my point of view it’s quite interesting to compare them, and doing so will help my understanding!

(At this point I’m just starting out with mongo and have done more with eXist – see this project, which is a Java WAR containing a REST-based service that uses eXist as its back end.)

eXist uses XQuery as its query language.
Because eXist runs in an application server, you can write an .xql file which directly returns your data, or even HTML. It’s even possible to wrap it with Java, e.g. if you want to implement Spring Security, or you can just use it as a database server (albeit one running in a Java app server).
eXist also has a REST interface, XML-RPC, the Java XML:DB API etc. See here for more details.

Mongo acts more like a traditional database server with a command shell; to act as a web server it requires an additional interface, e.g. one written in Python. An example of a REST service is here.

There’s a nice comparison between mongodb and sql here
In mongo you can show the collections using the command show collections.

To list the contents of a collection you use db.<collection name>.find(), which is like SELECT * FROM table in SQL, or collection('collection name') or xmldb:xcollection('collection name') in XQuery.

So for a collection ‘test_collection’ containing atom entries in eXist you might do something like this (XQuery):

 declare namespace atom = "http://www.w3.org/2005/Atom";

 let $all-recs := xmldb:xcollection('test_collection')/atom:entry (: not recursive :)
 let $test-recs := $all-recs[atom:title = 'Test']
 let $authors := for $rec in $test-recs
                 return $rec/atom:author
 return $authors

or in mongo

 db.test_collection.find({"title": "Test"})

So in eXist you are using XPath, whereas in mongo you write a JSON document that matches the document you want to retrieve.
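To make the difference between the two query styles a bit more concrete, here is a rough Python sketch using only the standard library. It is not how eXist or mongo work internally, and it does not use either product's API. The sample entries and the function names (authors_by_xpath, find_by_example) are made up for illustration, and the matcher only handles top-level equality, whereas real mongo queries support operators, nested fields and arrays.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample data: the same two entries as XML and as JSON-style dicts.
XML_ENTRIES = """
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Test</title><author><name>Alice</name></author></entry>
  <entry><title>Other</title><author><name>Bob</name></author></entry>
</feed>
"""

JSON_ENTRIES = [
    {"title": "Test", "author": {"name": "Alice"}},
    {"title": "Other", "author": {"name": "Bob"}},
]


def authors_by_xpath(xml_text, title):
    """Path style: walk to the entries, filter on the title, return author names."""
    root = ET.fromstring(xml_text)
    return [
        entry.findtext("atom:author/atom:name", namespaces=ATOM)
        for entry in root.findall("atom:entry", ATOM)
        if entry.findtext("atom:title", namespaces=ATOM) == title
    ]


def find_by_example(docs, query):
    """Query-by-example style: keep documents whose top-level fields equal the query's."""
    return [
        doc for doc in docs
        if all(doc.get(key) == value for key, value in query.items())
    ]


if __name__ == "__main__":
    print(authors_by_xpath(XML_ENTRIES, "Test"))
    print(find_by_example(JSON_ENTRIES, {"title": "Test"}))
```

The path version says where to look and what to pull out, while the example-document version only describes what a matching document looks like and hands back the whole thing.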

What’s the conclusion?

MongoDB is sexy at the moment and there’s a lot more information available and it looks quite easy to use.

eXist (and its big brother MarkLogic) is more niche, but the XQuery/XPath language is probably more powerful than the direct mongo syntax. XForms, e.g. Orbeon, also provides a pretty simple way to edit the XML once you’ve got your head around it.

Most languages have pretty good support for either JSON or XML processing.

Overall I think it’s a case of horses for courses, but there’s a strong argument for sticking with traditional SQL databases, most of which can handle XML text fields with XPath queries, and some of which can handle JSON. It’s even quite easy to serialize XML to a database using HyperJAXB3, which uses JAXB and JPA (see this project). These days it’s also really easy to knock up a simple CRUD application using something like Grails.

I can certainly see a case for eXist where you are working with relatively unstructured XML documents – journal articles would be a good example here – XForms is a nice tie in too.

MongoDB is a good candidate if you are using JSON data, for example in an application which is heavily Ajax/REST based, and it would cope well with a lot of sparsely populated data. It could be pretty good for prototyping where you want to change your data structures a lot, e.g. with AngularJS and a Python service, and of course it’s "web scale" for when your app takes over the world!
(A humorous mongodb mysql comparison here, which gets a bit out of control halfway through.)

XForms select using an XML Schema (XSD)

As I’m using schema based validation for my XForms (using Orbeon) I thought it would be a nice idea to generate the select options from the xsd (as well as saving some work!).

As an advanced feature we can also select the value for a second select based on the value of the first.

This is a collection of ideas from different places on the Orbeon wiki and mailing list together with my own thoughts to bring it together.

First the set up:

Load the schema into an instance, in this case called study-info-resources. Note that the schema needs to be in a directory where it is not protected by any security, not because of this code but so that it can be used for validation.

Next we set up a resources file, which is going to be used for the labels. As well as nice labels we also get multi-language support. This resources file will also be used to define the relationship between the first and second selects.

 <xforms:instance id="study-info-resources"
        src="../../../../../../insecure/schema/study.xsd"/>
    <!--
    In your XForms model, load resources.xml in an instance.
    Make sure to make that instance read-only and cacheable: this
    way the instance will only be stored once in memory
   (it will be shared by all the users) and using a more efficient
   (because read-only) representation in memory.

    xxforms:readonly="true" xxforms:cache="true"
    -->
    <xforms:instance id="all-resources"
        src="/apps/common/resources.xml"/>
    <!--
    NOTE: Here we point to a local file with the oxf: protocol.
    This usually yields better performance than using http:,
    because oxf: will reach a local file on disk.
    However, in the online version, we use the http:,
    because we want to load the resource from an online server!

    Define an instance used to store the current language,
       e.g. en or fr.
    -->
 <xforms:instance id="language"><language>en</language></xforms:instance>
 <xforms:instance id="iterator"><key></key></xforms:instance>
    <!--
    Define a variable ($resources), which points to <resources>
       for the current language.
    You will use this variable as a shortcut in your view
       to point to specific resources.
        -->
<xxforms:variable name="resources"
select="instance('all-resources')/resource[@xml:lang = instance('language')]"/>

Next write the schema elements

 <xs:simpleType name="tradeName">
     <xs:restriction base="xs:string">
       <xs:enumeration value="" />
       <xs:enumeration value="fred" />
       <xs:enumeration value="bert" />
       <xs:enumeration value="Other" />
     </xs:restriction>
 </xs:simpleType>
 <xs:simpleType name="manufacturer">
   <xs:restriction base="xs:string">
     <xs:enumeration value="" />
     <xs:enumeration value="acme" />
     <xs:enumeration value="ernie" />
     <xs:enumeration value="Other" />
  </xs:restriction>
 </xs:simpleType>

Now we set up the resources to get our labels

 <?xml version="1.0" encoding="UTF-8" ?>
<resources>
    <resource xml:lang="en">
        <tradeName>Trade Name</tradeName>
        <manufacturer>Manufacturer</manufacturer>
        <tradeNames>
                    <tradeName ref="" />
                    <tradeName ref="fred">Frederick</tradeName>
                    <tradeName ref="bert">Albert</tradeName>
                    <tradeName ref="Other">Unlisted</tradeName>
        </tradeNames>
        <manufacturers>            
            <manufacturer ref="" tradeName=""></manufacturer>
            <manufacturer ref="acme" tradeName="fred">acme</manufacturer>
            <manufacturer ref="ernie" tradeName="bert">ernie</manufacturer>
            <manufacturer ref="Other" tradeName="Other">Other</manufacturer>
        </manufacturers>
    </resource>
</resources>

Now to write our control. As you can see, the itemset is populated by looking at the enumeration values from the schema. Looking up the labels from the resources is slightly trickier (of course you can just use the value from the enumeration as the label and skip this lookup).

The xforms-value-changed action is only used for the next part so you can usually leave it out.

<xforms:select1 ref="tradeName" appearance="minimal">
    <xforms:label class="fixed-width" model="mod-study-info"
                                        ref="$resources/tradeName"/>

    <xforms:itemset model="mod-study-info"
        nodeset="instance('study-info-resources')//xs:simpleType[@name='tradeName']/xs:restriction/xs:enumeration">
         <xforms:label ref="for $currentItemName in @value
             return $resources//tradeName[@ref = $currentItemName]"/>
        <xforms:value ref="@value"/>
    </xforms:itemset>
    <xforms:action ev:event="xforms-value-changed">

      <xforms:insert ref="instance('binding-control')//rebuild"
                                               value="something"/>
      <!--  the rebuild is what should make it work however
                           it's actually the previous statement....
      <xforms:rebuild model="mod-study-dashboard" />
      -->
 </xforms:action>
</xforms:select1>
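If it helps to see the lookup outside of XForms, here is a rough Python sketch of what the itemset is doing: take the enumeration values from the schema and pair each one with the label held in the resources file. It uses only the standard library; the function name select_options and the cut-down sample documents are my own for illustration, and falling back to the raw value when there is no label is an assumption rather than something Orbeon does.

```python
import xml.etree.ElementTree as ET

XS = {"xs": "http://www.w3.org/2001/XMLSchema"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Cut-down copies of the schema and resources file, for illustration only.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="tradeName">
    <xs:restriction base="xs:string">
      <xs:enumeration value=""/>
      <xs:enumeration value="fred"/>
      <xs:enumeration value="bert"/>
      <xs:enumeration value="Other"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
"""

RESOURCES = """
<resources>
  <resource xml:lang="en">
    <tradeName>Trade Name</tradeName>
    <tradeNames>
      <tradeName ref=""/>
      <tradeName ref="fred">Frederick</tradeName>
      <tradeName ref="bert">Albert</tradeName>
      <tradeName ref="Other">Unlisted</tradeName>
    </tradeNames>
  </resource>
</resources>
"""


def select_options(schema_xml, resources_xml, type_name, lang="en"):
    """Return (value, label) pairs for the enumeration of the named simple type."""
    schema = ET.fromstring(schema_xml)
    resources = ET.fromstring(resources_xml)

    # Pick the <resource> block for the current language.
    resource = next(
        r for r in resources.findall("resource") if r.get(XML_LANG) == lang
    )

    # Map each @ref to its label text; elements without @ref are control labels.
    labels = {
        el.get("ref"): (el.text or "")
        for el in resource.iter(type_name)
        if el.get("ref") is not None
    }

    path = f".//xs:simpleType[@name='{type_name}']/xs:restriction/xs:enumeration"
    values = [enum.get("value") for enum in schema.findall(path, XS)]

    # Assumption: fall back to the raw value when no label is defined.
    return [(value, labels.get(value, value)) for value in values]


if __name__ == "__main__":
    for value, label in select_options(SCHEMA, RESOURCES, "tradeName"):
        print(repr(value), repr(label))
```

The schema stays the single source of truth for the allowed values, and the resources file only has to supply text for them.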

For the final part of this example we are going to create a second select control which is populated based on the value of the first.

Here you can see that if we know the trade name (it’s not ‘Other’) then we can populate the manufacturer based on the mappings held in the resources file. If we don’t know the trade name then this functions just like a normal select box. The only thing necessary to set this up is to create a binding.

The trick to making this work as expected (in Orbeon 3.8) is to use the action specified above. It seems that the insert triggers a model rebuild, whereas if xforms:rebuild is called directly it appears to calculate the binding based on the value of the select prior to the change rather than after it.

 <xforms:bind nodeset="manufacturer[not(../tradeName = 'Other')]"
        calculate="
            for $currentItemName in ../tradeName
             return
               xxforms:instance('all-resources')//manufacturer[@tradeName = $currentItemName]/@ref
         " />

 

<xforms:select1 ref="manufacturer" appearance="minimal">  
   <xforms:label class="fixed-width" model="mod-study-info"
                                   ref="$resources/manufacturer"/>
   <xforms:itemset model="mod-study-info"
       nodeset="instance('study-info-resources')//xs:simpleType[@name='manufacturer']/xs:restriction/xs:enumeration">
      <xforms:label ref="for $currentItemName in @value
            return $resources//manufacturer[@ref = $currentItemName]"/>
      <xforms:value ref="@value"/>
   </xforms:itemset>
</xforms:select1>
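The logic of the binding can be sketched in plain Python as follows. This is only an illustration of the rule, not Orbeon code: the function name manufacturer_for is made up, and the sample data assumes that each manufacturer's ref attribute holds the value from the manufacturer enumeration in the schema while tradeName holds the trade name it belongs to.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping data: @ref is the manufacturer value from the schema,
# @tradeName is the trade name that implies it.
RESOURCES = """
<resources>
  <resource xml:lang="en">
    <manufacturers>
      <manufacturer ref="" tradeName=""/>
      <manufacturer ref="acme" tradeName="fred">acme</manufacturer>
      <manufacturer ref="ernie" tradeName="bert">ernie</manufacturer>
      <manufacturer ref="Other" tradeName="Other">Other</manufacturer>
    </manufacturers>
  </resource>
</resources>
"""


def manufacturer_for(resources_xml, trade_name, current=""):
    """Work out the manufacturer value for the second select.

    When the trade name is 'Other' the bind does not apply, so whatever the
    user picked (current) is kept. Otherwise the value is calculated from
    the mapping, and is empty when there is no mapping for the trade name.
    """
    if trade_name == "Other":
        return current
    root = ET.fromstring(resources_xml)
    for el in root.iter("manufacturer"):
        if el.get("tradeName") == trade_name:
            return el.get("ref")
    return ""


if __name__ == "__main__":
    print(manufacturer_for(RESOURCES, "fred"))
    print(manufacturer_for(RESOURCES, "Other", current="ernie"))
```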

The binding control instance is just a bit bucket used to trigger the rebuild

  <xforms:instance id="binding-control">
        <bc xmlns="">
            <rebuild/>
        </bc>
    </xforms:instance>

XML and relational databases

A problem I seem to keep getting asked for help with is how to persist XML data to a relational database. (I’ve now been asked to help on at least four different projects where this has become something of an issue/blocker, although strangely I’ve never actually needed to do this myself.)

The context of this is usually services – I’ve seen this in REST and SOAP based services – not a debate I’m going to enter into here.
Note that this post is concerned with traditional relational databases and does not cover using the XRX (XForms, REST, XQuery) model or XML databases such as MarkLogic or eXist.
The first thing I recommend doing is to ensure that you have an XML schema. I think that it’s good practice and makes you think about the design of your XML structures. If you’re more familiar with designing databases or programmatic data structures then this will help you think about your XML in the same way and save you problems later on. It’s all too easy to end up with the same element name used for different things or common data structures with slightly different names if you just use well-formed XML.

What’s the best way to do this? Well as usual it depends…

Java driven approach

Here the starting point is your java beans.
You can use annotations for both JPA, to define the object relational mapping, and JAXB to define the object XML mapping.
http://www.oracle.com/technetwork/articles/marx-jse6-090753.html

XML driven approach

Here the XML is the primary component. Information about the structure of the data is maintained in the schema together with the JPA annotations.
Using the schema annotations it is possible to customize the way that the java beans are generated.
This is a nice approach because all the information about the structure of the data is held in the same place and everything can be easily regenerated and you don’t have to worry about losing any changes made to the generated classes.
You will, of course, need to ensure that your database schema stays in sync with the JPA definitions.

Option #1 – Hyperjaxb3

https://github.com/highsource/hyperjaxb3

Option #2 – Use Dali to map your POJOs to Database (JPA)

The Eclipse Dali tool provides tooling to easily map your POJOs to a relational database using JPA:

http://www.eclipse.org/webtools/dali/

Option #3 – Use EclipseLink

EclipseLink provides both JPA and JAXB implementations. The JAXB implementation (MOXy) contains extensions specifically for handling JPA entities:

http://wiki.eclipse.org/EclipseLink/Examples/MOXy/JPA

Other links
http://www.slideshare.net/shaunmsmith/restful-services-with-jaxb-and-jpa
http://blog.hma-info.de/2008/05/15/hyperjaxb-3-the-fastest-way-from-xml-to-db/

http://blogs.bytecode.com.au/glen/2010/07/29/from-wsdl-to-jaxb-to-jpa-with-a-single-schema–adventures-in-hyperjaxb3.html

XML driven approach – a variation

An alternative approach to annotating the XML schema is to use aspect-oriented programming and place the JPA annotations in the AspectJ files (a bit like how Spring Roo works).

Database driven approach

In this approach the database is the most important component in the system – this is a good approach if you’ve got an existing, stable database and want to be able to quickly add XML capabilities e.g. to provide a service based interface

Create JPA beans

There are a number of ways to do this – you can of course write the beans by hand but it’s easier to generate them. You can do this using Spring Roo (see earlier blog post) or Eclipse Dali

Mapping JPA to XML

The obvious way to map between XML and Java beans is to use JAXB.
One approach is to generate the JAXB beans and write a custom mapping to the JPA beans however there is a better way.
If you use the MOXy JAXB implementation then there are some extensions which you can use:
• XPath Based Mapping
• JPA Entities to XML – Bidirectional Relationships
These extensions allow you to use annotations on your JPA beans to describe your JAXB mappings.
N.B. If you regenerate your JPA beans then these annotations will be lost – a major disadvantage to this approach

A detailed example of how to do this is at:
http://bdoughan.blogspot.com/2010/08/creating-restful-web-service-part-15.html

The common approach

I’ve called it this because, in my experience, this is what most Java coders do. In fact it covers several different approaches, but at heart they are very similar.

Parse the XML into java beans

There are lots of ways to do this: JAXB, or using DOM or SAX by hand. If you’ve a well-written WSDL file for a SOAP service then this will be part of the generated code.

Persist the java beans

Again lots of ways to do this – hand written JDBC calls, Hibernate…
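The two steps have the same shape whatever language you use. As a rough sketch, here it is in Python with the standard library standing in for the Java tooling: a dataclass plays the part of the bean, ElementTree does the parsing that JAXB or a hand-written DOM/SAX parser would do, and sqlite3 stands in for JDBC or Hibernate. The Study class, the element names and the table layout are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Study:
    """Stand-in for a java bean; the fields are hypothetical."""
    trade_name: str
    manufacturer: str


def parse_studies(xml_text):
    """Step 1: parse the XML into plain objects."""
    root = ET.fromstring(xml_text)
    return [
        Study(
            trade_name=study.findtext("tradeName", default=""),
            manufacturer=study.findtext("manufacturer", default=""),
        )
        for study in root.findall("study")
    ]


def persist(conn, studies):
    """Step 2: persist the objects with hand-written SQL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS study (trade_name TEXT, manufacturer TEXT)"
    )
    conn.executemany(
        "INSERT INTO study (trade_name, manufacturer) VALUES (?, ?)",
        [(s.trade_name, s.manufacturer) for s in studies],
    )
    conn.commit()


if __name__ == "__main__":
    sample = (
        "<studies>"
        "<study><tradeName>fred</tradeName><manufacturer>acme</manufacturer></study>"
        "</studies>"
    )
    connection = sqlite3.connect(":memory:")
    persist(connection, parse_studies(sample))
    print(connection.execute("SELECT * FROM study").fetchall())
```

The weakness is visible even at this size: the element names, the object fields and the column names are three separate lists that have to be kept in step by hand.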

The simple approach

This is a simple approach which might be sufficient if performance is not too much of an issue.
This could be a good approach if you’ve got some relatively complex XML, which you want to store, but are not interested in much of the actual content of the XML.
It is easy enough to store XML documents as a BLOB, and all the significant databases now support XPath querying of the data. (Check your database for specifics of how to do this, e.g. MySQL at http://dev.mysql.com/tech-resources/articles/xml-in-mysql5.1-6.0.html#xml-5.1-xpath)
If you are using this approach then it’s probably a good idea to use stored procedures so that if you need to migrate to a more structured data model then this can be accomplished with minor application changes.
This approach can be enhanced by extracting specific elements into structured columns so that if there is a particular element that you want to query you can do it in the traditional manner.
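As a minimal sketch of this idea, the following Python uses the built-in sqlite3 module to store the whole document in one column and copy a single element out into a structured column for querying. SQLite is used only because it ships with Python; it has no built-in XPath support, so the extraction happens in application code here. The table layout, the title element and the function names are hypothetical, and the two functions play the role that stored procedures would in a bigger database by keeping all access in one place.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_document(conn, xml_text):
    """Store the full XML plus one extracted element in its own column."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS document ("
        "id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    # Pull out the one element we expect to query on.
    title = ET.fromstring(xml_text).findtext("title")
    cursor = conn.execute(
        "INSERT INTO document (title, body) VALUES (?, ?)", (title, xml_text)
    )
    conn.commit()
    return cursor.lastrowid


def find_by_title(conn, title):
    """Query the structured column and return the stored XML documents."""
    rows = conn.execute(
        "SELECT body FROM document WHERE title = ?", (title,)
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    store_document(
        connection, "<article><title>Test</title><body>text</body></article>"
    )
    print(find_by_title(connection, "Test"))
```

If the data model later becomes more structured, only these two functions need to change, which is the same argument as for using stored procedures.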

ETL Tools

If you’ve got the tools and the experience in using them, then ETL tools can provide a graphically driven interface to produce the mapping between the XML and the database schema. Talend, Oracle Data Integrator, IBM DataStage, Informatica etc. all provide this and can expose service endpoints for interactions.

I think this post is probably long enough for now…