brush clojure

Sunday, February 12, 2012

Storing XML in a JSON database

Isn't any JSON database also an XML database? JSON and XML look almost equally expressive, so if I could convert between the two formats, preserving all the information, I could put my XMLs in a JSON database. Lately there are a few databases which accept JSON and let you query on JSON properties.

Googling for such a converter didn't bring me to an undisputedly stable convertor, and my fingers itching to do some Clojure programming, the choice was easily made: I had to try this with Clojure. Hers is the result: https://github.com/kolov/x2j.

The implementation is shockingly compact. I am new to Clojure and writing this compact code was by no means fast, and the code is by no means optimal. Still, it covers the tests. I am impressed wit the compactness of the result. I produce a jar with class Converter, exposing the following 3 methods:

public static String x2j(String xml);
public static String j2x(String jsonContainingOneElement);
public static String j2x(String jsonContainingManyElements, String elementName);

Let's try it with MongoDB. Define two XMLs:


 Joe
 
Main Street Atlanta
34234234324 books tv
John
Jarvis Street Atlanta
NC 679898 books film

In Java:

String xml1 = ""
                + "Joe"
                + ... ;
String xml2 = ...

Let's start. Connect to MongoDB:
Mongo m = new Mongo(SERVER, PORT);
DB db = m.getDB(DB);
boolean auth = db.authenticate(USER, PASSWORD);
DBCollection coll = db.getCollection(COLLECTION_NAME);

Write the two XMLs:

String json1 = Converter.x2j(xml1);
System.out.println("XML: " + xml1);
System.out.println("XML -> Json: " + json1);
coll.insert((DBObject) JSON.parse(json1));

String json2 = Converter.x2j(xml2);
System.out.println("XML: " + xml2);
System.out.println("XML -> Json: " + json2);
coll.insert((DBObject) JSON.parse(json2));

The log shows:

XML: Joe
Main StreetAtlanta
34234234324bookstv
XML -> Json: {"person":{"hobbies":{"hobby":["books","tv"]},"id":{"#text":"34234234324","@type":"passport"},"address":{"street":"Main Street","city":"Atlanta"},"name":"Joe"}} XML: John
Jarvis StreetAtlanta
NC 679898booksfilm
XML -> Json: {"person":{"hobbies":{"hobby":["books","film"]},"id":{"#text":"NC 679898","@type":"licence"},"address":{"street":"Jarvis Street","city":"Atlanta"},"name":"John"}}

That looks OK. Let's now query some data:

DBCursor cursorDoc = coll.find(new BasicDBObject("person.hobbies.hobby", "tv"))
while (cursorDoc.hasNext()) {
 DBObject value = cursorDoc.next();
 System.out.println("Read Json: " + JSON.serialize(value));
  . . 
}
Oops, the log:

Read Json: { "_id" : { "$oid" : "4f36d8c103648a32fb075bc0"} , "person" : { "hobbies" : { "hobby" : [ "books" , "tv"]} , "id" : { "#text" : "34234234324" , "@type" : "passport"} , "address" : { "street" : "Main Street" , "city" : "Atlanta"} , "name" : "Joe"}}

The returned Json has to elements: the person entity and some ID data I don't care about at the moment. That's the reason for the Convertor method:
public static String j2x(String jsonContainingManyElements, String elementName);

Let's filter out _id and get the non-id-element only:

private String convertDbToXml(DBObject value) {
  String json = JSON.serialize(value);
  for (String key : value.keySet()) {
  if (!key.equals("_id")) {
   return Converter.j2x(json, key);
  }
 }
 return null;
}

With this, I run two searches:
findMatches(coll, "person.hobbies.hobby", "tv");
findMatches(coll, "person.hobbies.hobby", "books");

Here's the result:

Searching: person.hobbies.hobby=tv
Read Json: { "_id" : { "$oid" : "4f36d8c103648a32fb075bc0"} , "person" : { "hobbies" : { "hobby" : [ "books" , "tv"]} , "id" : { "#text" : "34234234324" , "@type" : "passport"} , "address" : { "street" : "Main Street" , "city" : "Atlanta"} , "name" : "Joe"}}
Json -> XML: bookstv
Main StreetAtlanta
Joe
Searching: person.hobbies.hobby=books Read Json: { "_id" : { "$oid" : "4f36d8c103648a32fb075bc0"} , "person" : { "hobbies" : { "hobby" : [ "books" , "tv"]} , "id" : { "#text" : "34234234324" , "@type" : "passport"} , "address" : { "street" : "Main Street" , "city" : "Atlanta"} , "name" : "Joe"}} Json -> XML: bookstv
Main StreetAtlanta
Joe
Read Json: { "_id" : { "$oid" : "4f36d8c103648a32fb075bc1"} , "person" : { "hobbies" : { "hobby" : [ "books" , "film"]} , "id" : { "#text" : "NC 679898" , "@type" : "licence"} , "address" : { "street" : "Jarvis Street" , "city" : "Atlanta"} , "name" : "John"}} Json -> XML: booksfilm
Jarvis StreetAtlanta
John

That looks fine, I get back the XML I wrote earlier and the search on XML content worked.

How good is this approach? I don't know, an XML database should be a better choice for extensive XML storage. The conversion leaves several issues open:
- absolutely no namespace support
- Querying XML data in Json terms and not in the 'native' XPath.
Still, it is an easy way to add basic XML functionality to a JSON datastore.