
Monday, June 25, 2012

Asynchronous long-running requests with Servlet 3.0 and Grizzly

Maintaining a lot of open connections on a server can be expensive, certainly in the traditional thread-per-connection model. When the server is waiting for some event (disk IO, network etc.) before sending a response, a lot of resources are reserved for doing very little. In the traditional thread-per-request model, a thread mainly holds the state of the connection. I am building a service which will take a long time to respond to each request because it waits on an external process each time, so before diving into a technology I took the time to measure the performance of three possible approaches: a traditional 2.5 servlet, a 3.0 asynchronous servlet, and NIO (asynchronous IO) with Grizzly.

The Setup

A request in the form http://{host}/hi?wait=100 causes the server to wait for the given number of milliseconds before returning a response. I did some measurements on three solutions to see how they actually respond to a big concurrent load.

2.x servlet

This is pretty straightforward:
 
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { 
  long waitTime = parseWaitTime(request);
  slowOperation(waitTime);

  response.setContentType("text/plain");
  PrintWriter out = response.getWriter();
  out.println("waited " + waitTime + "ms");
}
The slowOperation could consist of a simple sleep, but since the asynchronous implementations use a scheduled executor, I provided a similar implementation here to make the comparison fairer.
 
private final ScheduledExecutorService ses = Executors.newScheduledThreadPool(16);
private void slowOperation(long time) {
  final Semaphore m = new Semaphore(0);

  ses.schedule(new Runnable() {
      @Override
      public void run() { m.release(); }
    }, time, TimeUnit.MILLISECONDS);

  try {
    m.acquire();
  } catch (InterruptedException e) {
    System.out.println("never happens");
  }
}
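
The parseWaitTime helper is used by all three implementations but not shown in the post. A minimal sketch, assuming the wait query parameter with a fallback value when it is missing or malformed, could look like this:

private long parseWaitTime(HttpServletRequest request) {
  // Hypothetical helper, not part of the original post: read the "wait"
  // query parameter, falling back to 100ms if it is absent or not a number.
  String wait = request.getParameter("wait");
  try {
    return wait == null ? 100L : Long.parseLong(wait);
  } catch (NumberFormatException e) {
    return 100L;
  }
}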

3.0 servlet

The 3.0 specification allows exiting doGet without writing a response. The saved AsyncContext can be used later to do that.
 
@WebServlet(urlPatterns = {"/hi"}, asyncSupported = true)
public class SlowAsyncServlet extends HttpServlet {
    private final ScheduledExecutorService ses = Executors.newScheduledThreadPool(16);

    public void doGet(HttpServletRequest request, HttpServletResponse response) {
        final AsyncContext ac = request.startAsync();
        final long finalTime = parseWaitTime(request);
        ses.schedule(new Runnable() {
            @Override
            public void run() {
                try {
                    ac.getResponse().getWriter().write("waited " + finalTime + "ms");
                } catch (IOException e) {
                    System.out.println("Error");
                }
                ac.complete();
            }
        }, finalTime, TimeUnit.MILLISECONDS);
    }
}
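
One detail not covered in the post: servlet containers apply a default timeout to suspended requests (often around 30 seconds), after which the async cycle is aborted. If the external operation can take longer, the timeout can be adjusted explicitly on the AsyncContext. The snippet below is a sketch, not part of the measured code:

final AsyncContext ac = request.startAsync();
// Zero (or a negative value) disables the container timeout for this request;
// any positive value sets the timeout in milliseconds.
ac.setTimeout(0);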

Grizzly

Grizzly is a server framework that lets you take advantage of Java NIO without handling all the details. Grizzly is not limited to HTTP; the HTTP server framework is one of several supported implementations. I took the example at http://grizzly.java.net/nonav/docs/docbkx2.0/html/httpserverframework-samples.html as a starting point and ended up with the following HttpHandler:
public class NonBlockingEchoHandler extends HttpHandler { 

    private final ScheduledExecutorService ses = Executors.newScheduledThreadPool(16);

    @Override
    public void service(final Request request,
                        final Response response) throws Exception {

        final char[] buf = new char[128];
        final NIOReader in = request.getReader(false); // put the stream in non-blocking mode
        final NIOWriter out = response.getWriter();
        final long waitTime = parseWaitTime(request);
        response.suspend();
 
        in.notifyAvailable(new ReadHandler() {

            @Override
            public void onDataAvailable() throws Exception {
                in.notifyAvailable(this);
            }

            @Override
            public void onError(Throwable t) {
                System.out.println("[error]" + t);
                response.resume();
            }

            @Override
            public void onAllDataRead() throws Exception {
                 ses.schedule(new Runnable() {
                    @Override
                    public void run() {
                        try {
                            out.write("waited " + waitTime + "ms");
                        } catch (IOException e) {
                            System.out.println("Error");
                            return;
                        } finally {
                            try {
                                in.close();
                            } catch (IOException ignored) {
                            }
                            try {
                                out.close();
                            } catch (IOException ignored) {
                            }
                            response.resume();
                        }
                    }
                }, waitTime, TimeUnit.MILLISECONDS);
            }
        });
    }
}
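
The post does not show how the handler is started; a minimal sketch using the Grizzly 2.x HttpServer API (the port and the /hi mapping are my assumptions, not taken from the measured setup) could be:

import org.glassfish.grizzly.http.server.HttpServer;

public class GrizzlyMain {
    public static void main(String[] args) throws Exception {
        // Port and path are assumptions for illustration only.
        HttpServer server = HttpServer.createSimpleServer(null, 8080);
        server.getServerConfiguration().addHttpHandler(new NonBlockingEchoHandler(), "/hi");
        server.start();
        System.out.println("Server started, press Enter to stop");
        System.in.read();
        server.stop();
    }
}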

The Measurement

Nothing too exciting, just running ab with different numbers of concurrent requests, like this:
ab -r -g servlet2.5-c100 -n 6000 -c 100 http://127.0.0.1:8080/servlet2.5/hi?wait=100
Response times are logged in a file used by gnuplot to generate the diagrams below. I deployed the servlets under Jetty 8.1.3 and started the Grizzly server on a MacBook Pro with an i5 and 8GB RAM. The only VM parameter was -Xss256k, to keep thread memory usage low. While running the measurements, I also observed the JVM thread and memory usage. The thread usage results are close to expectations: with 200 concurrent requests, Jetty initially used 54 threads, which went up to 261 for the 2.5 servlet and to 100 for the 3.0 servlet. Grizzly started with 39 threads, which went up to 55.

The Result

The result? The measurements did not exactly match my expectations. I expected a clear difference in performance, but it only started to appear at very high loads, while performance degradation was noticeable in all three setups. The diagrams below show how response time is distributed among 6,000 requests, fired in concurrent batches of different sizes. Ideally, all requests would be processed in 100ms and the result would be three horizontal lines at 100ms. With 10 concurrent requests, as expected, all three servers behave just fine. The few requests that finished faster than 100ms indicate some flaw in the ab results; I am not sure what causes it.

With 100 concurrent requests the three implementations still stay close together:

At 200 concurrent requests the three servers start showing different behavior...

more and more so with 500 and 1000 concurrent requests, see below.

At high loads both servlets respond well in most cases, with some percentage of the requests taking really long to process. Grizzly's response time deviation stays low, with very few requests taking really long.

Apart from thread usage, the results don't show a clear advantage of asynchronous request processing until really high loads. A lot depends on how the JVM handles threads and context switching. Asynchronous or thread-per-request, some session state has to be kept on the server until the response is sent.

If one clear conclusion can be drawn from this, it is: test your assumptions, get measurable metrics and assess your requirements before binding to a technology.



Sunday, February 12, 2012

Storing XML in a JSON database

Isn't any JSON database also an XML database? JSON and XML look almost equally expressive, so if I could convert between the two formats, preserving all the information, I could put my XMLs in a JSON database. Lately there are a few databases which accept JSON and let you query on JSON properties.

Googling for such a converter didn't bring me to an undisputedly stable one, and with my fingers itching to do some Clojure programming, the choice was easily made: I had to try this with Clojure. Here is the result: https://github.com/kolov/x2j.

The implementation is shockingly compact. I am new to Clojure, so writing this compact code was by no means fast, and the code is by no means optimal. Still, it covers the tests, and I am impressed with the compactness of the result. I produce a jar with a class Converter, exposing the following three methods:

public static String x2j(String xml);
public static String j2x(String jsonContainingOneElement);
public static String j2x(String jsonContainingManyElements, String elementName);
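
As a quick illustration of the roundtrip (a sketch, assuming the x2j jar is on the classpath; the XML content is the Joe example used below):

String xml = "<person><name>Joe</name>"
        + "<address><street>Main Street</street><city>Atlanta</city></address>"
        + "<id type=\"passport\">34234234324</id>"
        + "<hobbies><hobby>books</hobby><hobby>tv</hobby></hobbies></person>";

// XML -> JSON, as shown in the log output further down
String json = Converter.x2j(xml);

// JSON -> XML again; the JSON here contains a single top-level element, person
String back = Converter.j2x(json);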

Let's try it with MongoDB. Define two XMLs:


<person>
  <name>Joe</name>
  <address><street>Main Street</street><city>Atlanta</city></address>
  <id type="passport">34234234324</id>
  <hobbies><hobby>books</hobby><hobby>tv</hobby></hobbies>
</person>

<person>
  <name>John</name>
  <address><street>Jarvis Street</street><city>Atlanta</city></address>
  <id type="licence">NC 679898</id>
  <hobbies><hobby>books</hobby><hobby>film</hobby></hobbies>
</person>

In Java:

String xml1 = "<person>"
                + "<name>Joe</name>"
                + ... ;
String xml2 = ...

Let's start. Connect to MongoDB:
Mongo m = new Mongo(SERVER, PORT);
DB db = m.getDB(DB);
boolean auth = db.authenticate(USER, PASSWORD);
DBCollection coll = db.getCollection(COLLECTION_NAME);

Write the two XMLs:

String json1 = Converter.x2j(xml1);
System.out.println("XML: " + xml1);
System.out.println("XML -> Json: " + json1);
coll.insert((DBObject) JSON.parse(json1));

String json2 = Converter.x2j(xml2);
System.out.println("XML: " + xml2);
System.out.println("XML -> Json: " + json2);
coll.insert((DBObject) JSON.parse(json2));

The log shows:

XML: <person><name>Joe</name><address><street>Main Street</street><city>Atlanta</city></address><id type="passport">34234234324</id><hobbies><hobby>books</hobby><hobby>tv</hobby></hobbies></person>
XML -> Json: {"person":{"hobbies":{"hobby":["books","tv"]},"id":{"#text":"34234234324","@type":"passport"},"address":{"street":"Main Street","city":"Atlanta"},"name":"Joe"}}
XML: <person><name>John</name><address><street>Jarvis Street</street><city>Atlanta</city></address><id type="licence">NC 679898</id><hobbies><hobby>books</hobby><hobby>film</hobby></hobbies></person>
XML -> Json: {"person":{"hobbies":{"hobby":["books","film"]},"id":{"#text":"NC 679898","@type":"licence"},"address":{"street":"Jarvis Street","city":"Atlanta"},"name":"John"}}

That looks OK. Let's now query some data:

DBCursor cursorDoc = coll.find(new BasicDBObject("person.hobbies.hobby", "tv"));
while (cursorDoc.hasNext()) {
  DBObject value = cursorDoc.next();
  System.out.println("Read Json: " + JSON.serialize(value));
  // ...
}
Oops, the log:

Read Json: { "_id" : { "$oid" : "4f36d8c103648a32fb075bc0"} , "person" : { "hobbies" : { "hobby" : [ "books" , "tv"]} , "id" : { "#text" : "34234234324" , "@type" : "passport"} , "address" : { "street" : "Main Street" , "city" : "Atlanta"} , "name" : "Joe"}}

The returned JSON has two elements: the person entity and some ID data I don't care about at the moment. That's the reason for the Converter method:
public static String j2x(String jsonContainingManyElements, String elementName);

Let's filter out _id and get the non-id-element only:

private String convertDbToXml(DBObject value) {
  String json = JSON.serialize(value);
  for (String key : value.keySet()) {
    if (!key.equals("_id")) {
      return Converter.j2x(json, key);
    }
  }
  return null;
}
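
The findMatches helper used in the next step is not shown in the post; a minimal sketch built around convertDbToXml, matching the log output below, could be:

private void findMatches(DBCollection coll, String key, String value) {
  // Hypothetical helper, not part of the original post: query the collection
  // on a single key/value pair and print each match as JSON and as XML.
  System.out.println("Searching: " + key + "=" + value);
  DBCursor cursor = coll.find(new BasicDBObject(key, value));
  while (cursor.hasNext()) {
    DBObject doc = cursor.next();
    System.out.println("Read Json: " + JSON.serialize(doc));
    System.out.println("Json -> XML: " + convertDbToXml(doc));
  }
}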

With this, I run two searches:
findMatches(coll, "person.hobbies.hobby", "tv");
findMatches(coll, "person.hobbies.hobby", "books");

Here's the result:

Searching: person.hobbies.hobby=tv
Read Json: { "_id" : { "$oid" : "4f36d8c103648a32fb075bc0"} , "person" : { "hobbies" : { "hobby" : [ "books" , "tv"]} , "id" : { "#text" : "34234234324" , "@type" : "passport"} , "address" : { "street" : "Main Street" , "city" : "Atlanta"} , "name" : "Joe"}}
Json -> XML: <person><hobbies><hobby>books</hobby><hobby>tv</hobby></hobbies><id type="passport">34234234324</id><address><street>Main Street</street><city>Atlanta</city></address><name>Joe</name></person>
Searching: person.hobbies.hobby=books
Read Json: { "_id" : { "$oid" : "4f36d8c103648a32fb075bc0"} , "person" : { "hobbies" : { "hobby" : [ "books" , "tv"]} , "id" : { "#text" : "34234234324" , "@type" : "passport"} , "address" : { "street" : "Main Street" , "city" : "Atlanta"} , "name" : "Joe"}}
Json -> XML: <person><hobbies><hobby>books</hobby><hobby>tv</hobby></hobbies><id type="passport">34234234324</id><address><street>Main Street</street><city>Atlanta</city></address><name>Joe</name></person>
Read Json: { "_id" : { "$oid" : "4f36d8c103648a32fb075bc1"} , "person" : { "hobbies" : { "hobby" : [ "books" , "film"]} , "id" : { "#text" : "NC 679898" , "@type" : "licence"} , "address" : { "street" : "Jarvis Street" , "city" : "Atlanta"} , "name" : "John"}}
Json -> XML: <person><hobbies><hobby>books</hobby><hobby>film</hobby></hobbies><id type="licence">NC 679898</id><address><street>Jarvis Street</street><city>Atlanta</city></address><name>John</name></person>

That looks fine: I get back the XML I wrote earlier, and the search on XML content worked.

How good is this approach? I don't know; a dedicated XML database is probably a better choice for extensive XML storage. The conversion leaves several issues open:
- no namespace support at all
- XML data is queried in JSON terms, not with the 'native' XPath
Still, it is an easy way to add basic XML functionality to a JSON datastore.