Monday, June 25, 2012

Asynchronous long-running requests with Servlet 3.0 and Grizzly

Maintaining a lot of open connections on a server can be expensive, certainly in the traditional thread-per-connection model. When the server is waiting for some event (disk I/O, network, etc.) before sending a response, a lot of resources are reserved for doing very little. In the traditional thread-per-request model, a thread mainly holds the state of the connection. I am building a service which will take a long time to respond to each request because it waits on an external process each time, so before diving into a technology I took the time to measure the performance of three possible approaches: a traditional Servlet 2.5 implementation, a Servlet 3.0 asynchronous implementation, and NIO (asynchronous I/O) with Grizzly.

The Setup

A request in the form http://{host}/hi?wait=100 causes the server to wait for the given number of milliseconds before returning a response. I did some measurements on three solutions to see how they actually respond to a large concurrent load.

2.x servlet

This is pretty straightforward:
 
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { 
  long waitTime = parseWaitTime(request);
  slowOperation(waitTime);

  response.setContentType("text/plain");
  PrintWriter out = response.getWriter();
  out.println("waited " + waitTime + "ms");
}
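The parseWaitTime helper is not shown in the original code; here is a minimal sketch for the servlet versions, assuming the wait query parameter carries the delay and a default applies when it is missing or malformed:

// Hypothetical helper, not part of the original post: reads the "wait"
// query parameter, falling back to a default when absent or malformed.
private long parseWaitTime(HttpServletRequest request) {
  try {
    return Long.parseLong(request.getParameter("wait"));
  } catch (NumberFormatException e) {
    return 100; // default wait in milliseconds
  }
}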
The slowOperation could be a simple sleep, but since the asynchronous implementations use a scheduled executor, I used a similar implementation here to make the comparison fairer.
 
private final ScheduledExecutorService ses = Executors.newScheduledThreadPool(16);
private void slowOperation(long time) {
  final Semaphore m = new Semaphore(0);

  // schedule the "external event" to occur after the given delay...
  ses.schedule(new Runnable() {
      @Override
      public void run() { m.release(); }
    }, time, TimeUnit.MILLISECONDS);

  // ...and block the request thread until it does
  try {
    m.acquire();
  } catch (InterruptedException e) {
    System.out.println("never happens");
  }
}

3.0 servlet

The 3.0 specification allows doGet to return without writing a response; the saved AsyncContext can be used later to complete it.
 
@WebServlet(urlPatterns = {"/hi"}, asyncSupported = true)
public class SlowAsyncServlet extends HttpServlet {
    private final ScheduledExecutorService ses = Executors.newScheduledThreadPool(16);

    @Override
    public void doGet(HttpServletRequest request, HttpServletResponse response) {
        // exit doGet without a response; the container keeps the connection open
        final AsyncContext ac = request.startAsync();
        final long finalTime = parseWaitTime(request);
        ses.schedule(new Runnable() {
            @Override
            public void run() {
                try {
                    ac.getResponse().getWriter().write("waited " + finalTime + "ms");
                } catch (IOException e) {
                    System.out.println("Error");
                }
                ac.complete(); // commit the response and release the connection
            }
        }, finalTime, TimeUnit.MILLISECONDS);
    }
}
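One detail the example skips is the asynchronous timeout: the container aborts a suspended request after a container-default period unless told otherwise. A sketch of how doGet could bound and observe this with the standard Servlet 3.0 API follows; the 30-second value is only illustrative.

// Optional hardening, not in the original example: set an explicit
// timeout for the suspended request and react when it fires.
final AsyncContext ac = request.startAsync();
ac.setTimeout(30000); // milliseconds; zero or less disables the timeout
ac.addListener(new AsyncListener() {
    @Override
    public void onTimeout(AsyncEvent event) throws IOException {
        System.out.println("async request timed out");
    }
    @Override
    public void onComplete(AsyncEvent event) throws IOException { }
    @Override
    public void onError(AsyncEvent event) throws IOException { }
    @Override
    public void onStartAsync(AsyncEvent event) throws IOException { }
});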

Grizzly

Grizzly is a server framework that makes it possible to take advantage of Java NIO without handling all the details yourself. It is not limited to HTTP; the HTTP server is one of several protocol implementations built on it. I took the example at http://grizzly.java.net/nonav/docs/docbkx2.0/html/httpserverframework-samples.html as a starting point and ended up with the following HttpHandler:
public class NonBlockingEchoHandler extends HttpHandler { 

    private final ScheduledExecutorService ses = Executors.newScheduledThreadPool(16);

    @Override
    public void service(final Request request,
                        final Response response) throws Exception {

        final NIOReader in = request.getReader(false); // put the stream in non-blocking mode
        final NIOWriter out = response.getWriter();
        final long waitTime = parseWaitTime(request);
        response.suspend(); // detach the response from the worker thread until resume()
 
        in.notifyAvailable(new ReadHandler() {

            @Override
            public void onDataAvailable() throws Exception {
                // no request body is expected; just keep listening until all data is read
                in.notifyAvailable(this);
            }

            @Override
            public void onError(Throwable t) {
                System.out.println("[error]" + t);
                response.resume();
            }

            @Override
            public void onAllDataRead() throws Exception {
                ses.schedule(new Runnable() {
                    @Override
                    public void run() {
                        try {
                            out.write("waited " + waitTime + "ms");
                        } catch (IOException e) {
                            System.out.println("Error");
                        } finally {
                            try {
                                in.close();
                            } catch (IOException ignored) {
                            }
                            try {
                                out.close();
                            } catch (IOException ignored) {
                            }
                            response.resume(); // complete the suspended response
                        }
                    }
                }, waitTime, TimeUnit.MILLISECONDS);
            }
        });
    }
}
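For completeness, the handler still has to be registered with a server instance. Here is a minimal bootstrap, roughly following the linked Grizzly sample (the port and the /hi mapping are assumptions, chosen to match the test URL):

// Start a Grizzly HttpServer and map the handler to /hi
// (Grizzly 2.x API; port 8080 and the mapping are illustrative).
HttpServer server = HttpServer.createSimpleServer(null, 8080);
server.getServerConfiguration().addHttpHandler(new NonBlockingEchoHandler(), "/hi");
try {
    server.start();
    System.out.println("Press any key to stop the server...");
    System.in.read();
} finally {
    server.stop();
}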

The Measurement

Nothing too exciting, just running ab with different numbers of concurrent requests, like this:
ab -r -g servlet2.5-c100 -n 6000 -c 100 http://127.0.0.1:8080/servlet2.5/hi?wait=100
Response times are logged (via ab's -g flag) in a file used by gnuplot to generate the diagrams below. I deployed the servlets under Jetty 8.1.3 and started the Grizzly server on a MacBook Pro with an i5 and 8GB RAM. The only VM parameter was -Xss256k, to keep thread memory usage low. While running the measurements, I also observed the JVM's thread and memory usage. The thread usage results were close to expectations: at 200 concurrent requests, Jetty started with 54 threads, which went up to 261 for Servlet 2.5 and to 100 for Servlet 3.0. Grizzly started with 39 threads, which went up to 55.

The Result

The result? The measurements did not exactly match my expectations. I expected a clear difference in performance, but it only started to appear at very high loads, while performance degradation was noticeable in all three setups. The diagrams below show how the response time is distributed among 6,000 requests, fired in concurrent batches of different sizes. Ideally, all requests would be processed in 100ms and the result would be three horizontal lines at 100ms. With 10 concurrent requests, as expected, all three servers behave just fine. The few requests that finished faster than 100ms indicate some flaw in the ab results; I am not sure what causes it.

With 100 concurrent requests the three implementations still stay close together:

At 200 concurrent requests the three servers start showing different behavior...

more and more so with 500 and 1000 concurrent requests, see below.

At high loads both servlet versions respond well in most cases, with some percentage of the requests taking really long to process. Grizzly's response time deviation stays low, with very few requests taking really long.

Apart from the thread usage, the results don't show a clear advantage for asynchronous request processing until really high loads. A lot depends on how the JVM handles threads and context switching. Asynchronous or thread-per-request, some session state has to be kept on the server until the response is sent.

If one clear conclusion can be drawn from this, it is: test your assumptions, get measurable metrics, and assess your requirements before committing to a technology.