The Setup
A request of the form http://{host}/hi?wait=100 causes the server to wait for the given number of milliseconds before returning a response. I measured three solutions to see how they actually respond to a heavy concurrent load.
2.x servlet
This is pretty straightforward:

    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        long waitTime = parseWaitTime(request);
        slowOperation(waitTime);
        response.setContentType("text/plain");
        PrintWriter out = response.getWriter();
        out.println("waited " + waitTime + "ms");
    }

The slowOperation could consist of a simple sleep, but since the asynchronous implementations use a scheduled executor, I provided a similar implementation here to make the comparison fairer.

    private final ScheduledExecutorService ses = Executors.newScheduledThreadPool(16);

    private void slowOperation(long time) {
        final Semaphore m = new Semaphore(0);
        ses.schedule(new Runnable() {
            @Override
            public void run() {
                m.release();
            }
        }, time, TimeUnit.MILLISECONDS);
        try {
            m.acquire();
        } catch (InterruptedException e) {
            System.out.println("never happens");
        }
    }

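The parseWaitTime helper used by all three handlers is not shown in this post. Below is a minimal sketch of what such a helper might look like, assuming it reads the wait query parameter and falls back to a default on missing or malformed input. The class name, the default value and the String-based signature are assumptions chosen to keep the example free of servlet dependencies; in a servlet the raw value would come from request.getParameter("wait").

```java
// Hypothetical helper: the post calls parseWaitTime(request) but does not show it.
final class WaitTimeParser {

    // Assumed fallback when the parameter is missing or not a number.
    static final long DEFAULT_WAIT_MS = 0L;

    // 'raw' stands in for the value of the "wait" query parameter.
    static long parseWaitTime(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_WAIT_MS;
        }
        try {
            long value = Long.parseLong(raw.trim());
            // Clamp negative values to zero so the delay is never negative.
            return Math.max(0L, value);
        } catch (NumberFormatException e) {
            return DEFAULT_WAIT_MS;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseWaitTime("100"));
        System.out.println(parseWaitTime(null));
        System.out.println(parseWaitTime("abc"));
    }
}
```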
3.0 servlet
The 3.0 specification allows doGet to return without writing a response. The saved AsyncContext can be used later to do that.

    @WebServlet(urlPatterns = {"/hi"}, asyncSupported = true)
    public class SlowAsyncServlet extends HttpServlet {

        private final ScheduledExecutorService ses = Executors.newScheduledThreadPool(16);

        public void doGet(HttpServletRequest request, HttpServletResponse response) {
            final AsyncContext ac = request.startAsync();
            final long finalTime = parseWaitTime(request);
            ses.schedule(new Runnable() {
                @Override
                public void run() {
                    try {
                        ac.getResponse().getWriter().write("waited " + finalTime + "ms");
                    } catch (IOException e) {
                        System.out.println("Error");
                    }
                    ac.complete();
                }
            }, finalTime, TimeUnit.MILLISECONDS);
        }
    }

Grizzly
Grizzly is a server framework that lets you take advantage of Java NIO without handling all the details yourself. Grizzly is not limited to HTTP; the HTTP server is one of several implementations it supports. I took the example at http://grizzly.java.net/nonav/docs/docbkx2.0/html/httpserverframework-samples.html as a starting point and ended up with the following HttpHandler:

    public class NonBlockingEchoHandler extends HttpHandler {

        private final ScheduledExecutorService ses = Executors.newScheduledThreadPool(16);
        private static int count = 0;

        @Override
        public void service(final Request request, final Response response) throws Exception {
            final char[] buf = new char[128];
            final NIOReader in = request.getReader(false); // put the stream in non-blocking mode
            final NIOWriter out = response.getWriter();
            final long waitTime = parseWaitTime(request);
            response.suspend();
            in.notifyAvailable(new ReadHandler() {
                @Override
                public void onDataAvailable() throws Exception {
                    in.notifyAvailable(this);
                }

                @Override
                public void onError(Throwable t) {
                    System.out.println("[error]" + t);
                    response.resume();
                }

                @Override
                public void onAllDataRead() throws Exception {
                    ses.schedule(new Runnable() {
                        @Override
                        public void run() {
                            try {
                                out.write("waited " + waitTime + "ms");
                            } catch (IOException e) {
                                System.out.println("Error");
                                return;
                            } finally {
                                try {
                                    in.close();
                                } catch (IOException ignored) {
                                }
                                try {
                                    out.close();
                                } catch (IOException ignored) {
                                }
                                response.resume();
                            }
                        }
                    }, waitTime, TimeUnit.MILLISECONDS);
                }
            });
        }
    }

The Measurement
Nothing too exciting, just running ab with different numbers of concurrent requests, like this:

    ab -r -g servlet2.5-c100 -n 6000 -c 100 http://127.0.0.1:8080/servlet2.5/hi?wait=100

Response times are logged to a file that gnuplot uses to generate the diagrams below. I deployed the servlets under Jetty 8.1.3 and started the Grizzly server on a MacBook Pro with an i5 and 8GB RAM. The only VM parameter was -Xss256k, to keep thread memory usage low. While running the measurements, I also observed the JVM's thread and memory usage. The thread usage results are close to expectations: with 200 concurrent requests, Jetty was using 54 threads initially, which went up to 261 for servlet 2.5 and to 100 for servlet 3.0. Grizzly started with 39 threads, which went up to 55.
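The post does not say which tool was used to watch threads and memory. As one possible approach, here is a minimal sketch that reads the same kind of numbers through the standard java.lang.management beans. The class and method names are made up for illustration. These beans only report on the JVM they run in, so the probe would have to run inside the server process (or the same beans would have to be read remotely over JMX).

```java
import java.lang.management.ManagementFactory;

// Hypothetical probe: reports thread count and heap usage of the current JVM.
final class JvmUsageProbe {

    // Number of live threads right now, daemon and non-daemon.
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    // Highest live thread count seen since the JVM started (or since the peak was reset).
    static int peakThreads() {
        return ManagementFactory.getThreadMXBean().getPeakThreadCount();
    }

    // Heap memory currently in use, in bytes.
    static long usedHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample once per second for a few seconds; a real run would sample during the load test.
        for (int i = 0; i < 3; i++) {
            System.out.printf("threads=%d peak=%d heapUsedMB=%d%n",
                    liveThreads(), peakThreads(), usedHeapBytes() / (1024 * 1024));
            Thread.sleep(1000);
        }
    }
}
```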
The Result
The result? The measurements did not exactly match my expectations. I expected a clear difference in performance; in fact it only started to appear at very high loads, while performance degradation was noticeable in all three setups. The diagrams below show how response time is distributed among 6,000 requests, fired in concurrent batches of different sizes. Ideally, all requests would be processed in 100ms and the result would be three horizontal lines at 100ms. With 10 concurrent requests, as expected, all three servers behave just fine. The few requests that finished faster than 100ms indicate some flaw in the ab results; I am not sure what causes it.
With 100 concurrent requests the three implementations still stay close together:
With 200 concurrent requests the three servers start to behave differently...
more and more so with 500 and 1000 concurrent requests, see below.
Under high load both servlets respond well in most cases, with some percentage of the requests taking really long to process. Grizzly's response time deviation stays low, with very few requests taking really long.
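A distribution like "mostly fine, with a slow tail" can also be summarized with percentiles instead of a plot. The following is a minimal sketch using the nearest-rank method; the class name is hypothetical and the sample values are made up for illustration, they are not taken from the measurements in this post.

```java
import java.util.Arrays;

// Hypothetical helper: summarizes response times with nearest-rank percentiles.
final class ResponseTimeSummary {

    // Returns the smallest sample such that at least p percent of samples are <= it.
    static long percentile(long[] timesMs, double p) {
        if (timesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = timesMs.clone();   // do not reorder the caller's array
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up sample: most requests near 100ms, one slow outlier.
        long[] sample = {101, 102, 100, 103, 101, 102, 100, 104, 950, 101};
        System.out.println("p50 = " + percentile(sample, 50) + "ms");
        System.out.println("p90 = " + percentile(sample, 90) + "ms");
        System.out.println("max = " + percentile(sample, 100) + "ms");
    }
}
```

A median close to the requested wait time combined with a much larger maximum is the numeric form of the pattern described above for the two servlets.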
Apart from thread usage, the results don't show a clear advantage of asynchronous request processing until really high loads. A lot depends on how the JVM handles threads and context switching. Asynchronous or thread-per-request, some session state has to be kept on the server until the response is sent.
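The thread usage difference can be reproduced outside a servlet container. The sketch below is a simplified model, not the code that was measured: it counts how many threads get created when a number of simulated requests each wait on a timer, once in a blocking style (a worker parked on a Semaphore, like slowOperation) and once in a callback style (the timer task finishes the request itself). All names, the timer pool size of 4 and the start latch that forces every request to be in flight at the same time are assumptions made for the demo.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: counts threads created by a blocking and a callback-based wait.
final class ThreadUsageDemo {

    // Waits on a latch without forcing callers to handle InterruptedException.
    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Thread-per-request style: each request parks a worker until the timer fires.
    static int blockingThreads(int requests, long waitMs) {
        AtomicInteger created = new AtomicInteger();
        ThreadFactory counting = r -> { created.incrementAndGet(); return new Thread(r); };
        ScheduledExecutorService timer = Executors.newScheduledThreadPool(4, counting);
        ExecutorService workers = Executors.newCachedThreadPool(counting);
        CountDownLatch allStarted = new CountDownLatch(requests);
        CountDownLatch done = new CountDownLatch(requests);
        for (int i = 0; i < requests; i++) {
            workers.execute(() -> {
                allStarted.countDown();
                awaitQuietly(allStarted);   // model all requests being in flight at once
                Semaphore m = new Semaphore(0);
                timer.schedule(() -> m.release(), waitMs, TimeUnit.MILLISECONDS);
                m.acquireUninterruptibly(); // the worker is occupied while waiting
                done.countDown();
            });
        }
        awaitQuietly(done);
        workers.shutdown();
        timer.shutdown();
        return created.get();
    }

    // Asynchronous style: no worker waits, the timer task completes the request.
    static int asyncThreads(int requests, long waitMs) {
        AtomicInteger created = new AtomicInteger();
        ThreadFactory counting = r -> { created.incrementAndGet(); return new Thread(r); };
        ScheduledExecutorService timer = Executors.newScheduledThreadPool(4, counting);
        CountDownLatch done = new CountDownLatch(requests);
        for (int i = 0; i < requests; i++) {
            timer.schedule(() -> done.countDown(), waitMs, TimeUnit.MILLISECONDS);
        }
        awaitQuietly(done);
        timer.shutdown();
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println("blocking: " + blockingThreads(200, 100) + " threads created");
        System.out.println("async:    " + asyncThreads(200, 100) + " threads created");
    }
}
```

In this model the blocking variant needs at least one thread per in-flight request, while the callback variant stays within the timer pool. It says nothing about response times, which is where the measurements above turned out to be less clear-cut.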
If one clear conclusion can be drawn from this, it is: test your assumptions, get measurable metrics and assess your requirements before committing to a technology.
Hi,
Nice blog! Is there an email address where I can contact you in private?
Thanks, Nikos. I just added my email to the page. It is a bit tweaked to keep mail-collecting bot developers alert.
Did you try to tweak your executor? A fixed pool of 16 threads obviously cannot scale to 10000 concurrent requests. Maybe the numbers will change if you play around with it.
Also, I think it is better practice to initialize it in an app startup/shutdown listener, though this would not affect the numbers.