> Apache has two stable modes of operation, threaded or one process per chil...

mmaunder · on Sept 12, 2011

The event mpm is experimental: http://httpd.apache.org/docs/2.2/mod/event.html

Recent benchmarks show apache is a dog with high concurrency, even with the event mpm.

http://nbonvin.wordpress.com/2011/03/14/apache-vs-nginx-vs-v...

Not sure about your nginx criticism. It's the darling of many high traffic production sites. Here's my weekly netstat off a low-end dell front end box on a gigabit link:

http://i.imgur.com/8fTg0.png

(The bottom right number is the interesting one)

My nginx config is pretty stock - standard reverse proxy to apache.

CDN's are gods gift to latency and I use several, but now and then you need an actual web server to do actual work.

saurik · on Sept 12, 2011

Another (unrelated) point: I just looked at the nbonvin benchmark, and it has some serious issues.

First, he misconfigured Apache for this environment: by having StartServers lower than the expected minimal process concurrency, they are goading Apache into constantly spawning and shutting down backends (I had similarly spiky performance until I realized this a while back).

However, the setting that simply damns this benchmark is that he has MaxClients set to 150... nginx's equivalent setting (worker_processes) is set to 1024. In essence, he needlessly hobbled Apache, and if Apache does well at all in comparison to nginx, it is probably because the benchmark is flawed. Seriously: this setup is so bad that Apache was returning 503 errors (this is mentioned in the conclusion area) because its configuration told it to stop responding to these incoming requests (and yet, he didn't bother considering that worth examining: he just tossed the detail out there as if it was Apache's fault).

And, given that Apache was less than half as bad (and depending on whether he counted 503 responses in his numbers, possibly "almost as good"), we then do go ahead and question the benchmark itself, just to find that the guy is using Apache bench... Apache bench doesn't actually claim to be very good at highly concurrent testing... specifically, it isn't actually good at swamping remote servers as it isn't itself very efficient. As one random backed-up example, from the BUGS section of the ab man page:

"""The rather heavy use of strstr(3) shows up top in profile, which might indicate a performance problem; i.e., you would measure the ab performance rather than the server's."""

Really, this website's results are based on a kind of "toy" benchmark, and should not really be trusted: the guy (as mentioned in his comments on his post), was trying to go for "the default setup" on these systems, but the default setup is not tuned for performance (this is especially the case with Apache, where distributions expect people to install it on almost everything, including your calculator). I mean, even the nginx settings should be tweaked: in a high-concurrency environment, 64/10096 (where I will admit I haven't spent much time tuning the ratio) would be a much better choice than 1/1024 (the Ubuntu default).

(Reading more of the comments on the benchmark, you can see that other people commented on some of these problems and more, and even wrote entire blog posts responses ;P.)

saurik · on Sept 12, 2011

I am not arguing that the documentation calls the module "experimental", but if you follow the discussion by the developers about this module you will find out that the fact that it is still called this is due only to a combination of 1) the major version of Apache not having been bumped in a very long time, 2) a number of Apache modules in the wild that are poorly coded (not that I've ever managed to actually find one), and most importantly: 3) it does not work on all platforms (Linux is great). Things that have previously been considered "the reason" mpm_event is marked experimental are all now obsolete; for a specific example, SSL now works 100% correctly with mpm_event.

Also, nginx being a "darling of many high traffic production sites" does not mean it actually works well for this purpose: if you do a Google search for "nginx" and "ephemeral" you get lots of evidence to the contrary, and you can also prove my statements from first principals of TCP if you really don't believe me: this need not be based on silly anecdotes, you would simply expect nginx to have issues with ephemeral ports due to the way it is designed and implemented (a reverse proxy making new outgoing connections for each incoming one), and if it didn't you would be surprised and probably want to publish a paper on it. ;P

""Compared to putting tornado processes behind nginx, this approach is simpler (since fdserver is much simpler than configuring nginx/haproxy) and avoids issues with ephemeral port limits that can be a problem in high-traffic proxied services.""" -- http://tornadogists.org/1073945/

"""This makes load testing complicated since the nginx machine quickly runs out of ephemeral ports.""" -- http://mailman.nginx.org/pipermail/nginx/2008-February/00352...

My service has tens of millions of users distributed worldwide, making many billions of requests per month to my hostnames. My setup is mostly coded for mod_python (generally considered to be an "older module", especially considering it is no longer even maintained by the upstream developers). I make complex usage of requests making recurrent subrequests through different languages. A good amount of my traffic is SSL.

Of course, most requests are cached at the CDN, so they don't have to go through to my backends, but I still handle way more than a billion requests all the way through to my dynamic webapp every month. These are all handled, eventually, by two boxes running Apache, and I only need two boxes because I want to handle one of them randomly failing (I can easily handle the load on one box: each box can handle, and actually has under previous concepts for my architecture, 3200 concurrent clients).

As for mpm_event in this environment? It works, is stable, is why I could handle 3200 concurrent clients, and you should not be avoiding it because you feel it is "experimental" (yes, even with mod_python). I did run across one or two Linux kernel builds that had regressions that affected Apache+mpm_event (horrible concurrent performance), but you are better off noticing that and steering away from them than avoiding mpm_event.

That said, I want to make it clear that I am not arguing against reverse proxies: I am only making the point that your CDN /is/ a reverse proxy, so there's little point in additionally adding nginx to the setup unless you can't handle enough concurrent connections from the master CDN nodes around the world, in which case what you really want is "just" a load balancer, and you really still want one that is smart enough to use HTTP/1.1 to connect to its backends, and that simply isn't nginx. (Humorously, DNS round-robin, if you think of it as a load balancer, actually works great for this HTTP/1.1 problem, but there are other reasons to avoid it, of course. ;P)

(Now, this said, I heard a few days ago that the just released nginx 1.1 branch now supports persistent backend connections, but I haven't been able to find it in the release notes.)

(Also, as your comment about "now and then you need an actual web server to do actual work" implies to me, but this might totally be incorrect, that you didn't yet notice that a CDN actually provides insanely high latency benefits even if all of your content is dynamic and all of it has to go through to the backend. If you did not know this, you should read my commentary here: http://news.ycombinator.com/item?id=2823268 .)