While Nxweb looks very promising, my first question would be 'Why should I use it over eg Nginx?' It would be helpful to have some direct comparison to other servers on the landing page.
EDIT: OK, there is a link to some odd benchmarks, and they include performance comparisons to Nginx and others that are hard to understand (Nginx 141 req/s and Nxweb 200 / 121 req/s, with no explanation of when it's 200 and when it's 121); moreover, they compare it to Mongoose, which is an ORM/ODM
Those benchmarks are all well and good, but with HTTP/2 support in nginx, do they even matter? I'm reasonably certain that nginx will outperform it for the average client in real-world, TLS-enabled testing.
I only spent a minute on the benchmark page, but it says quite clearly the reqs/s are given in the thousands, and explains each test, as well as the conditions.
However, that doesn't change the why question at all. Except it could be neat to not need the complication of setting up nginx with uwsgi for those who like the built-in Python/WSGI support.
They discount using CGI, which is fair enough, but why not use FastCGI? It's a sensible enough protocol, there are libraries for most languages and there's a good chance that your existing web server supports it.
Technically, there's no good reason why a FastCGI based system would be significantly slower than a custom reimplementation like this.
I'd imagine encoding/decoding will be really insignificant compared to generating the requested page or fetching data from a database in most, if not all, cases
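For a sense of scale, FastCGI's framing is an 8-byte header per record, which supports the point that encoding costs are tiny. A minimal sketch of the record encoding (field layout per the FastCGI spec; the function name is illustrative):

```python
import struct

FCGI_VERSION_1 = 1
FCGI_STDOUT = 6  # record type carrying response body data

def encode_fcgi_record(rec_type: int, request_id: int, content: bytes) -> bytes:
    """Frame a payload as one FastCGI record: an 8-byte header
    (version, type, requestId, contentLength, paddingLength, reserved)
    followed by the payload itself."""
    header = struct.pack(
        ">BBHHBB",
        FCGI_VERSION_1,
        rec_type,
        request_id,
        len(content),  # must fit in 16 bits; real encoders split larger payloads
        0,             # paddingLength: none, for simplicity
        0,             # reserved
    )
    return header + content

body = b"Content-Type: text/plain\r\n\r\nhello"
record = encode_fcgi_record(FCGI_STDOUT, 1, body)
print(len(record) - len(body))  # framing overhead: 8 bytes
```

Eight bytes of framing per record is negligible next to even a trivial response body, let alone a database round trip.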
Most websites do not see more than 100 requests per second.
In those cases you are correct: parsing and de-parsing is insignificant compared to the amount of energy the computer is using to heat the room.
However, in order to do a trillion requests per day you need around 30 machines using a custom web server, or 300 machines using FastCGI: in this situation the cost differs by an order of magnitude.
Many people observe that as miles-per-gallon gets better and better, it begins to become a deceptive measurement in a way, because going from 10 to 20 mpg is a much, much larger change than going from 30 to 40, or even from 80 to 140. It seems people get a better sense of what's going on to measure gallons per mile. When you start doing that it becomes more clear that going from .0001 gallons per mile to .00001 gallons per mile, as large as it may be in orders of magnitude, still isn't that big a deal. Either way you're looking at your cost-of-fuel being effectively zero for all practical use cases, because your costs will be dominated by something else.
Similarly, I've noticed that people tend to get a little silly about web server requests-per-second. It really gets to the point you probably ought to be talking about seconds per request, or perhaps rather, microseconds per request or something.
Because A: as you start talking about these fast servers, you need to contemplate whether your own code can run in, say, 2.5 microseconds; who cares whether your web server takes 2 or 25 microseconds to handle a minimal request if your minimal response requires 8 milliseconds (i.e. "8000 microseconds")? 8 ms would actually be pretty decent performance for a wide variety of non-trivial web requests.
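The unit flip is trivial to compute; the request rates below are illustrative numbers, not measurements:

```python
def us_per_request(requests_per_second: float) -> float:
    """Convert a requests-per-second figure into microseconds per request."""
    return 1_000_000 / requests_per_second

# A "blazing" 400k req/s server spends 2.5 microseconds per request...
print(us_per_request(400_000))  # 2.5
# ...which vanishes next to an application handler that needs 8 ms:
print(us_per_request(125))      # 8000.0
```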
And B: As the webservers get faster and faster, you really need to start wondering what corners they cut to push their reqs/s number up. I can make a blazingly fast webserver that would actually kill nginx's performance stone dead for a "return a constant JSON string response" task... the trick is that I'm not even going to look at the incoming web request, I'm going to just receive a socket, blast out my answer as a constant string buffer without even reading from the socket, and discard the socket. (If you're feeling particularly saucy, hook that up to a user-space TCP stack so you can drop the work of properly setting up and tearing down TCP connections.) There aren't that many real-world tasks for which that is a good solution (though, non-zero!), but it'll look like pure awesomesauce on the benchmark!
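A sketch of that cheat in Python (illustrative only, and deliberately not a real HTTP server, which is exactly the point):

```python
import socket

# Canned response; Content-Length matches the 17-byte body below.
CANNED = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: 17\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b'{"status": "ok"}\n'
)

def cheat_server(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Accept connections and blast out the canned buffer without ever
    reading the request -- great on a benchmark, useless for real HTTP."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1024)
    while True:
        conn, _addr = srv.accept()
        conn.sendall(CANNED)  # note: no conn.recv() anywhere
        conn.close()
```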
Properly handling HTTP is a non-trivial problem, and even more so if it's going to be hooked up to a program rather than a static file system or something similarly easy. I actually start getting nervous about web servers that show excessively high numbers. If your performance is much better than nginx's, rather than cheering for joy, I have a lot of questions about how exactly you did that, and about what my website's security profile looks like with your way-faster server. I'm not saying these questions are completely unanswerable; perhaps there is a way to safely build a much faster web server. I'm just saying that rather than my default response being celebration and "Oh wowzers cool!", my default reaction is a healthy dollop of skepticism.
My web server, filed ( http://filed.rkeene.org/ ), is faster than nginx for serving static content by doing two things:
1. Only handling static files
2. Being extremely optimized for serving static content
It's very safe as far as I can tell: I've run it under AFL with ASan enabled with no crashes, as well as having run it in production on the public Internet.
A few of the optimizations I do in "filed" could also be done in nginx, but most would cost too much.
A separate logging thread, fed through a queue, helps a lot and was one of the main reasons for writing "filed" -- my ability to serve files was being slowed by my ability to write logs indicating that I had served something. The downside is that a large queue of unwritten logs may be lost in the event of a kernel panic or other unexpected process termination.
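The pattern is roughly this (a toy Python sketch of the idea, not filed's actual C implementation):

```python
import queue
import threading

class AsyncLogger:
    """Hand log lines to a background thread so the serving path never
    blocks on disk I/O. The trade-off noted above applies: entries still
    sitting in the queue are lost if the process dies abruptly."""

    def __init__(self, logfile):
        self._q = queue.Queue()
        self._logfile = logfile
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def log(self, line: str) -> None:
        self._q.put(line)          # cheap; no disk I/O on the hot path

    def _drain(self) -> None:
        while True:
            line = self._q.get()
            if line is None:       # sentinel: flush and stop
                self._logfile.flush()
                return
            self._logfile.write(line + "\n")

    def close(self) -> None:
        self._q.put(None)
        self._thread.join()
```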
Most requests don't even open the file they are serving, because "filed" caches open file descriptors -- once a file has been opened it's kept open until its cache entry is needed for a newer file.
There are no runtime allocations after startup except for log entries, leading to very consistent performance under load.
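The descriptor-caching idea might be sketched like so (again a toy Python rendering of the pattern, not filed's code; eviction here is a simple LRU built on dict insertion order):

```python
import os

class FdCache:
    """Cache open file descriptors so repeat requests skip open()
    entirely. When the cache is full, close the least recently
    used descriptor to make room for a newer file."""

    def __init__(self, max_open: int = 128):
        self._fds = {}             # path -> fd; dict order tracks recency
        self._max_open = max_open

    def get(self, path: str) -> int:
        fd = self._fds.pop(path, None)
        if fd is None:
            if len(self._fds) >= self._max_open:
                evicted = next(iter(self._fds))     # least recently used
                os.close(self._fds.pop(evicted))
            fd = os.open(path, os.O_RDONLY)
        self._fds[path] = fd       # most recently used goes last
        return fd
```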
Re: gallons of gas. There's the old puzzle: your spouse gets 100MPG in that super-hybrid-mobile. The salesperson wants to upgrade you for $1000 to the super-duper-hybrid-mobile at 200MPG! Double the mileage!
You suggest instead that you get the old truck serviced and replace the plugs, distributor and tailpipe. Estimated cost $1000, and should get you from 10MPG to 11MPG. Which is the better deal? Assuming you both drive about 100 miles per week.
Isn't it precisely like that? The point of the exercise is that even when you are getting really high MPG changes (e.g. 100 to 200), the best gain comes from improving the really slow component of the pipeline (e.g. the truck from 10 to 11).
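Running the puzzle's numbers makes that concrete:

```python
def weekly_gallons(miles: float, mpg: float) -> float:
    return miles / mpg

MILES = 100  # per week, per the puzzle

# Doubling the hybrid's mileage (100 -> 200 MPG) saves half a gallon a week...
hybrid_saving = weekly_gallons(MILES, 100) - weekly_gallons(MILES, 200)
# ...while nudging the truck from 10 to 11 MPG saves nearly a full gallon.
truck_saving = weekly_gallons(MILES, 10) - weekly_gallons(MILES, 11)

print(round(hybrid_saving, 2))  # 0.5
print(round(truck_saving, 2))   # 0.91
```

So the $1000 spent on the truck buys almost twice the fuel saving of the $1000 spent doubling an already-efficient car.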
That is a very good point, but it's not the argument here: I'm responding specifically to the idea that webserver and webserver+fastcgi are "technically" the same speed.
Running two web servers (one speaking HTTP and one speaking FastCGI) is necessarily going to be slower than running one web server.
This should be obvious, although it might be "not significantly slower", which is why I provided some real numbers from my experience to show the point at which it becomes slower by an order of magnitude.
You might also find that it's easier to debug one webserver than two.
I don't see how it can become an order of magnitude slower even if your app is effectively a no-op.
Two extra FastCGI encodes and decodes sounds like the whole pipeline is just doing 3x the serialization of a non-FastCGI one. Am I missing some context-switching overhead you're implying, or something?
If you managed to get a trillion requests per day with a "hello world" scenario, you're probably also able to get 270 machines for free from your gullible incubator.
Most ad servers deliver static or hello-world-style content, doing no database lookups, but logging their results.
RTB systems have about 30-100msec for the entire transaction (and that includes network to the user), so you need better control of your latency anyway.
https://github.com/facebook/proxygen
C++, used by Facebook in production.
We have been using it in a high-performance RTB application and it has performed remarkably well.
e.g. node.js has no problem getting 20k/sec per core, but a stall at the wrong time kills every pipelined HTTP request that follows (until you tear down the connection and restart it).
Worth also mentioning that ad servers tend to have massive RAM requirements (again, for speed). GC in a 30GB JVM can take 10 minutes. To handle it, companies mark the boxes as 'inoperable' when they are in GC mode and remove them from the cluster until they are ready to return.
All of this is motivation to rewrite everything in C.
> 40k/sec http requests (no pipelining) per core is about as fast as it gets unless you move TCP into user space.
I think it can be micro-optimized beyond that: with prediction to avoid unnecessary syscalls, with syscalls grouped together to make the CPU more efficient for the rest of the time it spends in the event loop, and, if it's possible to modify the kernel a bit, with batching syscalls together to make them very cheap.
At some point, your complexity gets bigger than simply coding a state machine that operates directly on the network buffers themselves:
That is to say, I suspect that if micro-optimisations can double our performance, they will be more complicated than just writing a customised ring0 that implements HTTP directly inside the network driver.
Here is how I'm looking at it:
• 10Gb/sec network port
• 4k max request and response size
• == 1.3 million HTTP requests per second.
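One reading that roughly reproduces that figure -- assuming "4k" means about four kilobits each for the request and the response, which is my interpretation rather than the parent's stated arithmetic:

```python
LINK_BPS = 10e9       # 10 Gb/s port
REQ_BITS = 4 * 1024   # assumed: ~4 kilobits per request
RESP_BITS = 4 * 1024  # assumed: ~4 kilobits per response

exchanges_per_sec = LINK_BPS / (REQ_BITS + RESP_BITS)
print(round(exchanges_per_sec / 1e6, 2))  # ~1.22 million/s, near the quoted 1.3M
```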
Now the problem is that main memory is not much faster than our fastest network: about 15Gb/sec. So what we're talking about here is code and state staying entirely in L1, streaming the network buffers across the CPU, and responding in one pass, to get that 1.3 million optimal performance.
My dash server gets ~135k HTTP requests per second on localhost (I should be able to approach 300k/sec over a network if I ever get around to it). That's 22% of our optimal performance, and a lot better than any other HTTP server I'm aware of.
At this speed, one of those micro-optimisations, `writev()`, is actually slower than `write()` -- likely because the code path is shorter in the simpler codebase -- but it illustrates my concern nicely: we are close to the break-even point with the optimisations we can make. If we make our server bigger and more complicated, it might not make our programs any faster.
That suggests to me that the solution is actually fewer, simpler syscalls, not more, bigger ones.
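For reference, the `write()`/`writev()` trade-off being described looks like this (a minimal Python sketch; `os.writev` is POSIX-only):

```python
import os

HEADER = b"HTTP/1.1 200 OK\r\n\r\n"

def send_with_write(fd: int, header: bytes, body: bytes) -> int:
    # One syscall, but the buffers are first concatenated in user space.
    return os.write(fd, header + body)

def send_with_writev(fd: int, header: bytes, body: bytes) -> int:
    # One syscall, no user-space concatenation -- the classic
    # "micro-optimisation" that measured slower in this case.
    return os.writev(fd, [header, body])
```

Both are a single syscall per response; which one wins depends on buffer sizes and on how much code sits on each path inside the kernel and the server.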
What kind of "main memory" are you talking about? Regular, consumer grade memory, will have a bandwidth at least ten times faster than your 10Gb/s network interface. Change the bit to a byte and you're a little closer.
I work at two ad-tech related companies, one large and one medium; both have used Java from the beginning. I also know of other companies using Python/Golang.
I didn't know C (not C++) was particularly popular until today. Note that ad companies are pretty business-focused; adding or removing features for big clients is pretty common, so development efficiency matters a lot.
We went from C++ to Go, and started to auto-lose any auctions whose bid requests were in progress during a GC cycle. Go 1.1/1.2 had atrociously long GC pauses (sometimes up to 500ms). You are correct that development efficiency was very important, so losing the auctions was justifiable.
Did you have tight and consistent latency requirements?
GC pauses are a killer. They can be worked around, but it takes intensive tuning.
The RTB bidder I wrote many moons ago at a startup was fast as hell, but had problems in the 95th percentile of requests meeting the latency targets, due to GC.
The ad servers at the big ad tech heavy weights are in C++.
Of course, people aren't going to be happy when their packets get dropped or the network monitoring software isn't able to provide a (soft) real-time view of what is happening.
Also note I wasn't doing this alone; it was a very big project at a mobile operator.
Your experience isn't representative, so there were probably other confounding factors, like the C++ wasn't very well written, or you weren't dealing with petabytes of data or something like that.
Given that I remember the days when C and C++ compilers generated code worse than a junior Assembly programmer, I always find such comparisons interesting.
Not that they aren't true; rather, their validity depends a lot on the programmer's skillset and the compilers being used.
Technical prowess is one thing, support is another. There are 20 times as many OpenResty questions (although still very few) on Stack Overflow as nxweb questions, and the few nxweb questions there date from years ago. I am not sure why this is suddenly on the Hacker News front page.
Which Python is supported, 2 or 3? For some it makes a big difference. I really want to play with this. Also, what OS -- only Linux? I'm trying to find it on the site but I'm not seeing it; maybe adding it to the front page or an FAQ would help (which requires creating an FAQ page or section). Thanks! Looks interesting otherwise.
The templating engine is an interesting, and slightly curious, addition here.
It looks significantly more flexible than anything nginx offers without having to bolt on a server-side language like PHP - unless nginx has something similar in its millions of modules that I'm not aware of.
(I know about and love nginx SSIs, but the templating here looks more flexible than them.)
That is inaccurate. But if you said that it comes from C memory management issues, that sounds more plausible. We could talk about Rust, but I even think that C++ is usually written in a much safer style than C.
Seems cool, but curious if teams have investigated golang for these use cases, specifically if throughput is sufficiently high and the GC pauses are sufficiently small.