While Nxweb looks very promising, my first question would be 'Why should I use it over eg Nginx?' It would be helpful to have some direct comparison to other servers on the landing page.
EDIT: OK, there is a link to some odd benchmarks, and they include performance comparisons to Nginx and others that are hard to understand (Nginx 141 req/s and Nxweb 200 / 121 req/s, with no explanation of when it's 200 and when it's 121); moreover, they compare it to Mongoose, which is an ORM/ODM
Those benchmarks are all well and good, but with HTTP/2 support in nginx, do they even matter? I'm reasonably certain that nginx will outperform it for the average client in real-world, TLS-enabled testing.
I only spent a minute on the benchmark page, but it says quite clearly the reqs/s are given in the thousands, and explains each test, as well as the conditions.
However, that doesn't change the why question at all. Except it could be neat to not need the complication of setting up nginx with uwsgi for those who like the built-in Python/WSGI support.
They discount using CGI, which is fair enough, but why not use FastCGI? It's a sensible enough protocol, there are libraries for most languages and there's a good chance that your existing web server supports it.
Technically, there's no good reason why a FastCGI based system would be significantly slower than a custom reimplementation like this.
I'd imagine encoding/decoding will be really insignificant compared to generating the requested page or fetching data from a database in most, if not all, cases
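For a sense of scale, FastCGI's framing is an 8-byte header per record, which supports the point that encoding costs are tiny. A minimal sketch of the record encoding (field layout per the FastCGI spec; the function name is illustrative):

```python
import struct

FCGI_VERSION_1 = 1
FCGI_STDOUT = 6  # record type carrying response body data

def encode_fcgi_record(rec_type: int, request_id: int, content: bytes) -> bytes:
    """Frame a payload as one FastCGI record: an 8-byte header
    (version, type, requestId, contentLength, paddingLength, reserved)
    followed by the payload itself."""
    header = struct.pack(
        ">BBHHBB",
        FCGI_VERSION_1,
        rec_type,
        request_id,
        len(content),  # must fit in 16 bits; real encoders split larger payloads
        0,             # paddingLength: none, for simplicity
        0,             # reserved
    )
    return header + content

body = b"Content-Type: text/plain\r\n\r\nhello"
record = encode_fcgi_record(FCGI_STDOUT, 1, body)
print(len(record) - len(body))  # framing overhead: 8 bytes
```

Eight bytes of framing per record is negligible next to even a trivial response body, let alone a database round trip.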
Most websites do not see more than 100 requests per second.
In those cases you are correct: parsing and de-parsing is insignificant compared to the amount of energy the computer is using to heat the room.
However, in order to do a trillion requests per day you need around 30 machines using a custom web server, or 300 machines using FastCGI: in this situation the cost differs by an order of magnitude.
Many people observe that as miles-per-gallon gets better and better, it begins to become a deceptive measurement in a way, because going from 10 to 20 mpg is a much, much larger change than going from 30 to 40, or even from 80 to 140. It seems people get a better sense of what's going on to measure gallons per mile. When you start doing that it becomes more clear that going from .0001 gallons per mile to .00001 gallons per mile, as large as it may be in orders of magnitude, still isn't that big a deal. Either way you're looking at your cost-of-fuel being effectively zero for all practical use cases, because your costs will be dominated by something else.
Similarly, I've noticed that people tend to get a little silly about web server requests-per-second. It really gets to the point you probably ought to be talking about seconds per request, or perhaps rather, microseconds per request or something.
Because A: as you start talking about these fast servers, you need to contemplate whether your own code can run in, say, 2.5 microseconds; who cares whether your web server takes 2 or 25 microseconds to handle a minimal request if your minimal response requires 8 milliseconds (i.e. "8000 microseconds")? 8 ms would actually be pretty decent performance for a wide variety of non-trivial web requests.
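The unit flip is trivial to compute; the request rates below are illustrative numbers, not measurements:

```python
def us_per_request(requests_per_second: float) -> float:
    """Convert a requests-per-second figure into microseconds per request."""
    return 1_000_000 / requests_per_second

# A "blazing" 400k req/s server spends 2.5 microseconds per request...
print(us_per_request(400_000))  # 2.5
# ...which vanishes next to an application handler that needs 8 ms:
print(us_per_request(125))      # 8000.0
```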
And B: As the webservers get faster and faster, you really need to start wondering what corners they cut to push their reqs/s number up. I can make a blazingly fast webserver that would actually kill nginx's performance stone dead for a "return a constant JSON string response" task... the trick is that I'm not even going to look at the incoming web request, I'm going to just receive a socket, blast out my answer as a constant string buffer without even reading from the socket, and discard the socket. (If you're feeling particularly saucy, hook that up to a user-space TCP stack so you can drop the work of properly setting up and tearing down TCP connections.) There aren't that many real-world tasks for which that is a good solution (though, non-zero!), but it'll look like pure awesomesauce on the benchmark!
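A sketch of that cheat in Python (illustrative only, and deliberately not a real HTTP server, which is exactly the point):

```python
import socket

# Canned response; Content-Length matches the 17-byte body below.
CANNED = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: 17\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b'{"status": "ok"}\n'
)

def cheat_server(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Accept connections and blast out the canned buffer without ever
    reading the request -- great on a benchmark, useless for real HTTP."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1024)
    while True:
        conn, _addr = srv.accept()
        conn.sendall(CANNED)  # note: no conn.recv() anywhere
        conn.close()
```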
Properly handling HTTP is a non-trivial problem, and even more so if it's going to be hooked up to a program rather than a static file system or something similarly easy. I actually start getting nervous about web servers that show excessively high numbers. If your performance is much better than nginx's, rather than cheering for joy, I have a lot of questions about how exactly you did that, and about what my website's security profile looks like with your way-faster server. I'm not saying these questions are completely unanswerable; perhaps there is a way to safely build a much faster web server. I'm just saying that rather than my default response being celebration and "Oh wowzers cool!", my default reaction is a healthy dollop of skepticism.
My web server, filed ( http://filed.rkeene.org/ ), is faster than nginx for serving static content by doing two things:
1. Only handling static files
2. Being extremely optimized for serving static content
It's very safe as far as I can tell: I've run it under AFL with ASan enabled with no crashes, as well as having run it in production on the public Internet.
A few of the optimizations I do in "filed" could also be done in nginx, but most would cost too much.
A separate logging thread, fed through a queue, helps a lot and was one of the main reasons for writing "filed" -- my ability to serve files was being slowed by my ability to write logs indicating that I had served something. The downside is that a large queue of unwritten logs may be lost in the event of a kernel panic or other unexpected process termination.
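The pattern is roughly this (a toy Python sketch of the idea, not filed's actual C implementation):

```python
import queue
import threading

class AsyncLogger:
    """Hand log lines to a background thread so the serving path never
    blocks on disk I/O. The trade-off noted above applies: entries still
    sitting in the queue are lost if the process dies abruptly."""

    def __init__(self, logfile):
        self._q = queue.Queue()
        self._logfile = logfile
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def log(self, line: str) -> None:
        self._q.put(line)          # cheap; no disk I/O on the hot path

    def _drain(self) -> None:
        while True:
            line = self._q.get()
            if line is None:       # sentinel: flush and stop
                self._logfile.flush()
                return
            self._logfile.write(line + "\n")

    def close(self) -> None:
        self._q.put(None)
        self._thread.join()
```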
Most requests don't even open the file they are serving, because "filed" caches open file descriptors -- once a file has been opened it's kept open until its cache entry is needed for a newer file.
There are no runtime allocations after startup except for log entries, leading to very consistent performance under load.
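The descriptor-caching idea might be sketched like so (again a toy Python rendering of the pattern, not filed's code; eviction here is a simple LRU built on dict insertion order):

```python
import os

class FdCache:
    """Cache open file descriptors so repeat requests skip open()
    entirely. When the cache is full, close the least recently
    used descriptor to make room for a newer file."""

    def __init__(self, max_open: int = 128):
        self._fds = {}             # path -> fd; dict order tracks recency
        self._max_open = max_open

    def get(self, path: str) -> int:
        fd = self._fds.pop(path, None)
        if fd is None:
            if len(self._fds) >= self._max_open:
                evicted = next(iter(self._fds))     # least recently used
                os.close(self._fds.pop(evicted))
            fd = os.open(path, os.O_RDONLY)
        self._fds[path] = fd       # most recently used goes last
        return fd
```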
Re: gallons of gas. There's the old puzzle: your spouse gets 100MPG in that super-hybrid-mobile. The salesperson wants to upgrade you for $1000 to the super-duper-hybrid-mobile at 200MPG! Double the mileage!
You suggest instead that you get the old truck serviced and replace the plugs, distributor and tailpipe. Estimated cost $1000, and should get you from 10MPG to 11MPG. Which is the better deal? Assuming you both drive about 100 miles per week.
Isn't it precisely like that? The point of the exercise is that even when you are getting really high MPG changes (e.g. 100 to 200), the best gain comes from improving the really slow component of the pipeline (e.g. the truck from 10 to 11).
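Running the puzzle's numbers makes that concrete:

```python
def weekly_gallons(miles: float, mpg: float) -> float:
    return miles / mpg

MILES = 100  # per week, per the puzzle

# Doubling the hybrid's mileage (100 -> 200 MPG) saves half a gallon a week...
hybrid_saving = weekly_gallons(MILES, 100) - weekly_gallons(MILES, 200)
# ...while nudging the truck from 10 to 11 MPG saves nearly a full gallon.
truck_saving = weekly_gallons(MILES, 10) - weekly_gallons(MILES, 11)

print(round(hybrid_saving, 2))  # 0.5
print(round(truck_saving, 2))   # 0.91
```

So the $1000 spent on the truck buys almost twice the fuel saving of the $1000 spent doubling an already-efficient car.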
That is a very good point, but it's not the argument here: I'm responding specifically to the idea that webserver and webserver+fastcgi are "technically" the same speed.
Running two web servers (one speaking HTTP and one speaking FastCGI) is necessarily going to be slower than running one web server.
This should be obvious, although it might be "not significantly slower", which is why I provided some real numbers from my experience to show the point at which it becomes slower by an order of magnitude.
You might also find that it's easier to debug one webserver than two.
I don't see how it can become an order of magnitude slower even if your app is effectively a no-op.
Two extra FastCGI encodes and decodes sounds like the whole pipeline is just doing 3x the serialization of a non-FastCGI one. Am I missing some context-switching overhead you're implying, or something?
If you managed to get a trillion requests per day with a "hello world" scenario, you're probably also able to get 270 machines for free from your gullible incubator.
Most ad servers deliver static or hello-world-style content, doing no database lookups, but logging their results.
RTB systems have about 30-100msec for the entire transaction (and that includes network to the user), so you need better control of your latency anyway.
https://github.com/facebook/proxygen
C++, used by Facebook in production.
We have been using it in a high-performance RTB application and it has performed remarkably well.
e.g. node.js has no problem getting 20k/sec per core, but a stall at the wrong time kills every pipelined HTTP request that follows (until you tear down the connection and restart it).
Worth also mentioning that ad servers tend to have massive RAM requirements (again, for speed). GC in a 30GB JVM can take 10 minutes. To handle it, companies mark the boxes as 'inoperable' when they are in GC mode and remove them from the cluster until they are ready to return.
All of this is motivation to rewrite everything in C.
> 40k/sec http requests (no pipelining) per core is about as fast as it gets unless you move TCP into user space.
I think it can be micro-optimized beyond that: with prediction to avoid unnecessary syscalls, with syscalls grouped together to make the CPU more efficient for the rest of the time it spends in the event loop, and, if it's possible to modify the kernel a bit, with batching syscalls together to make them very cheap.
At some point, your complexity gets bigger than simply coding a state machine that operates directly on the network buffers themselves:
That is to say, I suspect that if micro-optimisations can double our performance, they will be more complicated than just writing a customised ring0 that implements HTTP directly inside the network driver.
Here is how I'm looking at it:
• 10Gb/sec network port
• 4k max request and response size
• == 1.3 million HTTP requests per second.
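One reading that roughly reproduces that figure -- assuming "4k" means about four kilobits each for the request and the response, which is my interpretation rather than the parent's stated arithmetic:

```python
LINK_BPS = 10e9       # 10 Gb/s port
REQ_BITS = 4 * 1024   # assumed: ~4 kilobits per request
RESP_BITS = 4 * 1024  # assumed: ~4 kilobits per response

exchanges_per_sec = LINK_BPS / (REQ_BITS + RESP_BITS)
print(round(exchanges_per_sec / 1e6, 2))  # ~1.22 million/s, near the quoted 1.3M
```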
Now the problem is that main memory is not much faster than our fastest network: about 15Gb/sec. So what we're talking about here is code and state staying entirely in L1, streaming the network buffers across the CPU, and responding in one pass, to get that 1.3 million optimal performance.
My dash server gets ~135k HTTP requests per second on localhost (I should be able to approach 300k/sec over a network if I ever get around to it). That's 22% of our optimal performance, and a lot better than any other HTTP server I'm aware of.
At this speed, one of those micro-optimisations, `writev()`, is actually slower than `write()` -- likely because the code path is shorter in the simpler codebase -- but it illustrates my concern nicely: we are close to the break-even point with the optimisations we can make. If we make our server bigger and more complicated, it might not make our programs any faster.
That suggests to me that the solution is actually fewer, simpler syscalls, not more, bigger ones.
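For reference, the `write()`/`writev()` trade-off being described looks like this (a minimal Python sketch; `os.writev` is POSIX-only):

```python
import os

HEADER = b"HTTP/1.1 200 OK\r\n\r\n"

def send_with_write(fd: int, header: bytes, body: bytes) -> int:
    # One syscall, but the buffers are first concatenated in user space.
    return os.write(fd, header + body)

def send_with_writev(fd: int, header: bytes, body: bytes) -> int:
    # One syscall, no user-space concatenation -- the classic
    # "micro-optimisation" that measured slower in this case.
    return os.writev(fd, [header, body])
```

Both are a single syscall per response; which one wins depends on buffer sizes and on how much code sits on each path inside the kernel and the server.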
What kind of "main memory" are you talking about? Regular, consumer grade memory, will have a bandwidth at least ten times faster than your 10Gb/s network interface. Change the bit to a byte and you're a little closer.
I work at two ad-tech related companies, one large and one medium; both have used Java from the beginning. I also know of other companies using Python/Golang.
I didn't know C (not C++) was particularly popular until today. Note that ad companies are pretty business-focused; adding or removing features for big clients is pretty common, so development efficiency matters a lot.
We went from C++ to Go, and started to auto-lose any auctions whose bid requests were in progress during a GC cycle. Go 1.1/1.2 had atrociously long GC pauses (sometimes up to 500ms). You are correct that development efficiency was very important, so losing the auctions was justifiable.
Did you have tight and consistent latency requirements?
GC pauses are a killer. They can be worked around, but it takes intensive tuning.
The RTB bidder I wrote many moons ago at a startup was fast as hell, but had problems in the 95th percentile of requests meeting the latency targets, due to GC.
The ad servers at the big ad tech heavy weights are in C++.
Of course, people aren't going to be happy when their packets get dropped or the network monitoring software isn't able to provide a (soft) real-time view of what is happening.
Also note I wasn't doing this alone; it was a very big project at a mobile operator.
Your experience isn't representative, so there were probably other confounding factors, like the C++ wasn't very well written, or you weren't dealing with petabytes of data or something like that.
Given that I remember the days when C and C++ compilers generated code worse than a junior Assembly programmer, I always find such comparisons interesting.
Not that they aren't true; rather, their validity depends a lot on the programmer's skillset and the compilers being used.
Technical prowess is one thing, support is another. There are 20 times as many OpenResty questions (although still very few) on Stack Overflow as nxweb questions, and the few nxweb questions there date from years ago. I am not sure why this is suddenly on the Hacker News front page.
Which Python is supported, 2 or 3? For some it makes a big difference. I really want to play with this. Also, what OS -- only Linux? I'm trying to find it on the site but I'm not seeing it; maybe adding it to the front page or an FAQ would help (which requires creating an FAQ page or section). Thanks! Looks interesting otherwise.
The templating engine is an interesting, and slightly curious, addition here.
It looks significantly more flexible than anything nginx offers without having to bolt on a server-side language like PHP - unless nginx has something similar in its millions of modules that I'm not aware of.
(I know about and love nginx SSIs, but the templating here looks more flexible than them.)
That is inaccurate. But if you said that it comes from C memory management issues, that sounds more plausible. We could talk about Rust, but I even think that C++ is usually written in a much safer style than C.
Seems cool, but curious if teams have investigated golang for these use cases, specifically if throughput is sufficiently high and the GC pauses are sufficiently small.