I wrote the blog entry for this round [1], and I would agree with ksec (elsewhere in these comments) in recommending that readers start there since it includes the highlights of this round. I apologize ahead of time for my humor.
It's still a bit early out here in California, but if anyone has any questions, I will try to answer as I can!
Do you have any plans to add memory usage to the benchmarks? It adds an extra dimension and sometimes even shows if there's a problem with a specific implementation (e.g. in the benchmark game it's often a hint for a program that can be optimized if the ratio compared to other languages is off).
Yes! In fact, I just responded to a similar question over at the Rust Subreddit thread about the same topic.
We do in fact capture dstat data while executing the tests but as of today do not render these in any way. You can find raw CSV output from dstat at our logs server. For example at [1] are stats for Grizzly while measuring the "json" test type.
I will create an issue at the project's GitHub repo to begin a conversation about which stats to render. I'd like to select a meaningful bite-size set of stats that we can render into a table view. Basically another tab in the results view similar to the latency tab.
If I could make a recommendation that would humor me personally, you could publicize the stats required to run the instance on a B1S or Fv2 Azure instance.
I think it would be really interesting to see which frameworks are so costly upfront that they would impede a developer from prototyping with them on the cheap.
I had not checked these for a good long while. (Probably around round 10 or 11.) Looking back in on it now, holy MAN!
C (C++) and Java. Wow.
Even though I'm old, I'd sort of forgotten how speedy they can be. It's strange, because I actually work with Java and C in making streaming game engines. And you kind of get used to thinking in terms of millions on a set of hardware. So you think Java and C are slow. Then you realize if you were using Python or Node or anything other than C or Java really, you'd likely still be working out how to support tens of thousands on that same set of hardware. (If that.)
We can get so used to what we have. Maybe that's just a human thing.
Rust, C#, and Go also seem to fit the bill, according to these benchmarks. C# particularly has rocketed up since the last time I looked in depth at these. Pretty impressive work from the .NET Core folks, I'd say.
For Rust, I'm interested to see what the future holds once the community coalesces around an async IO interface. Additionally, when procedural macros stabilize (for real), the possibilities for ergonomic frameworks in Rust become endless.
Rocket has been really interesting to work with, but it only works on nightly since it relies on experimental codegen/procedural macro features. Once procedural macros stabilize, all the codegen can be ported to that. It doesn't support async IO yet, but that's because it's waiting for the community to coalesce before diving into it.
I think there's a future where Rust web frameworks are as expressive as Rails or Laravel but bring type safety and sound, zero cost abstractions to the table.
Lots of folks seem to be getting drawn to actix-web, which sits quite near the top of these benchmarks. I've not used it, but it works on stable, is fully async, and has been met with many good reviews!
> I think there's a future where Rust web frameworks are as expressive as Rails or Laravel but bring type safety and sound, zero cost abstractions to the table.
I'm really looking forward to this. The only thing I worry about is compile times. I hope they don't end up ruining an otherwise perfect solution!
Speaking of which, a few hours ago Futures were added to the standard library, so the next nightly will have them in. We're quite close to that standardization!
Just adding for clarity: this isn't async io, but the foundation trait for building promise/future combinators in Rust. Tokio, an async io library, is built around this trait.
Plus one for c#, with compiler optimisations I was able to generate millions of sha256s a second, many more than my c++ version of the same code. Then I discovered CUDA, then I got bored and went onto something else.
I'm not sure how many companies use C or C++ for web development. In my opinion C/C++ won't be a good idea for most. It is about choosing the best ratio between tool speed and developer speed (productivity).
There are very few companies that require a million requests per second.
For me, Java, C#, Go, and Node provide the best compromise between speed and productivity.
Vert.x is polyglot; you can use JS and npm if you like, so in this respect Vert.x can be more productive, as it allows you to use the tool that works best for your problem, be it npm, maven, gradle, or language: Java, JavaScript, Kotlin, etc...
Honest question: What sort of embedded hardware deals with millions of requests a second? I wouldn't have thought that kind of utility was the same domain as "embedded" (in the stricter sense of the term).
Granted you could expand the term to cover hardware load balancers and the sort, but those wouldn't really be running websites so much as routing and caching them. Aside from that, you have management interfaces for embedded devices but they're typically used infrequently and only by a small subset of users. Plus a surprising number of them are just written in PHP or Perl.
But maybe there is a whole industry out there I've overlooked? You've got me curious.
It is not millions of requests a second, rather what you can put into something with 512 KB or less while providing a minimal Web management interface.
Ahh yes. I do have some familiarity with HCC - albeit limited to a hobby project rather than anything professional. It's interesting stuff, but I'd argue that's a completely different target industry from the ones that would use any of the web frameworks in the referenced benchmark. So it's not really on point to the question raised by the GP with regards to C vs. the more rapid development frameworks like Node.
Not that I'm trying to dismiss your point either because if I was given a choice between node and C, I might personally choose C just because of my own personal prejudices against Javascript. :D
I don't have actual experience with HCC, it was just one random example.
Embedded is one of my interests, so I do keep an eye on news about what is going on.
In my case I would rather try to bend my answer to be either C++ with an embedded STL or Pascal, failing that C as well, in spite of my prejudices against it.
I love Pascal (C++ less so, but we all have our own personal preferences), however if you're targeting the web then you're definitely better off with C#, Java, or Go*. You'd get similar language constructs while having a much richer ecosystem to develop in. Which was the point the GP was making with regards to development time versus request throughput. And with the aforementioned, you at least don't sacrifice that much performance compared with Node.
* I know Go gets a lot of hate because of its lack of generics and choice of error handling, but it's worth remembering that for web development you don't run into those particular annoyances much.
Something that I learned after being part of a startup back when doing web servers in Tcl was cool (AOLServer and such), and Zope was hip, was that I never wanted to work again in software stacks that lacked a JIT or AOT compiler for production code.
I find it very frustrating that Scala, a language that is entirely capable of running just as fast as Java, is full of frameworks that are relatively slow compared to the java frameworks. It has to be some sort of fixation on reinventing everything from scratch in order to fit into some idealized idiomatic scala way (ignoring the fact that there are a million different versions of "idiomatic" in the scala community). Even the Play Framework, which used to be based on Netty and was pretty damn fast, ended up spending entire release cycles focused on migrating to an akka-based backend and they ended up with shittier performance and now they have to maintain two backends because nobody wants anything to do with their new akka backend.
Sometimes mutability and imperative code are okay, even in scala.
There's not that much difference? Or did the Netty version get slower because of refactorings done to support Akka?
Anyway I too feel frustrated that the Scala / Play Framework developers don't seem to focus so much on performance, but rather (it seems to me) on writing theoretically beautiful Scala code? I'd rather have something fast & simple like Vert.x, in Scala
Well, there's Vert.x + Scala, "only" 70 stars: https://github.com/vert-x3/vertx-lang-scala though. I wonder how Vert.x + Scala compares with Vert.x + Java; it would have been really interesting to include in the benchmarks :- )
These are fun benchmarks, but every single framework in the top 250 or so is more than fast enough for most apps. Even the slowest in the list manages 366 requests per second, which is probably enough to prove an idea before optimizing for speed.
We're approaching the point where speed is essentially a solved problem unless you're at Google-scale.
> I argue that if you raise the framework's performance ceiling, application developers get the headroom—which is a type of luxury—to develop their application more freely (rapidly, brute-force, carefully, carelessly, or somewhere in between). In large part, they can defer the mental burden of worrying about performance, and in some cases can defer that concern forever. Developers on slower platforms often have so thoroughly internalized the limitations of their platform that they don't even recognize the resulting pathologies: Slow platforms yield premature architectural complexity as the weapons of “high-scale” such as message queues, caches, job queues, worker clusters, and beyond are introduced at load levels that simply should not warrant the complexity.
I agree with the overall sentiment. Plus, in almost all cases where I've run into performance issues with e.g. Python in general web development, the main problems were how the code was written, not what it was written in, and being bottlenecked by the database.
That said, I disagree a bit with saying it is a solved problem unless you are Google-scale. In developing countries, if you are in a small company without big investors, you need to do quite a bit more juggling with infrastructure budgets once you are past MVP, compared to, for example, Switzerland, where I'm from.
In addition to some other sites I've seen mentioned in response, Reddit seems pretty major, and AFAIK it's still largely in Python. (They wrote their own Python framework, although they've been moving to Pyramid.) Yelp is also a Python shop, as is Disqus (which is built on Django).
In the dim and distant days of 2010ish, Python was quite widely used for back-end development, typically using Django or Flask. Instagram, Disqus and Pinterest were all originally built on Django, IIRC.
I think it is not as popular as it used to be for web development, but when I started using it around 2005 it was one of the few choices somewhere in between writing PHP/Perl scripts and the very heavyweight Java, ASP, and ColdFusion solutions.
That's in my opinion the main reason why Python and Ruby on Rails got massively popular for web dev in the 2010s. Since then new and existing languages have started breaching into that spot.
What still makes Python stand out is its flexibility and the sheer amount of resources available (libraries, frameworks, books, etc.) across numerous domains. It is a good enough choice for a lot of things, although it's not necessarily the best choice for anything.
Until your app gets on HN/Slashdot/Reddit etc and experiences the hug of death.
Jokes aside, I do agree that from a developer's standpoint, the optimal way to act (optimising for business success) is as you say, ignore performance at first. But from an engineering/craftsmanship standpoint, it should really bother you to have an inefficient system.
Just think of all that extra CPU load, and all the CO2, all the coal burnt to sustain it ...
Is it? Concurrent connection handling at scale is a major problem, and plenty of Fortune 1000 companies would jump at the chance to have even 5% fewer instances deployed.
366 rps in a test scenario doesn't scratch the surface of what I need, and I'm not doing anything I'd consider crazy scale. When you need to support hundreds of thousands of rps, small improvements can be very noticeable.
You have no idea what you're talking about. Not only did they produce TWO different implementations of PHP (HHVM and Hack), a vast majority of their infrastructure is NOT in PHP (which is mostly reserved for the front-end/web tier).
I'm not familiar with this site and benchmarks, but from reading comments it seems to be respected.
Given that, I'm confused why this is called "web framework" benchmarks. It looks to me like it is comparing some actual frameworks (which rank very poorly) against minimalist, task-focused http servers (which rank highly).
We use the word "framework" as a term of convenience covering the full spectrum from platforms, micro-frameworks, to full-stack frameworks. We are also liberal with accepting contributions from the community, which means we do indeed include several frameworks that are well outside the mainstream.
That said, in tests such as Fortunes, you will see many full-stack frameworks demonstrating their capability to deliver myriad web application fundamentals (such as request routing, database connection pooling, ORM, XSS countermeasures, character encoding, data structures, and server-side templates) at very high performance levels.
This project establishes a high-water mark of performance. Very few web applications process anything remotely close to the requests per second we're measuring. But using Fortunes as a proxy and applying a coefficient (e.g., 0.005) to adjust for real-world application sizing can give you a very rough but nevertheless potentially useful approximation of real-world expectations. For example, one can guess their application is 200 times more complex than our Fortunes test. Rough back-of-the-envelope math will then estimate a framework from the ultra-high performance tier (~200,000 fortunes/sec) might yield a generous ~1,000 real-world app requests per second on a modern Xeon server. Meanwhile, a framework in the middle tier (~10,000 fortunes/sec) would yield a more constraining ~50 real-world app requests per second. Again, this math is hand-waving and you can interpret the results in whatever way you prefer; you can dismiss the validity of approximating things so coarsely. But my experience is that real-world applications based on frameworks I've used—which are sprinkled among all tiers of our results—do align with this hand-waving approximation.
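The hand-waving above can be written out in a few lines; the coefficient and tier throughputs below are the illustrative figures from this comment, not measurements:

```python
# Rough real-world sizing estimate from Fortunes throughput.
# Assumption: your app does ~200x the work of the Fortunes test,
# i.e. a coefficient of 1/200 = 0.005 (pick your own multiplier).

def estimated_app_rps(fortunes_rps, coefficient=0.005):
    """Scale a framework's Fortunes requests/sec down to a very
    rough real-world application requests/sec."""
    return fortunes_rps * coefficient

# Ultra-high performance tier (~200,000 fortunes/sec):
print(estimated_app_rps(200_000))  # ~1,000 app requests/sec
# Middle tier (~10,000 fortunes/sec):
print(estimated_app_rps(10_000))   # ~50 app requests/sec
```

The single knob here is the coefficient; if you believe your app is 500x Fortunes rather than 200x, substitute 0.002 and the tiers compress accordingly.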
Edit: I invite you to read the last section of the blog entry about this round [2] where I argue the same point in another way and share one of the tweets from an application developer whose real world application benefited from the performance improvements made to his favorite framework.
If there were an asterisk in the title, "Web Framework Benchmarks*", with the asterisk taking you to a page with something like your explanation above, that would be lovely.
I think that it allows you to evaluate the tradeoff between DIY services on raw http servers and using a framework that reduces development time but is much slower in production.
Yes, exactly. I'd love to see some examples of a CRUD based API built on top of h2o. At least this appears to be possible with one of the higher performers (cppcms).
Interesting; I just submitted h2o 2.3 Beta, but it seems to not be getting much interest on HN. It could have been called 3.0 with so many new features.
One thing I've noticed is that all the fastest implementations now run PostgreSQL. For the longest time MySQL was thought to be faster, so I guess PostgreSQL really caught up recently. I'm seeing it be the default database in a lot of new open source projects.
This is probably due to the fact that PostgreSQL (libpq) has an "async" interface built in and MySQL/MariaDB don't. These benchmarks are running very basic queries that perform well on either MySQL or PostgreSQL.
The blog post [1] explains things in detail.
Basically they discovered Docker! Given the amount of hype it had for the past 2-3 years, I am surprised. Now it is all Kubernetes. (Or was that a joke I am not getting?)
They had new hardware, and I am surprised it was sponsored by Microsoft. They are also using Azure for the cloud results. And the hardware is recent and much better represents common usage. Before, it was a quad-socket CPU platform that I doubt many are using.
I know a lot of these do not represent real-world usage. But even if we pick Full Stack Framework, ORM usage, and Realistic implementation, ignoring the erroneous results at the bottom, we still have a gap of 50x in Fortunes, which is the only result I look at.
I didn't check why Ruby + Rack wasn't working. Hanami wasn't working either. So the best results for Ruby were Roda & Sequel, both projects by Jeremy Evans.
Yes, that was a bit of a joke in the blog entry. We suspected for years Docker would be a good fit for this project. Only within the past months did we find the time necessary to convert the hundreds of test permutations to Docker. It wasn't a quick thing, but we did get into a rhythm. The "joke," for whatever it's worth, is simply that we didn't sufficiently appreciate how useful it would be for this project and how insignificant the overhead would be. Perhaps we would have prioritized it higher in the past had we known. But it's done now!
In brief, the Docker conversion effort has increased the stability and reliability of the results.
There are a huge number of ways to consume the results and as many opinions. We welcome the diverse points of view and hope you find the data useful!
I love these but I've also seen in some of the implementations that I care about a bit of benchmark gaming. That saddens me and I wonder just how much of it is happening in languages and frameworks that I don't follow.
It's interesting to look at the source code. There are definitely some big differences in how each implementation handles underlying aspects of the tests.
Just looking at the database-related ones to see some of the differences is interesting.
It also doesn't look like the Elixir code has really been updated in about 2 years (aside from version bumps). It's still using a JSON encoder (Poison) that's 4x slower than the primary one (Jason). For multiple queries, Ecto hands back the connection after every query for concurrency's sake. Looks like the Plug logger is still set up on the endpoint.
I'd be really interested to see the Discord folks look at that as a pet project. :-)
That's the beauty of these "pinewood derbies" though. Sometimes they sit dormant until some kind soul says, "HEY!, that's not representative of my favorite framework!" and submits a PR to showcase its true capabilities.
I've been following these benchmarks for some time, and am always shocked that Spring does so poorly (it's 7% here). I haven't had any performance issues with Spring in production, so these benchmarks are puzzling. Are the other frameworks really that much faster in practice?
What kind of traffic do you see in production? Rails gets hammered in these benchmarks, but does just fine for most companies' requirements. It's pretty rare that you'd actually need to eke out the raw numbers that the top contenders get. I'm of the opinion that, within reason, developer ergonomics and the ability to quickly solve business problems are more important than raw performance, so long as performance is good enough by some agreed-upon metric.
Good point. Thanks for the feedback. Our sites don't get a huge amount of traffic, so it's possible Spring doesn't have as good a concurrency story as the higher-ranked frameworks (or it's due to memory usage), but it's been sufficient for our needs.
Yep I feel the same about Elixir and Phoenix. It's way down but again, the "way down" is 20k requests per second. Not too shabby. The biggest project I've worked on had 50 requests per second, and that was an E-Commerce site that brought in 9 figures a year in revenue.
I'm sure IoT workflows are mainly where you start to see these more insane RPS numbers. But still, this just tells me there's so much great choice out there for pretty much any platform you want to stick with.
I'd love to see another metric that normalizes between Cloud and Physical, like $ per request. We don't know how much the cloud server cost vs the physical, do we? I mean it's a 10x difference in performance, is it a 10x difference in price?
Update: Azure D2v3 instances (used for the cloud benchmark) are about $55 a month. ($660 a year).
Just the Xeon in the physical costs $1,500. The full server probably costs $5k? So assuming a server has a life of 3 years, you're looking at the Azure instance being $1,980, the Dell being maybe $5k? So it's about 10x performance for 3x the price.
Someone please fix my math and fill in the gaps :)
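For what it's worth, the arithmetic holds up; here it is spelled out, using only the guessed figures from the comments above:

```python
# Hedged cost comparison, all figures are the guesses from the thread.
azure_monthly = 55      # USD/month for an Azure D2v3 (approximate)
years = 3               # assumed useful life of a physical server
azure_total = azure_monthly * 12 * years  # total cloud cost over 3 years
physical_total = 5_000                    # guessed full Dell server cost

print(azure_total)                  # 1980 USD over 3 years
print(physical_total / azure_total) # ~2.5x the price for ~10x the perf
```

So with these assumptions the physical box costs roughly 2.5-3x the cloud instance while delivering about 10x the throughput; the conclusion is sensitive mostly to the guessed server price and lifespan.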
I wonder how a Spring WebFlux [1] variant of the spring benchmark would perform in comparison. Also the spring benchmark has been updated to use a recent spring boot release about a month ago [2]. Before that it was using spring boot 1.3.5.
I think it would have been even faster if they'd tested with SQL Server, because you can do things with SQL Server that you simply cannot do in PG, such as stored procedures that return multiple resultsets, which saves a ton of round-trips.
You can do even more advanced stuff in Postgres, but they've changed the rules to disallow it since it was much faster than the current best ones. Something about being fair to other frameworks...
> Can you request multiple heterogeneous result-sets from PG now, with something that looks like a stored procedure?
Not prettily. You can return cursors, you can use JSON, etc. ...
But you can pipeline SQL statements. I.e. just send N SQL statements (including bind parameters etc, the protocol is the same) without waiting for results, and then process the results as they come in. If you want to avoid latency penalties that makes much more sense in my opinion than having to wrap multiple statements in a function.
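A toy latency model shows why pipelining pays off; the round-trip time and query count below are made-up numbers, only the shape of the math matters:

```python
# Toy model: N queries against a server with round-trip time RTT.
rtt_ms = 0.5    # assumed network round-trip time, milliseconds
n_queries = 20  # e.g. the multi-query test at its largest setting

# Sequential: wait for each result before sending the next query,
# so network latency is paid once per query.
sequential_ms = n_queries * rtt_ms

# Pipelined: send all N statements back to back without waiting,
# then read the N results; latency is paid roughly once per batch.
pipelined_ms = rtt_ms

print(sequential_ms)  # 10.0 ms spent purely on round trips
print(pipelined_ms)   # 0.5 ms
```

Server-side query execution time is the same either way; pipelining only removes the per-query network wait, which is exactly why it dominates in benchmarks where the queries themselves are trivial.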
It is interesting to watch the latency numbers from the table (in a separate tab). I suppose latency is more important to most users than peak throughput, considering that production applications almost never hit peak throughput and are often supplied with extra hardware before utilisation gets there.
There are a lot of weird things that don't make sense; for example, a minimalistic framework like Sinatra/Padrino being slower than Rails doesn't seem to make sense at all.
I feel these benchmarks are misleading, to say the least.
Check out vertx on the multiple query benchmark - it is almost twice as fast as the next entry. Just got confirmation from TechEmpower that they are using a new pipelining feature in postgres to wipe the floor with the rest of the field.
Is 'Fortunes' a test involving the quote generating program? Also, is the difference between 'Single Query' and 'Multiple Queries' concurrency? I always look at these but I have trouble understanding the test cases.
The best resource to answer your questions is the Requirements for the various test types [1]. But in brief:
* Yes, Fortunes is named after the Unix tool of the same name. In our case, it's a test that executes a query of all rows in a table, adds an additional item, sorts the values in the application code, escapes the values as an XSS countermeasure, and then renders them using a server-side templating library.
* Yes, the single-query test is always executing a single-query per HTTP request and is measured at various concurrency levels. The multi-query test is measured with consistent concurrency and varies the number of queries executed per HTTP request.
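A minimal sketch of the Fortunes workload described above, with the database and template engine stubbed out in plain Python (the real test issues an actual DB query and uses a server-side templating library; the sample rows here are invented):

```python
import html

# Stand-in for "SELECT * FROM fortune" (the real test hits a database).
fortunes = [
    {"id": 1, "message": "A computer program does what you tell it to do."},
    {"id": 2, "message": "<script>alert('xss')</script>"},  # must be escaped
]

def fortunes_handler(rows):
    # 1. Add an additional fortune at request time.
    rows = rows + [{"id": 0, "message": "Additional fortune added at request time."}]
    # 2. Sort in application code, not in the SQL query.
    rows.sort(key=lambda r: r["message"])
    # 3. Escape values as an XSS countermeasure and render a table.
    body = "".join(
        f"<tr><td>{r['id']}</td><td>{html.escape(r['message'])}</td></tr>"
        for r in rows
    )
    return f"<table>{body}</table>"

print(fortunes_handler(fortunes))
```

Small as it is, this exercises routing-adjacent fundamentals: collection handling, application-side sorting, output escaping, and templating, which is why Fortunes is a better proxy for real apps than plaintext or JSON serialization.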
I think TechEmpower are doing a great job on these generally, but the plaintext benchmark is problematic because they allow pipelining. (Which isn’t enabled in browsers, isn’t typically used in request libraries and tools, and has generally been sidelined) Frameworks which are tailored with that in place can get a big boost in the ratings there.
If you want a measure of plain request/response I’d use the JSON benchmarks as the baseline.
It is really surprising that the same frameworks are much slower running PyPy than CPython. I'm assuming all the results are run from cold... Considering that web servers run for long periods of time, it'd be interesting to measure after several runs and see how the JITted platforms (PyPy, Java, etc.) place in comparison.
This is the first time the PHP extension swoole (an async library) was on the test. It took #10 place on the JSON serialization test. The remainder of the tests rank terribly because the implementation incorrectly uses synchronous DB calls, but once that's fixed it should be very promising.
Are they all running with SSL? Do they keep track of session cookies? Do they have Content Security Policy headers? etc.. There's probably a million different feature configuration differences between each framework.
Quite different. We have included test implementations for frameworks that span 26 computer languages. There are an infinite variety of opinions about how to do things in computer programming in general, and our project is no exception.
And on the other hand, fairly similar. Tests should stick to the requirements [1], which are designed to be permissive but sufficiently clear on the expected work load of each test type. The principal goal is that test implementations should be realistic, and we will mark those the community believes are not realistic as "stripped" implementations.
> Are they all running with SSL?
No, not yet. But planned future test types would include SSL/TLS.
> Do they keep track of session cookies?
Generally no. These are intended to exercise anonymous requests. But a future test type could include session management.
> Do they have Content Security Policy headers?
No, that is not a requirement of our tests. We have specified which headers we expect. Others are optional.
> There's probably a million different feature configuration differences between each framework.
Yes. And it can be challenging at times to get all of these opinions to fit into the same box. For example, we've to-date kept SQLite implementations out of the project since those would not incur network costs. And other times the box has to be reshaped a bit to make room for new consensus opinions about what is suitable for "production." We recently made a decision to allow for innovative features in the Postgres protocol that are—in a manner of speaking—analogous to automatic pipelining. That conversation is still ongoing on our discussion forum.
Yes! We accept PRs from the community at the GitHub repository [1]. We do ask that you submit what you believe is production quality code that would be suitable to run a real web application. We are liberal with how that is interpreted, but we reserve the right to reject code that is too experimental.
Trust me. You won't hit 424,712 requests per second per node (that is 1,528,963,200 per hour).
You need many nodes and a well-distributed, scalable infrastructure to handle 1B req/hr.
There is no point in benchmarking your programming language/framework when doing web development.
When it comes to the web, you will hit N different capacity limits before that, even with the slowest framework: your database, network connections, bandwidth, third-party API rate limits, IO performance, etc...
Consider only a few options based on project requirements, ecosystem, productivity, and available human resources, then compare the top two on performance.
You can't compare H2O and Rails on the same list; they are two different animals.
Moreover, I'm sure all of these frameworks are not well optimized for a raw performance benchmark, and also these results are heavily affected by filters you are choosing.
Don't spend time on pointless benchmarks.
Yep. It took me about six months to write a replacement for ActiveRecord with an identical API (for what was needed) that could scale to an arbitrarily large number of writes.
Shit, pre-A16Z raise 500px ran on Rails and they were a social network around photography. You think Twitter is bad? Try timelines where you have half a million photographers liking hundreds of photos an hour and every like gets pushed into every feed of the people following them. Guys syncing their entire photo library. It was 5 application servers, 4 MongoDB servers for the timeline with some crazy data structures, one or two MySQL DBs.
I think most web projects these days should be in either Phoenix, Rails, or similar. If you need something really fast here or there, just fork off the request in Nginx, or compile something in Rust or C and extend it into Ruby (or whatever). Or have a compiled worker that communicates through the DB or Redis. There is this long tail of UI you need to make for every web project, and nobody uses it 99.999% of the time. It should be in whatever secure language/framework brings it to market fastest.
>Yep. It took me about six months to write a replacement of ActiveRecord with an identical API (for what was needed) that could scale to arbitrarily large number of writes
Sorry to ask, would that be open sourced?
>It was 5 application servers, 4 MongoDB servers for the timeline with some crazy data structures, one or two MySQL DBs.
That is a little hard to wrap my head around, because pre-A16Z 500px was a long time ago. Hardware would have been much slower, and Ruby was way slower as well. We have 2-3x faster hardware and a faster Ruby today, and the same hardware surely won't do much at Discourse or GitLab scale.
No. I would have if I owned the code personally, but it wasn't in the company's interests to open source it.
> That is a little hard to wrap my head around...
I was there between 2012 and 2013. The hardware wasn't bad; I don't remember the exact specs, but it was colocated at a place in downtown TO on some decent but not crazy gear. It wasn't perfect, but it worked pretty well.
For the actual image processing we just shelled out to ImageMagick, and I vaguely recall some fancy batch-write code for keeping things like photo view counts from completely destroying the DB. We had a hardware load balancer that worked pretty well, and we compiled Ruby ourselves for a performance boost, though that ended up killing us this one time when there was a bug that only showed up in the compiled version of Ruby when all the dev boxes ran the universal binary.
People care so much about performance, and in some cases like computer games it's totally warranted, but if at the end of the day you have a DB, the DB is going to be the problem long, long, long before the application code.
It didn't solve the ActiveRecord is slow problem, it was marginally faster, but not a whole lot faster. What it solved was handling models over an arbitrary number of databases. Imagine billions of writes of a single model type across multiple databases. Imagine needing to join across databases and not knowing which DB a joined model would be in.
This should clearly show this as time per request rather than requests per second, since these are really trivial requests and aren't going to be representative of actual performance differences in the real world.
I'm with you, although we will see if animations can hide loading times.
Btw, you will always have people that disagree with you. I get crap about the loading time of my website, but since I shipped a massive, beautiful redesign, my conversion doubled overnight.
I think a human can decide for themselves if something takes too long. I ran into this issue and optimized, but I wouldn't be chasing benchmarks.
[1] https://www.techempower.com/blog/2018/06/06/framework-benchm...