Putting aside all of the technical fun, from a pure reader's perspective I sometimes miss simple, boring HTML with a bit of CSS to make it more pleasant and readable. I don't know about anyone else, but particularly from a consumption standpoint I miss the days of 100k webpages.
I'm always stunned, but at the same time never surprised, when I discover a single webpage is 35+ MB, consuming 2GB of RAM, and consuming CPU as if it were a midrange video game.
Whether JavaScript is a good fit for a news site and whether news sites should be single page apps are two separate questions. It's possible to build a server rendered React app without requiring any client JavaScript execution at all.
News sites are also more complex than you would assume. I went from building products at Facebook to a large news site. What shocked me was the surface area of the user facing products. There were so many little one-off pieces of functionality and randomly integrated services. Tools like React and Relay make it much easier to manage this complexity and promote code reuse.
You can make use of the JavaScript ecosystem. React is a nice way to structure code, there are a tonne of modules available on npm, etc. If and when you do want to build client elements, you can reuse parts of your code.
That said, I'm not aware of anyone doing this at scale. I know the BBC were considering it. My company will be using it to produce AMP pages from our JS stack.
Do a view-source: the vast majority of the bytes are used for actual text content, rather than JavaScript and markup.
Mobile friendly too, but you wouldn't know it because they disabled responsive view unless it's visited from an actual phone. No premature mobile view with hamburger kicking in when viewing it on desktop.
Comparison of View source between two sites is quite amusing.
> but you wouldn't know it because they disabled responsive view unless it's visited from an actual phone. No premature mobile view with hamburger kicking in when viewing it on desktop
Bug, not feature. If I'm viewing a site in a narrow window, I expect it to collapse responsively. That site doesn't.
I can assure you it's not a bug. Many Asian sites have fixed width on desktop. They do not make their sites with 4-5 different media query break points.
Personally I prefer this because information is always in the same place regardless of browser size. I don't have to worry about it moving around or hiding.
100k webpages were never a thing for the amount of content that we are able to serve these days. Consider http://www.dailymail.co.uk/ as an example. The HTML document alone is 795K. Websites such as this would immensely benefit from progressive loading.
After reading my ceremonial 1000th testimony about how HN users prefer sites with less CSS, I'm about done with it though. We get it! Engineers are proud of how little they care for white space and colors! Can we talk about something else in the top comments of any page about website design?
That's not what people here are saying, though. Indeed, even your parent comment said how they miss small HTML pages with a bit of CSS. Most engineers would be totally fine with HTML and a modest payload of nice, modern CSS styling, and perhaps a small bit of non-required progressively enhancing JS.
Our problem is with huge JS frameworks used in sites that aren't actual web applications (e.g., Gmail), but rather web sites (like news sites). And I say this as someone who has primarily made my living the past 5 years as a "frontend engineer" (ie, JS programmer). JS frameworks can be wonderful for actual web applications, but they're way overkill for documents online for reading (and also make the experience worse for the reader).
Oh, and we also hate the dozens/hundreds of KB of unneeded font downloads.
Also, for the record, I'm a big fan of (appropriate use of) white space and color. I definitely come down heavily on the side of bettermotherfuckingwebsite.com (vs motherfuckingwebsite.com).
I wish there were some way to avoid being subject to resume-driven development when I'm on the internet. Alas, it's not to be, because instead of taking a step back and asking "do we need this?" we get webdevs asking "how can I force-fit the latest shiny bauble into my professional CV?" And we end up with React/GraphQL/Node/AWS/Kubernetes/Docker Rube Goldberg machines pumping tens of millions of bytes of data and billions of bytes of markup and JavaScript through Kafka, all to serve up news text.
Everybody's job looks easy when you don't have the full list of requirements and a deadline in front of you. What looks like resume-padding may in fact be the best way to fulfill a requirement you didn't know they had.
I've looked at GraphQL a number of times. Does anyone have any practical examples of integrating it with backend(s), APIs, and/or specific databases?
So instead of "we use GraphQL, much love" plus a basic example of how it looks in React, I'd like a "here's how we take that structure, resolve it, and return it." Because that structure looks amazingly sweet, but if in the background it requires circles of work, work, and rework...
We are using the Sangria[0] framework with a Play[1] app. Sangria does all of the GraphQL-related stuff and Play does the usual server stuff. Sangria's documentation is quite good, but the part that answers your question will be in the "Schema Definition"[2] section, which is where you describe the schema of your graph and how each field is resolved.
This is definitely a real problem and we just launched a website yesterday which we hope will grow into a resource for this kind of more advanced content: http://www.graphql.com/guides/
You've hit a (pain) point.
While GraphQL reduces the number of round trips between the browser and the backend, those round trips get pushed down to a lower level, between the backend and the DB.
The reference graphql-js implementation looks easy at first (you just write a resolver for a field), but it makes it all too easy to end up with terrible n+1 problems. dataloader helps a bit, but it's not as optimal as it could be.
You need to be very careful. To be optimal, you'll basically need to write very complex resolvers that inspect the AST themselves and fetch the data optimally, at which point you are almost writing your own custom execution module.
That is basically what I did: a custom execution module to translate a GraphQL request into a single SQL query (https://subzero.cloud/)
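For anyone unfamiliar with the batching being discussed, here's a minimal hand-rolled sketch of the idea behind dataloader (this is not the real library's API, and the data is made up): keys requested in the same tick are collected and fetched in one batch, instead of one query per key.

```javascript
// Hand-rolled batching loader, sketching what dataloader does:
// collect keys requested in the same tick, fetch them all at once.
function makeLoader(batchFn) {
  let queue = [];
  return {
    load(key) {
      return new Promise((resolve) => {
        queue.push({ key, resolve });
        if (queue.length === 1) {
          // First key this tick: schedule one batched fetch at end of tick.
          process.nextTick(async () => {
            const batch = queue;
            queue = [];
            const results = await batchFn(batch.map((q) => q.key));
            batch.forEach((q, i) => q.resolve(results[i]));
          });
        }
      });
    },
  };
}

// Hypothetical batched fetch, standing in for something like
// SELECT * FROM users WHERE id IN (...ids)
let queriesIssued = 0;
const userLoader = makeLoader(async (ids) => {
  queriesIssued += 1;
  return ids.map((id) => ({ id, name: "user" + id }));
});

// Three loads from three different resolvers, but only one actual query.
Promise.all([userLoader.load(1), userLoader.load(2), userLoader.load(3)])
  .then((users) => console.log(users.length, queriesIssued)); // 3 1
```

This avoids n+1 within one level of the query, but as noted above it still issues one batch per level, which is where the sequential-round-trip problem comes from.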
Sometimes I worry whether GraphQL is analogous to ORMs: a theoretically elegant attempt to solve an impedance mismatch between a data source and its consumer, but one that ultimately just shifts or redistributes the impedance to different layers of the codebase...
I agree it seems like potentially the same problem, and it manifests in forcing devs to write 'resolvers' which looks like horrible drudge work. So I wonder if there are decent offerings without the impedance mismatch between GraphQL queries and the database (so we don't have to translate them into SQL). I did a very quick look and at least found this: https://dgraph.io/
I'm getting the impression that if nothing else, I'm probably gonna need to wait a few years for this tech to stabilize more. Which I find very unfortunate, because the standard way of doing things these days feels very unpleasant to me (I hate writing boilerplate more than anything: I have an overuse injury, so typing is the worst part of coding), and the GraphQL queries do seem to make a lot of sense (although forcing clients to have explicit knowledge of schema structure seems a little dangerous...).
The technique is inspired by PostgREST, and the GraphQL spin is "stolen" (in a good way) from subZero :), although with a completely different implementation method than subZero.
If you have a 3-level query (3 tables), the best you can hope for is 3 sequential queries: get the first level, collect the ids, request the second level, collect the ids, get the 3rd level. It gets even more complicated the more levels you add and the bigger the returned dataset.
All of this can be done using a single join, which is one round trip, and it's faster.
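A toy illustration of the round-trip difference (in-memory arrays stand in for tables linked by foreign keys; the table and column names are made up):

```javascript
// Two "tables" linked by foreign keys: project.clientId, task.projectId.
const projects = [{ id: 1, clientId: 1 }, { id: 2, clientId: 1 }];
const tasks = [{ id: 1, projectId: 1 }, { id: 2, projectId: 2 }];

let roundTrips = 0;
const query = (rows, pred) => { roundTrips += 1; return rows.filter(pred); };

// Naive resolver style: query a level, collect the ids, query the next level.
function sequential(clientId) {
  const ps = query(projects, (p) => p.clientId === clientId);
  const ids = ps.map((p) => p.id);
  const ts = query(tasks, (t) => ids.includes(t.projectId));
  return { projects: ps, tasks: ts };
}

console.log(sequential(1).tasks.length, roundTrips); // 2 2
// The single-round-trip equivalent in SQL would be something like:
//   SELECT * FROM projects p
//   JOIN tasks t ON t.project_id = p.id
//   WHERE p.client_id = 1;
```

With more levels the sequential approach adds one round trip per level, while the join stays at one.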
If you have low latency queries (eg. because most things hit cache rather than db) sequential queries aren't as much of a problem. Being able to join everything into one query, on the other hand, is a luxury which can be hard to maintain at scale.
In the 'join in SQL' case you could identify particular cases which are doing sequential queries and implement a different loader which just does one. It's not automatic but perhaps in most cases it's not necessary to do this step anyway. In the worst case you're back to doing as much work as you would for a bespoke API endpoint, but that's not the typical case. How much of a problem this ends up being in practice very much depends on the type of app you're building and how you intend to scale it.
Saying join is a luxury is a dangerous thing :) (for impressionable devs :)) 99% of projects are not "at FB scale" :) so join is exactly the right thing to use. It's been tuned over decades, so until your scale/dataset outgrows one box (and there are big boxes now), you are not going to do a better job than the query optimiser (cause after all, that is what you are trying to do).
Fair enough. What I'm trying to get at is that you need to pay attention to the cost of the real world queries which are actually being executed against your API, and then optimize. I'm not sure that SQL is a magic bullet there either (you can still request too much data at once, for example).
In practice this probably means adding some logging of how long requests take and graphing it (say, 95th percentile request time) from time to time to spot pathological queries. Even better if you can automate it. I think this is stuff that everyone should be doing (after a certain stage), regardless of whether you are using GraphQL or a bespoke JSON API.
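A minimal sketch of that kind of percentile tracking (in-process and heavily simplified; a real setup would use a metrics library or log aggregation, and the numbers here are fake timings):

```javascript
// Record per-request durations and report the 95th percentile,
// to help spot pathological queries against the API.
const durations = [];

function recordRequest(ms) {
  durations.push(ms);
}

function p95() {
  const sorted = [...durations].sort((a, b) => a - b);
  return sorted[Math.floor(sorted.length * 0.95)];
}

// Fake timings: 100 requests taking 1..100ms.
for (let i = 1; i <= 100; i++) recordRequest(i);
console.log(p95()); // 96
```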
Why would I "optimise" for the thing that might never happen (join not being an option) while taking the hit right now?
I've tested this (3-level query) and the throughput is 10 times slower with dataloader, as in I need 10 servers instead of one to do the same job.
I mean to structure your server in such a way where you're not trying to automatically convert your entire query AST into a single massively complicated SQL query. It's true that in simple cases, one complex query can be the fastest way to get the data you need, but you leave this territory pretty quickly.
It's perfectly valid to identify subtrees of your query that would benefit from being executed as a database query, but to do that to your entire query just sounds like you're asking for trouble, I'd even go so far as to call it a premature optimization.
I kind of understand why you'd think it's a massively complicated query. You are probably thinking of joins between two tables on random columns with weird conditions that result in full table scans.
I am talking about queries/joins between tables that have foreign keys between them, like client/project/task/comment. I bet 90% of graphql schemas expose those kinds of relations between types.
For those types of relations (with FKs) I can generate a single query that is as fast as it can be (certainly faster than dataloader), and as far as I've tested (a few million rows in the tables, 3-7 levels in a query) I didn't leave the fast territory :) Of course there might be edge cases...
About premature optimisations. Everyone likes to quote that, but never the full one which is "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." https://shreevatsa.wordpress.com/2008/05/16/premature-optimi...
The paper where it was published was about using goto to optimize loops and such small things, it was never "targeted" at algorithms and architecture.
As I think I mentioned above, I was getting 10x the throughput with joins compared to dataloader, and I would not call that premature.
I use a programming language that is 1-2 orders of magnitude slower than C, should I switch today?
I get my data for a full UI within the budget I've allowed for uncached scenarios (100ms, but I want to get to 50ms). The approach you're suggesting will (in some circumstances) give me some short-term wins in terms of throughput and response times, but you've not said anything to suggest that I won't lose these benefits as I gradually transition into a domain-driven or micro-services (yuck) architecture.
I like to build my GraphQL servers under the assumption of a domain-driven architecture (because that's where all the projects I've worked on seem to end up, your mileage may vary), and then shoe-horn in some short-term performance tricks when I can.
I'm possibly a special snowflake here, but it's been a long time since i've had the opportunity to work on a project where I can go straight to the DB. Be it Elastic Search, a 3rd party, ill-advised micro-services, or complex logic in-between storage and presentation; nothing has quite been a pure DB project in the last 6-7 years.
Of course, you could argue this is premature architecture ;) but many of these complexities are from day one, or at least pretty early in a project's life.
You don't have to worry so much about n+1 (once you're using dataloader) as much as you do deeply nested queries, or queries that run against a large number of datatypes.
I am not sure I follow; the deeper the query or the bigger the returned dataset, the worse the performance of a dataloader-type solution (see my other comment).
We're agreeing here, I think. I'm saying that after taking care of the n+1 problem by using dataloader, you still have to worry about deeply nested queries.
This is true, but most complex UIs tend to result in queries that spread wide rather than deep. It takes some deliberate contrivance or fairly unusual real-world cases to get to more than 3-4 levels of nested relationships, and this is often the point at which you'd be thinking of deferred/lazy loading in the client anyway.
I've spent a significant amount of time over the last year or so optimizing GraphQL servers built on domain-driven services (i.e. joins aren't an option) and managed to get to equal (or very marginally worse) performance to existing handcrafted endpoints that returned equivalent data (it was possible to build the same UI, even though the payloads weren't identical).
There are areas where GraphQL is inherently inefficient (trying to work on ways to mitigate these issues), but the reality is that deeply-nested UI appears to be less of a problem than I originally thought it would be.
I strongly suggest that you look at Apollo's GraphQL offerings. My 'aha' moment with GraphQL came while reading the docs for their GraphQL server[0].
IMHO Relay doesn't make a lot of sense. Apollo Client [1] has a much better feature set for most use cases, doesn't need React and is better documented.
+1 to that. I'm working on a (mostly) GraphQL application with a Rails backend and a React frontend with a touch of Redux. I really wanted to like Relay but the documentation was lacking (I suspect that a major API change is to blame) and the amount of boilerplate is prohibitive. On the other hand Apollo Client is straightforward, framework agnostic with great React bindings. As a web development veteran I feel that I've never been as productive as with this particular stack.
Used Relay Classic for a year and a half, been using Relay Modern for a month or so. You are absolutely right, the documentation situation is terrible. The fact that the new mutation API has practically zero documentation is troubling.
I've been sticking with it though, and I am enjoying it. I feel like I have a greater grasp of what's going to be executed and when than I ever did with Relay Classic, and the file size + performance improvements are worth the cost of admission in my mind.
Have you been using normal Relay or one of the forks/modified versions that supports server-side rendering? The fact that Apollo Client supports server-side rendering out of the box was a big plus for me.
> the file size + performance improvements are worth the cost of admission in my mind.
Is this Relay Classic vs Relay Modern or Relay vs Apollo?
The libraries that support server-side rendering aren't forks; they sit alongside Relay. But yes, I've been using them for well over a year without any issues. I haven't tried it with Relay Modern yet, but there are examples out there of how to do it.
Relay Modern is 20% of the size of Relay Classic (i.e., 5 times smaller), which (if my calculations are correct; I don't have equivalent environments set up) is just over half the size of React Apollo + Apollo Client.
As a standalone all-by-myself indie developer I was skeptical of GraphQL when it first appeared, but later I discovered it is much easier to develop APIs for my own consumption with GraphQL. The basic GraphQL boilerplate seems bad, but is actually fun to write and makes total sense. And once you have the structure in place it is super easy to add functionality.
Plus bonuses if you have more than one database or are mixing data from your database and external APIs in your backend responses.
Please try it for a small project, it is unbelievable, but you'll probably enjoy it.
(For the record: I've just used https://github.com/graphql-go/graphql and Lokka on the client, because it is simple and does nothing fancy, it's a thin wrapper over XHR, I think.)
Here's a high-level article about performance in general, but it shows how a UI and query could map to (for example) a function-driven API (this could be local to the server or remote):
Examples more specific to a particular backend technology feel redundant, because my assumption is that once you're in the land of calling functions, you don't need your hand held anymore.
The most important thing is to be aware of the different batching strategies that are available to you in each GraphQL implementation because I believe this is the most critical part of getting a GraphQL server to perform well with anything other than a graph database.
Assuming all your data comes from the same database, you can reduce one GraphQL query to one SQL query that you can run on PostgreSQL (which is not a graph database).
So batching is not the only option (and probably not the best).
Ryan supports development with a pro version that has a bunch of neat features, including built-in support for a handful of common authorization frameworks. I highly recommend it.
On the server side it's very similar to REST. You define your types in a schema file, and then write functions that go fetch the data for each type. These functions are called resolvers.
For example you have a Product type, and then write the resolver function that queries the database and/or another API. When you have the data, you pass it back to the client via Apollo or Relay.
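A minimal sketch of that resolver shape (hypothetical data and names, in plain JS rather than an actual GraphQL server, just to show where the database/API call would live):

```javascript
// Stand-in "database"; in a real resolver this would be a SQL query
// or a call out to another API.
const db = {
  products: { "42": { id: "42", name: "Widget", vendorId: "7" } },
  vendors: { "7": { id: "7", name: "Acme" } },
};

// One resolver function per field; nested types resolve from their parent.
const resolvers = {
  Query: {
    product: (args) => db.products[args.id],
  },
  Product: {
    vendor: (product) => db.vendors[product.vendorId],
  },
};

// Resolving a query like { product(id: "42") { name vendor { name } } }
const product = resolvers.Query.product({ id: "42" });
console.log(product.name, resolvers.Product.vendor(product).name); // Widget Acme
```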
The big advantage over REST is that the client can define what data it wants and how it wants it. If you are full stack dev this isn't such a great advantage, but for bigger projects where front/back are spread among many engineers this can be an advantage. Also, since the schema defines the types, your API is almost self documented so to speak.
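A toy version of "the client defines what data it wants": given a full record and a selection, return only the requested fields. (A real GraphQL server does this by executing the parsed query document; this just illustrates the idea with made-up data.)

```javascript
// Given a full record and a nested selection, keep only selected fields.
function select(obj, selection) {
  const out = {};
  for (const [field, sub] of Object.entries(selection)) {
    out[field] = sub === true ? obj[field] : select(obj[field], sub);
  }
  return out;
}

const product = {
  id: 42,
  name: "Widget",
  price: 9.99,
  vendor: { id: 7, name: "Acme" },
};

// The client asks for name and vendor.name only; price never goes over the wire.
console.log(select(product, { name: true, vendor: { name: true } }));
// { name: 'Widget', vendor: { name: 'Acme' } }
```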
The big disadvantage is authentication and authorization. We kept using REST for authentication, and we couldn't find any ready made solution for role based authorization like you have in Express, Hapi, etc.
I think a combination of REST and GraphQL is the better approach.
I've written a fair bit about it. We use it at AppNexus internally for our UIs, and it simply maps back to a REST API. Since each API has its quirks and oddities, it's a nice abstraction layer for consistency.
I agree. All of the examples are trivial. There are a lot of nice things about graphql. Yet there are a lot of problems and annoyances other things don’t have such as query batching. Even authorizing certain graphql queries / mutations is not a trivial thing. I’ve been able to solve a couple of these things in my own app, but it’s just not something built into the spec.
We're using graphql in front of a Drupal 8 data engine, with a variety of other data sources behind that. It's also for a publishing company, that has a bunch of different front end sites all served from the same content store... But with separate development groups. We're using the youshido graphql library for it... Though now that development is further along, I wish we'd chosen the webonyx one.
A good GraphQL backend resolves the graph into query results efficiently, rather than just field by field. But that does take a bit of getting used to...
Even if we don't use React on the client side, it can act as an excellent server-side view templating language. This is because React turns the templating model on its head.
A typical template is an HTML file (or some variation of it) within which the dynamic content is inserted using string interpolation.
A React view, on the other hand, is a piece of JavaScript code which can compose small snippets of view as JSX, use programming constructs like loops and conditionals, and finally return the assembled result.
Basically, React views are pure functions that return a validated HTML snippet. Normal templates are big blobs of strings with logic mixed in.
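To illustrate the "views as pure functions" point without pulling in React itself, here's a tiny hand-rolled element helper standing in for JSX (note that real React also escapes content and builds a virtual DOM rather than strings; both are skipped here):

```javascript
// Tiny element helper: h("li", "A") -> "<li>A</li>".
// Accepts a single child or an array of children.
const h = (tag, children) => `<${tag}>${[].concat(children).join("")}</${tag}>`;

// A "component" is just a pure function of props that can use
// ordinary loops and conditionals to assemble the view.
const ArticleList = ({ articles }) =>
  h("ul", articles.map((a) => h("li", a.title)));

console.log(ArticleList({ articles: [{ title: "A" }, { title: "B" }] }));
// <ul><li>A</li><li>B</li></ul>
```

Same input, same output, no logic buried inside a blob of template string.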
The string building approach is the fastest, and it's the benchmark to beat. Any other abstraction is just piling more work on top and is generally just a more inefficient way to output HTML. The most lightweight pseudo-DOM implementations are still going to be significantly slower than string concatenation, and I have benchmarks to back up that claim [0].
Realistically, a server-side rendered JS app is also going to run most of the code that runs in the client per page load, so you would also have to consider initialization costs. I've had to work on one that took 100+ ms to render a static page (without accounting for network latency), whereas a static file server could serve the same page orders of magnitude faster.
Long story short, most of the newer, non-string-based JS server-side rendering does not consider performance a factor and consequently performs pretty terribly. There are band-aid fixes such as putting a reverse proxy in front or running on super fast hardware, but it's like putting a band-aid on a bullet wound.
I don't know who downvoted you or why, but yes - performance could be a problem as you pointed out. I prefer the React model simply because it is more programmer friendly. I would hope that performance is something that can be fixed, or at least the 80-20 rule will come to help.
Most JS server-side rendering never approaches anywhere near string concatenation performance, and I've tried. Whatever abstraction you use has to resemble string concatenation without doing much else, and it's really, really hard to not do much else. That's why embedded JS templates (EJS) is so fast, it just concatenates strings with some logic built-in to the template.
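A rough sketch of why that's fast: an EJS-style template can compile down to plain string concatenation (heavily simplified here, with no escaping or error handling, and only `<%= name %>` style interpolation supported):

```javascript
// Compile "Hello <%= name %>!" into a function that concatenates strings:
//   function (data) { return 'Hello ' + data.name + '!'; }
function compile(tpl) {
  const src =
    "return '" +
    tpl
      .replace(/'/g, "\\'")
      .replace(/<%=\s*(\w+)\s*%>/g, "' + data.$1 + '") +
    "';";
  return new Function("data", src);
}

const render = compile("Hello <%= name %>!");
console.log(render({ name: "world" })); // Hello world!
```

Once compiled, rendering is just string concatenation with no intermediate tree to walk, which is the benchmark the parent comment is describing.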
Yes, they likely have reverse proxies that cache static content. But this won't work well for any sort of dynamic content, i.e. apps that require login, when data changes in real-time, etc.
It pretty much has to be, there is no way NYT is giving up showing up in search results.
> How does a server side React app look like? Is it basically a node application then?
There is almost certainly Node somewhere in the pipeline. It's possible to build static content with Node/React as well, but NYT has dynamic server functionality as well so it would not surprise me if there's at least a Node layer in production.
React is a view library and it does this very well. It doesn't have all the bells and whistles of Angular and its plethora of helpers ($http etc) or complex dependency injection.
I would argue it's the perfect candidate for this purpose. What does a news site need? Articles? Ads? Basic nav? Angular or Ember would be overkill. And with Redux, it's very easy to think about complex UI state.
My only complaint with React is how large it is given what it actually does. Loading 100kb of JS (not Gzipped) seems very heavy.
Why not? Newspapers are fairly complex websites too and React is amazing at reducing code complexity, especially when compared to using vanilla js. There are other libraries that would also work great, but then it's comparing pros and cons and there is little reason React can't win in such a comparison.
My expectation is that the site does the templating on the client then. And my experience with websites that do that is that they load slowly, behave sluggishly, and suffer from all kinds of display errors.
This might be a worthwhile tradeoff to quickly build a highly interactive realtime interface. But for a newspaper? As a user I would be very much turned off to endure all that just to read an article.
One of the really powerful features of React is that server rendering of the views is trivial. This makes it very compelling even for content sites, where as you rightly say, time to first meaningful render of said content really matters after you hit the URL in your browser.
You could even write an entirely server-rendered web application, or a static website, in React should you be so inclined. In fact, I do the latter for my (very simple) personal site at https://davnicwil.com!
Nothing about React obliges you to do client rendering, a SPA app, or anything complex at all. It's, at the end of the day, just a view rendering library.
Endure it? Try visiting the NY Times. It doesn't load slowly, behave sluggishly, or suffer from display errors. And SPAs are a faster experience when you know the user is going to be looking at multiple pages, like how people typically read newspapers.
If the user is on any hardware from the past 7 years, a React developer would have to do some distinctly bad programming to make it behave sluggishly. Any poor performance is likely to come from the same things that make a classic static page slow: large media files and tracking scripts.
You mean www.nytimes.com ? Is that already the new react powered version?
As a side note, it does feel sluggish:
- Scroll is not smooth.
- It does not adapt well to different browser sizes
- After a few seconds the page "jumps down"
But the reason might simply be the overkill of JS, animations, overlapping elements (like the static header), ads, dynamically loaded stuff, and other crap. When I turn off JS, some of the problems go away.
I didn't look closely enough in the developer tools to see the full extent. From what I can see, just the email list sign-up forms on www.nytimes.com are using React, but all of the new front page of mobile.nytimes.com is in React. They are testing the new version of the mobile site in a partial roll-out, so if you open a new browser session there's a good chance the new one will get served to you (it will have a message saying "You're seeing a test of a new version of the New York Times home page.").
> But does it make sense for displaying articles, navigation and ads?
No it doesn't. We (as a marketing tech company) deal with major publishers all the time and come across all kinds of ridiculous and complicated tech used to show a basic article when a simple static website would be easier and faster at this point.
It's a lack of good talent, resume-led motivations for choosing technology and overall poor vision in execution by CTOs and management.
React (or similar library) makes sense whenever you want to have the ability to reason easily about state in your UI application. So if your UI accumulates any kind of user-introduced state, like text input, some kind of toggle, React provides a very pleasant mental model.
Initial page load might be marginally slower, although it's not hard to throw down the bare minimum and lazy load the rest, but every subsequent page load will be faster and much less intrusive.
I'd say react is a perfect fit for this type of thing. A basic news site it might be, but there's a lot more going on under the hood than you'd imagine.
I tried Relay but found it ridiculously stupid that I needed to change the GraphQL API on my server in order to satisfy the Relay concepts (universal IDs, connections), so we rejected the idea and decided to go with good old-fashioned Redux.
I appreciate your perspective, as I've been considering whether to invest time into learning GraphQL. It's informative to know that it's not suitable for some use cases. Probably could have done without the inflammatory adjectives.. ;)
As far as the criticism, would you be kind enough to explain what you mean by "relay concepts", and how GraphQL failed to satisfy those needs?
I think the parent is saying something different: They were happy with GraphQL for their API, but weren't happy with the requirements the Relay client library imposed.
My interpretation is that they are now using a GraphQL API but fetching from it with regular HTTP fetches and managing data with Redux on the frontend.
Well, I just imagined that the devs of the library may come across your comment and be offended/saddened to hear it being called "ridiculous" and "stupid". But then again, it sounds like you have good reasons to be frustrated with it, so why not. I recently saw someone write, "Webpack is ridiculous" - I suppose there's nothing wrong with inflammatory adjectives if they're backed up by evidence/experience.
It's nice to have a succinct explanation here of how Relay can be more familiar.
I think I would still be more attracted to the Apollo framework, which has a more Redux-like syntax, so it's more consistent with the workflow of the apps I'm used to developing. But maybe Relay has better benchmarks?
Also, if I want to stay REST but with optimistic transactions between the backend and frontend, I prefer a lighter lib like
https://github.com/tonyhb/tectonic
or even I just write some fast redux-saga watchers that helps to make my frontend always synchronized when my app calls a mutating db request.
This is very similar to the stack we use at BDG Media (bustle.com, romper.com, etc). We use Amazon Lambda, GraphQL, a custom model layer with dataloader and Redis, and Preact. We haven't seen a clear benefit from Relay or Apollo with our front-end apps (TBD on our admin apps), but we have enjoyed the Relay spec to help set server-side conventions.
GraphQL has helped us make an API that is easy to understand, easy to change, and easy to use. We love it.
We still use Ember and FastBoot on a few applications, but it's being replaced for the user side of bustle. I believe we did it because we were able to get faster and smaller server-side renders and a smaller JS payload. I spend more time on the infrastructure side so I can't speak to the exact reasoning.
A website that should heavily depend on caching relies on tech that allows only POST requests (non-cacheable, non-idempotent) and hopes for workarounds later.
The caching section in the GraphQL docs is cringeworthy, and, as evidenced by the article, that's exactly what the Times is going to do: try to slap global IDs everywhere and pretend it's OK.