1. Why not a SUB HTTP request? And a PUB HTTP request? The response URL could be a required header.
2. We have HEAD, can we do service discovery using HEAD?
3. Why not let a topic be an HTTP URL? “PUB /user/john/position HTTP/1.1\r\ndata...”.
4. Subscription expiration as a way to force subscribers to renew and upon renew get redirected to other servers is pretty cool. NATS has a special message (the INFO message) to do the same, but you might be in the middle of an important request-reply session you don’t want to abort.
5. The authors could have made this protocol very “non-http-ish” by implementing what amounts to Redis but in HTTP. I’m glad they didn’t. This still feels like HTTP, which is great.
1. Definitions
Topic. An HTTP (or HTTPS) resource URL.
However, you subscribe to a topic by interacting with a different "hub" URL, passing the "topic" URL as a parameter (`hub.topic`).
Service discovery does appear to support HEAD requests. (See section 4.)
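Since discovery amounts to reading `Link` headers with `rel="hub"` and `rel="self"` from the topic's response, it can be sketched in a few lines. This is a simplified parser, not full RFC 8288 link handling, and the example URLs are hypothetical:

```python
import re

def parse_link_header(value):
    """Parse an HTTP Link header into a {rel: url} dict (simplified)."""
    links = {}
    for part in value.split(","):
        m = re.search(r'<([^>]+)>\s*;\s*rel="?([^";]+)"?', part)
        if m:
            links[m.group(2)] = m.group(1)
    return links

# Headers a publisher might return on a HEAD (or GET) of the topic URL:
header = '<https://hub.example.com/>; rel="hub", <https://example.com/feed>; rel="self"'
links = parse_link_header(header)
print(links["hub"])   # the hub to subscribe at
print(links["self"])  # the canonical topic URL
```

A subscriber would issue the HEAD request, run the response's `Link` header through something like this, and then talk to the hub URL it found.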
Having a new HTTP verb for subscribing and publishing would seem like unnecessary complexity to me. Rather than ask "why not a new verb", I think a case would need to be made that a new verb is required, that the operation does not cleanly fit into the semantics of existing verbs. The existing verbs are capable of modeling quite a lot.
With the protocol as they've described it, subscribing is just sending an HTTP POST to the hub URL, passing in the topic URL. That's a simple HTTP operation that a lot of clients and programs can be instructed to do easily. Requiring the use of a new HTTP verb will make interoperability difficult without apparent benefit.
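To make that concrete, here is a sketch of building the subscription request with only the standard library. The parameter names (`hub.mode`, `hub.topic`, `hub.callback`, `hub.lease_seconds`) come from the spec; the URLs and lease value are hypothetical:

```python
from urllib.parse import urlencode

def build_subscription(hub_url, topic_url, callback_url, lease_seconds=86400):
    """Return (hub_url, form-encoded body) for a WebSub subscription POST."""
    body = urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,
        "hub.callback": callback_url,
        "hub.lease_seconds": lease_seconds,
    })
    return hub_url, body

hub, body = build_subscription(
    "https://hub.example.com/",          # hub found via discovery (hypothetical)
    "https://example.com/feed",          # the topic URL
    "https://sub.example.net/callback",  # where notifications should be POSTed
)
# POST `body` to `hub` with Content-Type: application/x-www-form-urlencoded
print(body)
```

That really is the whole subscribe operation, which is the point being made above: anything with an HTTP client can do it.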
> Having a new HTTP verb for subscribing and publishing would seem like unnecessary complexity to me.
Complexity for whom?
Introducing a secondary "hub" resource here is just accidental complexity. If I want to subscribe to resource A why am I talking to a different resource B? And once you introduce a secondary resource now you need yet another service discovery mechanism to support discovery of these pseudo-resource hubs. (Heaven forbid using an existing service discovery mechanism like RDDL.)
Honestly stuff like this is just so poorly thought out it's difficult to understand why W3C stamps approval on this crap. There's no consideration given to alternate protocols like WebSockets or XMPP and there's no attempt to layer on top of existing standards in a meaningful way (hub.secret -- really???). Worst of all there's no real understanding here of what it means for a resource to change. The entire Content-Distribution model is geared towards just one very narrow use case.
It's clear the W3C is all about being "inclusive" and "moving fast", and there's real fear of "overthinking" things. But seriously, if this is the result, we'd probably be better off with better standards once a decade than this.
>Introducing a secondary "hub" resource here is just accidental complexity. If I want to subscribe to resource A why am I talking to a different resource B?
Think about what the hub has to do. It may have to notify millions of subscribers, deal with any errors, retry, etc. This is a very heavy duty messaging system that most publishers will not want to run themselves. And yet you want the publisher's domain name to be the well known resource that ultimately controls things.
Publishers may be blogs hosted on small websites or even things like cars, phones, laptops or home appliances that are not always online or have to work under tight resource constraints.
Publishers may wish to distribute their content through more than one hub. We don't even have to think of avoiding censorship to see why this increases availability.
I think making it possible to split the roles of publisher and distributor is a very good idea. You can still decide to implement both roles on one server.
Best explanation for me so far. The problem I see is that the same resource constraints could apply to the subscribers (limited connectivity, limited uptime), and the protocol does not address that?
Good question. I don't know if the proposal addresses it directly, but as subscribers have to provide an HTTP endpoint for notifications, I would expect that subscribers would not normally be end user devices but rather gateways similar to SMTP servers or application APIs.
Funnily enough, the first drafts of this protocol (back then, called PubSubHubbub) were written circa 2008, so this specification is about a decade in the making.
At the time it was distributing content between a number of the bigger blogging/publishing platforms of the day, and also notifying search engines so they could update their indexes more quickly.
If anything it seems like the standardization process was too long and missed the boat here (this particular problem is now most often solved by proprietary protocols), rather than being "rushed through".
Can't deny that the world has changed a lot during the lifespan of this idea, though. Cellular-connected computers in our pockets were barely on the radar when this spec was first written. I'm sure some would argue that the burdens of publishing have now shifted on to the reader (probably battery powered, spotty connectivity) whereas in this spec's original universe the burdens were on the publisher (CDNs not yet as widespread, more independent publishing from web hotels, etc).
New HTTP verbs require support in the web server, and all proxies along the way, while using existing ones doesn't and is just part of the normal application framework.
I've just learned about this protocol tonight, but from skimming the specification, I think it's a well-thought-out, practical, minimal protocol.
WebSub is a protocol for people who want to implement the publish/subscribe pattern over HTTP callbacks (aka webhooks). Using webhooks means that subscribers don't need to have any kind of ongoing connection or session open to receive publishes. Subscribers are passive web servers and merely wait to receive an HTTP POST. No state, no connection, polling, or anything. The general model of HTTP callbacks is a simple scheme that's easy to implement using any programming language or platform out there, all of which have HTTP clients and servers capable of getting the job done with minimal fuss.
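The "passive web server" role really is this small. Below is a sketch of a subscriber callback using only the Python standard library; the hub first sends a GET with a `hub.challenge` to verify intent, then POSTs topic content on each publish. The path and port are hypothetical, and a real deployment would sit behind a normal web server:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class CallbackHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Intent verification: echo hub.challenge back with a 200.
        qs = parse_qs(urlparse(self.path).query)
        challenge = qs.get("hub.challenge", [""])[0]
        self.send_response(200)
        self.end_headers()
        self.wfile.write(challenge.encode())

    def do_POST(self):
        # Content delivery: read the pushed payload and acknowledge it.
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)
        self.send_response(200)
        self.end_headers()
        # hand `payload` off to the application here

    def log_message(self, *args):
        pass  # keep the sketch quiet

# To run: HTTPServer(("", 8080), CallbackHandler).serve_forever()
```

Between deliveries the subscriber holds no connection and no session state, which is the property being described above.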
I have actually built custom systems that worked using a very similar pattern as this protocol, where clients of a service pass in URLs where they'd like to be notified when an event occurs. Perhaps this is why I find myself nodding along when I read the protocol spec. There wasn't any standard way to model this, so I just invented something on the fly. You also see this pattern implemented in services like AWS SNS's support for HTTP [1], in Google Cloud PubSub, Twilio, etc. Each of these has an entirely custom protocol for PubSub over HTTP callback, and not something that's standard. They all tackle similar issues like preventing attackers from creating unauthorized subscriptions to URLs, but in different ways.
WebSockets doesn't solve the same problem as WebSub. WebSockets require a continuous connection from a client to a service. An application will need to devise its own logic for resuming a session if the connection breaks.
WebSub requires no active connection nor session. WebSub subscriptions could remain functional for months at a time (really indefinitely), with no communication whatsoever between messages. The system that initiates the subscription can be different from the one that receives the publishes, which is valuable because it means that messages don't all have to go to a single place. Publishes are sent to the domain name specified in the subscription URL. The subscribed web server could change regularly and everything will work as long as the DNS name keeps pointing to the right place. You could use multiple web servers to handle the subscription, by putting multiple servers in the DNS record, or you could use a load balancer in the same way as other web requests. This means you can scale easily. These are the kinds of benefits you get from building subscriptions on top of HTTP. All of the standard techniques and standard software "just work".
I'm not an expert on XMPP, but I suspect XMPP would also be a bad fit for this use-case, and would also require continuous connectivity from the subscriber (please correct me if I'm wrong.) I think the same is true for MQTT but I'm not an expert on that either.
As a person who has built and used multiple systems following this general abstract pattern, I think this is a good attempt at drafting a standard protocol. My impression reading the spec is that its designers had a good idea what problem they wanted to solve, and what kind of characteristics they wanted the solution to have, and came up with a protocol that succeeded in meeting those requirements.
What's the objection to hub.secret? That facility doesn't seem essential to a minimal version of a protocol like this, but I understand why they included it. It provides a simple way for the subscriber to authenticate that the content they're receiving is legitimately the result of their subscription to the topic, and not e.g. an attacker's subscription, or an attacker system that's trying to impersonate the hub. How would you tackle this issue in a simpler way? (It would not be easy to solve this problem within the protocol using TLS, for example.)
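For reference, the `hub.secret` mechanism is just an HMAC over the delivered body, sent in an `X-Hub-Signature` header as `<algo>=<hexdigest>`. A sketch of the subscriber-side check, with a hypothetical secret and payload:

```python
import hashlib
import hmac

ALLOWED = {"sha1", "sha256", "sha384", "sha512"}

def signature_valid(secret: bytes, body: bytes, header: str) -> bool:
    """Check an X-Hub-Signature header ("<algo>=<hexdigest>") against the body."""
    algo, _, received = header.partition("=")
    if algo not in ALLOWED:
        return False
    expected = hmac.new(secret, body, getattr(hashlib, algo)).hexdigest()
    return hmac.compare_digest(expected, received)

# The hub signs each delivery with the hub.secret agreed at subscription time:
secret = b"shared-secret"        # hypothetical value
body = b'{"updated": true}'
sig = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
print(signature_valid(secret, body, sig))                # True
print(signature_valid(secret, body, "sha256=deadbeef"))  # False
```

Whitelisting the algorithm and using a constant-time comparison are the only subtleties; it's hard to imagine a scheme much simpler that still lets the subscriber reject forged deliveries.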
I still don't see the added value compared to a simple pull-news-from-URL model. RSS with GET is already session-less, remains functional for months, AND also works when clients cannot receive incoming connections (mobile devices).
And Pubsubhubbub/WebSub is a plain simple upgrade on top of that model, for when you do not want to rely on polling only, e.g. because you want updates delivered quickly (vs polling, where most implementations sensibly have logic to adapt their polling speed to posting rate of a resource).
One of the benefits closed platforms have is that they can deliver posts inside the platform immediately. WebSub brings that option to feeds on the open web, without requiring subscribers to poll every <5 minutes and without requiring them to make large changes under the hood, e.g. introducing new non-HTTP protocols which can't be used on all hosting options.
For end devices other update mechanisms are useful, as you say, and systems speaking them could hook onto WebSub hubs to get notifications they then translate. E.g. your typical WordPress blog has no chance of offering an XMPP channel, but it can ping a WebSub hub since it's only HTTP.
If you want to "upgrade" the old server-client model (request-response) to a realtime instant bidirectional information flow (dialogue communication), you surely want to look at protocols such as XMPP.
Again, you exclude large parts of the web if you require something like XMPP. It totally makes sense to look into such things where it's viable, and you can bridge them with WebSub, but that was not in scope for pubsubhubbub (and WebSub is intentionally only pubsubhubbub with minor clarifications and cleanup).
Pubsubhubbub was a (relative) success because everyone doing RSS feeds could easily add it with their existing tech stack.
I may sound harsh, but to me it looks like the Web (JavaScript?) kiddos wanted their own half-baked solution. As they only know about the Web, they reinvent the wheel on top of the Web.
Anyway, if the proposal is useful to some people, then it won't do harm to have it in the public domain.
Pubsubhubbub has been actively used on the open web for over a decade, from large centralized services to WordPress instances on random PHP shared hosts, and has been used for at least parts of Wikipedia. It actually works. I'd say that's a success compared to theoretical XMPP-based solutions.
This reminds me a lot of the SUBSCRIBE method [0] in Microsoft's WebDAV extensions (I think this is used for EAS?). You pass a Call-Back header with a URL that gets called with NOTIFY [1].