Show HN: Pump, a dead simple Pythonic abstraction of HTTP. (adeel.github.com)
118 points by pyninja on July 27, 2011 | 61 comments


"What WSGI should have been."

Well, not at all. Pump is what werkzeug, webob, and other more friendly wrappers on WSGI already are. Basically pointless duplication of work, without understanding why WSGI can't be this simple.

One basic reason that WSGI can't be as simple as just returning a dictionary is that you don't necessarily want the entire body of your response pre-computed before starting to return data to the client. What about long running connections? What if you want to return the head of the response immediately, so the client can start pulling css and js while you compute the body of the response? What if you want to do chunked encoding to support long polling connections, or responses where you don't know the response size beforehand?

Pump is basically doing what lots of other things already do, except without understanding HTTP quite as well.
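
A minimal sketch of the streaming case described above, assuming Python 3 (where WSGI bodies are bytes). `render_body` is a hypothetical stand-in for slow work, and `collect` only imitates what a real server would do; neither comes from Pump or any library.

```python
import time


def render_body():
    """Hypothetical slow computation, standing in for real work."""
    time.sleep(0.01)
    return b"<p>done</p>"


def app(environ, start_response):
    # No Content-Length is set, so a server is free to use chunked
    # encoding (or close the connection) to mark the end of the body.
    start_response("200 OK", [("Content-Type", "text/html")])
    # First chunk: can be sent before the rest of the body exists, so the
    # client can start fetching the stylesheet right away.
    yield b"<html><head><link rel='stylesheet' href='/site.css'></head><body>"
    yield render_body()
    yield b"</body></html>"


def collect(app, environ):
    """Drive the app roughly the way a server would, recording its output."""
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"] = status
        seen["headers"] = headers

    chunks = list(app(environ, start_response))
    return seen, chunks
```

A response expressed as a single dictionary with a string body has no obvious place for that first early chunk, which is the objection being raised here.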


Thanks for the feedback. I'm not willing to accept that WSGI couldn't be this simple. Pump's specification is modeled on Ring for Clojure (https://github.com/mmcgrana/ring), and I'm sure there's a way Ring gets around the issues you mention. I'd never had to do any of those things before but I'll look into it now.


Werkzeug and WebOb already do what you are trying to do, and they do it in a completely fantastic way. I'm not saying that to discourage you from learning and contributing, but what do those two libraries do wrong that you are hoping to improve on?

In the meantime, don't bash WSGI until you understand why it is the way it is.


Werkzeug and WebOb are not what I'm trying to do.

I'm reposting a reply I made earlier to irahul:

Pump aims to replace WSGI entirely. That is, I believe it does a better job of what WSGI was intended to do.

I understand your point that web developers don't necessarily work with WSGI on a day-to-day basis. But if you look at the Ruby web community, Rack middlewares are much more prevalent than WSGI middlewares. Application developers (as opposed to framework developers) often add functionality as a Rack middleware, so that it can be reused in different applications, even using different frameworks. Why isn't that happening with Python as much? In the Python world, instead of writing even simple middlewares for basic functionality like https://github.com/adeel/pump/blob/master/pump/middleware/pa... or https://github.com/adeel/pump/blob/master/pump/middleware/co..., every framework ends up reimplementing it. I believe this is because the WSGI API is ugly and not as easy to understand as it could be (just look at the average WSGI middleware).


What middleware do you think are missing? In general, stuffing application logic into middleware leads to problems, so I don't think your premise is sound.

http://dirtsimple.org/2007/02/wsgi-middleware-considered-har...

I think your approach is harmful and will put developers who use this library in a very bad situation. You are putting a simplifying abstraction on top of HTTP. As with all simplifications, things which do fit perfectly in your tool become very easy, and things that don't fit perfectly become completely impossible.

Finally, I stand by my argument. Werkzeug and WebOb let you put the exact kind of simple interface on top of WSGI that you are attempting to, but with the benefits of not restricting you to a subset of HTTP, and interoperability with awesome tools like mod_wsgi.


If you are really seeking to replace WSGI, then I'd encourage you to do a bit more research, write a specification, get at least one more implementation, and then submit it as a PEP.

Documenting at this level is challenging & rewarding. The hard part about specification work, beyond explaining the design and providing adequate justification, is getting consensus. By doing so, you'll learn quite a bit, open yourself up for critique, and in the end perhaps provide a credible alternative.


I'd like to do that, but I'm afraid of how much time and effort it would take. I really just wrote Pump/Picasso to use at our new startup, Beagle (http://beagleapp.com). I was expecting roughly this kind of response from the Python community, but who knows, maybe someone will use it.


At the risk of sounding harsh: it's an incredible failure of time and project management for you to have written your own Python web framework as part of launching a startup.


This is particularly ironic given the premise of your startup is there are some things you don't have time to do that others can do for you.


I'd advise you to spend more of your time on building what's the core of your new business, because there are tons of higher priority things you'll have to accomplish. Web frameworks in Python, even "lightweight" ones, are not an unsolved problem. Lots to choose from. Pick and go.

</speaking-from-experience>


I think the pertinent question here is: what do you think 'WSGI was intended to do.'? From the responses here, it seems your answer to that question is a distinct subset of what WSGI actually intends to do. Which is why Pump offers only a subset of the abilities of, for instance, Werkzeug.


I'm pretty sure the plural of middleware is middleware.


As a special case, how about allowing the response body to be an iterable, and writing whatever blocks of data it produces back to the client?

You should also allow multiple headers with the same field-name, since that's in the spec.

I like the idea of Pump, and am tired of frameworks protecting me from HTTP.

EDIT: Looking at the WebOb code. It does have quite a few conveniences for working with HTTP messages. I'm not sure if copying bodies into temp files in order to make them seekable is a "completely fantastic way" of doing things, but if I were you I'd definitely read through WebOb to see what kind of problems you might be up against.


WSGI specifically specifies the response body as an iterable. Wheel, prepare to be reinvented.
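
For comparison, a small sketch of how plain WSGI already covers both requests made above: the body is any iterable of byte strings, and headers are a list of pairs, so a field name may repeat. This assumes Python 3; the cookie values and the `run` helper are made up for illustration.

```python
from wsgiref.headers import Headers


def app(environ, start_response):
    headers = [
        ("Content-Type", "text/plain"),
        # A list of pairs can repeat a field name; a dict keyed by
        # header name cannot.
        ("Set-Cookie", "a=1"),
        ("Set-Cookie", "b=2"),
    ]
    start_response("200 OK", headers)
    # Any iterable of bytes works as the body; here, a generator expression.
    return (f"block {i}\n".encode("ascii") for i in range(3))


def run(app):
    """Call the app with a recording start_response and join the body."""
    captured = []

    def start_response(status, headers, exc_info=None):
        captured.append((status, headers))

    body = b"".join(app({}, start_response))
    status, headers = captured[0]
    return status, Headers(headers), body
```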


WebOb maintainer here. What's your alternative to copying body to a temp file if you need to make it seekable?


> I like the idea of Pump, and am tired of frameworks protecting me from HTTP.

if you don't like being "protected from http" why not write raw wsgi applications?


How is a gateway that reads, buffers, and parses HTTP headers into an environment object [1] before turning it over to your "raw wsgi application" not protecting it from HTTP?

WSGI protects you from HTTP. CGI protects you from HTTP. mod_python and mod_perl protect you from HTTP. If you're unable to read and parse the complete HTTP request yourself -- perhaps incrementally, there's an idea -- you're protected from HTTP. Something is imposing policy like how many headers to accept, what the longest header should be, how to fold multiple headers with the same field-name, that it's okay to consume memory buffering all the headers, and so on.

In my ideal world, a web app server has access to the full HTTP request stream, calls an incremental HTTP parser [2] [3], and does whatever it wants along the way. If the typical use case is to accumulate a full request object and call a handler, fine, that can be made convenient. But the web app gets to decide.

Perhaps my issue is not with frameworks (in the sense of Django, Ruby, etc), but with web servers. Except, I view the infrastructure for hosting a web app inside a web server as yet another framework. The common use case is optimized for at the expense of the less common use cases, which become more painful than they should be. Or sometimes outright impossible.

TL;DR -- Libraries over frameworks. In Soviet Framework Russia, you don't call code...code call YOU.

[1] http://www.python.org/dev/peps/pep-0333/#environ-variables

[2] https://github.com/ry/http-parser

[3] https://github.com/mongrel/mongrel/tree/master/ext/http11


"Libraries over frameworks. In Soviet Framework Russia, you don't call code...code call YOU." -- exactly! Very few people get this, which is a shame.


Doing something simple is fine, but what I think other people are saying is that pump is reductively simple.


The Rack style API is nice for the simplicity. I <3 it.

The Rack style api is NOT a great representation of the HTTP protocol. Specifically anything with a streaming or chunked response.

start_response/yield may not be the absolute best API for this, I haven't thought too much about that. But if you go look into what was done in Rails 3.1 for chunked responses you may realize it's not actually too great.


The point of WSGI is not to be a good abstraction of HTTP. As an app author, you aren't supposed to care. The advantage of WSGI is that it's standard, and everything uses it. That means you can use any web framework with any web server, and it will all Just Work.

When you write your own copy of WSGI to change how some words are spelled, you don't gain much, but you lose the whole WSGI community. This seems rather pointless to me.


FTA:

Take advantage of existing WSGI tools. Pump comes with adapters for serving Pump apps with WSGI servers and converting WSGI middleware to Pump middleware.


> That means you can use any web framework with any web server,

Or if you are really keen on working low level, you use a nice wsgi library viz. werkzeug.


It seems like the author here is comparing Pump to WSGI -- perhaps a better comparison is to Ian Bicking's WebOb (http://webob.org) or Armin Ronacher's Werkzeug (http://werkzeug.pocoo.org/).

WSGI is a low-level protocol that provides a minimal interface to an HTTP server, à la CGI. It is purposefully not an application-level HTTP toolkit. For example, a WSGI component takes an input stream and returns an iterable which could yield output chunks... or perhaps an infinite data stream. These edge cases are sometimes very important, and they are why the interface is designed as it is, inconvenient as it may be for simple apps.


One issue that I have with it is that there is no distinction between script name and path info. Obviously, you don't have to call them that, but not having a distinction between the two makes it impossible to serve two Pump apps on the same domain and dispatch between them.
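
A sketch of what that distinction buys you in WSGI, assuming Python 3 and the standard library's `wsgiref.util.shift_path_info`. The `blog` and `wiki` apps and the `MOUNTS` table are hypothetical names for illustration.

```python
from wsgiref.util import shift_path_info


def make_app(label):
    """Build a tiny app that reports where it thinks it is mounted."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        text = "%s mounted at %s, path %s" % (
            label, environ["SCRIPT_NAME"], environ["PATH_INFO"])
        return [text.encode("utf-8")]
    return app


MOUNTS = {"blog": make_app("blog"), "wiki": make_app("wiki")}


def dispatcher(environ, start_response):
    # Moves the first PATH_INFO segment onto SCRIPT_NAME and returns it,
    # so the mounted app sees only the part of the path below its prefix.
    name = shift_path_info(environ)
    app = MOUNTS.get(name)
    if app is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    return app(environ, start_response)
```

Because each app reads its own prefix from `SCRIPT_NAME`, it can build correct URLs without knowing where it was mounted.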

Also, there's no clearly defined format for keys added by middleware - the included middleware just adds plain old keys as it pleases.

But more generally, I don't really see the purpose of this. WSGI is obviously not ideal (thanks to start_response and CGI environment variables), but it's also quite firmly in place in the Python world. Not to mention that it would take the Pump library quite a while to catch up to Werkzeug or WebOb in terms of having all the necessary HTTP primitives implemented. (Multipart parsing, anyone?) Unless the server makers get on board, Pump is pretty much just an added layer of complexity on top of WSGI. Instead of "server | WSGI | WSGI library | framework", you have "server | WSGI | Pump adapter | Pump library | framework".
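
To make the extra adapter layer concrete, here is a rough sketch of wrapping a dict-returning app as a WSGI app. This is not Pump's actual adapter: the request keys (`method`, `uri`) and the rule that turns `content_type` into `Content-Type` are assumptions made for illustration, and Python 3 is assumed.

```python
from http import HTTPStatus


def hello(request):
    # A dict-style app in the shape shown on the Pump site.
    return {
        "status": 200,
        "headers": {"content_type": "text/plain"},
        "body": "Hello World",
    }


def dict_app_to_wsgi(app):
    """Hypothetical adapter: dict-in/dict-out app -> WSGI callable."""
    def wsgi_app(environ, start_response):
        # Assumed request shape; a real adapter would carry much more.
        request = {
            "method": environ.get("REQUEST_METHOD", "GET").lower(),
            "uri": environ.get("PATH_INFO", "/"),
        }
        response = app(request)
        status = HTTPStatus(response["status"])
        # Assumed key convention: content_type -> Content-Type.
        headers = [(key.replace("_", "-").title(), value)
                   for key, value in response["headers"].items()]
        start_response("%d %s" % (status.value, status.phrase), headers)
        # The whole body already exists in memory at this point.
        return [response["body"].encode("utf-8")]
    return wsgi_app
```

The last line also shows the limitation raised elsewhere in the thread: with a string body, nothing can be sent until the whole response has been built.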


I don't see any way to use Pump to stream data in (a very long POST) or stream data out (a very long body). While it is certainly "dead simple", claiming that it is "what WSGI should have been" is a serious stretch.


by passing dicts you are removing what makes WSGI work so well: lazy loading, chunked responses, middleware (encoding, caching etc.) by decorating, iterators/generators etc.

your solution is going to be slower, more memory intensive and will not be able to be http 1.1 compatible. there is a reason why WSGI was designed the way it is
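
A minimal sketch of "middleware by decorating" that keeps the response lazy, assuming Python 3. `uppercase` is a toy transformation chosen for illustration; it ignores details a real middleware would need, such as Content-Length.

```python
def uppercase(app):
    """Toy WSGI middleware: transform each chunk as it passes through."""
    def middleware(environ, start_response):
        result = app(environ, start_response)
        try:
            for chunk in result:
                # Chunks are transformed one at a time; nothing is buffered.
                yield chunk.upper()
        finally:
            # WSGI asks callers to close the wrapped iterable if it
            # provides a close() method.
            close = getattr(result, "close", None)
            if close is not None:
                close()
    return middleware


produced = []  # records which chunks the inner app has generated so far


@uppercase
def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    for word in (b"lazy ", b"chunks"):
        produced.append(word)
        yield word
```

Pulling one chunk from the decorated app makes the inner app produce exactly one chunk, which is the property a dict with a precomputed body gives up.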


Here's another NoWSGI HTTP server library (Brubeck):

http://news.ycombinator.com/item?id=2770866


I hadn't seen this, looks interesting.


Looks useful, though I think any modern abstraction like this should come with at least the scaffolding for future support of websockets. (They are a bit in flux right now, but will ultimately be incredibly useful.)


WebSockets is not HTTP and should not be handled like HTTP. I just reimplemented a WS wrapper by removing its dependency on HTTP stuff, and ended up decreasing its LOCs and complexity.


This seems interesting but I'm not sure why. What are some use cases?


Building web apps! Basically, it's meant to be a replacement for WSGI. In my view, the problems with WSGI are that it has an unpythonic API (see: start_response and environ), and that it doesn't come with standard middleware that all frameworks use, so they all end up reinventing the wheel. Pump does a better job of abstracting out the details of HTTP, and also comes with a lot of useful middleware. This makes it really simple to write a web framework: just implement the routing and glue together a bunch of middlewares. For an example, see Picasso (https://github.com/adeel/picasso), a simple but functional framework I built on top of Pump.


> In my view, the problems with WSGI are that it has an unpythonic API (see: start_response and environ),

Why do you find `start_response` unpythonic?


The problems with start_response have been discussed at length in Web-SIG. As far as I know, they're planning to remove it from the spec in WSGI 2.0 (whenever that's published).


I see http://wsgi.org/wsgi/WSGI_2.0 mentions:

> We could remove start_response and the writer that it implies.

I searched for `wsgi start_response issues` and didn't get anything useful. Care to point out what's the fuss with start_response and why it's unpythonic?


I think this was the original wart that started the whole WSGI 2.0 process. As I recall, it was PJE himself who recommended dropping start_response and replacing it with a return tuple of (status, headers, iterable). PJE comments in one thread that this isn't a reduction in features, but an improvement in usability:

"Note that in the WSGI 2 calling protocol, you would simply modify your return values, rather than needing to create a function and pass it down the call chain."

http://mail.python.org/pipermail/web-sig/2009-November/00424...
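
A sketch of the return-tuple style described above, assuming Python 3. WSGI 2.0 was never finalized, so this is only an illustration of the idea; `add_header` and `to_wsgi1` are hypothetical names.

```python
def hello(environ):
    # Tuple-returning style: (status, headers, iterable), no start_response.
    return "200 OK", [("Content-Type", "text/plain")], [b"Hello World\n"]


def add_header(app, name, value):
    """Middleware in this style just edits the returned values."""
    def middleware(environ):
        status, headers, body = app(environ)
        return status, headers + [(name, value)], body
    return middleware


def to_wsgi1(app):
    """Adapt a tuple-returning app to the existing WSGI calling convention."""
    def wsgi_app(environ, start_response):
        status, headers, body = app(environ)
        start_response(status, headers)
        return body
    return wsgi_app
```

Note that the body is still an iterable here, so streaming remains possible; only the callback is gone.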


Here is an example of a thread where it's discussed: http://mail.python.org/pipermail/web-sig/2009-November/threa...


I believe the intent is to be Rack[1] for Python. The beauty of Rack is that I can build a new web framework, web server, or web app plugin and have it "just work" with the rest of the ecosystem. And because the Rack API is so simple, building to spec is easy.

In the Ruby world if I want to use the awesome library Sass all I have to do is:

    gem install sass

Because it functions as a Rack plugin it automatically works with any Ruby web framework & Ruby web server combo I choose. No special setup required.

-----

[1] http://rack.rubyforge.org/


    # WSGI
    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        yield 'Hello World\n'

    # Rack
    app = proc do |env|
      [200, {'Content-Type' => 'text/plain'}, ["Hello World\n"]]
    end

WSGI is the Rack for Python. In fact, WSGI predates Rack, and Rack is WSGI inspired.


Rack was based on WSGI, however the point is valid since Pump is using an API closer to Rack than WSGI. Indeed, the Pump website states: "No fancy start_response or environ here."


    # Pump
    def app(request):
        return {
          "status": 200,
          "headers": {"content_type": "text/plain"},
          "body": "Hello World"}


You sure have put effort and the project looks good, but I am missing the purpose. If the problem it tries to solve is parsing environ to form response objects, or providing vanilla middlewares, that problem is very well solved by a wsgi library. Have you looked at werkzeug?


Pump aims to replace WSGI entirely. That is, I believe it does a better job of what WSGI was intended to do.

I understand your point that web developers don't necessarily work with WSGI on a day-to-day basis. But if you look at the Ruby web community, Rack middlewares are much more prevalent than WSGI middlewares. Application developers (as opposed to framework developers) often add functionality as a Rack middleware, so that it can be reused in different applications, even using different frameworks. Why isn't that happening with Python as much? In the Python world, instead of writing even simple middlewares for basic functionality like https://github.com/adeel/pump/blob/master/pump/middleware/pa... or https://github.com/adeel/pump/blob/master/pump/middleware/co..., every framework ends up reimplementing it. I believe this is because the WSGI API is ugly and not as easy to understand as it could be (just look at the average WSGI middleware).


It seems that what you ended up doing is, in fact, reimplementing things that were implemented zillions of times before. Am I wrong, or does the Pump middleware not work with just any WSGI app? If it had followed the basic WSGI middleware concept, you'd be closer to achieving the goal of reusable components across frameworks.

"Pump aims to replace WSGI entirely." <- this is very ambitious. :)


They are both predated by:

    public class App extends HttpServlet {
      public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        response.setContentType("text/plain");
        response.getWriter().println("Hello world");
      }
    }


And that is predated by

    # hello.cgi
    print "Hello World\n";

I wasn't trying to claim WSGI is the first web server to application server interface. I was just showing Rack and WSGI are similar, and Rack was WSGI inspired.


Yes, but IMHO servlets got it all wrong, and WSGI took very little from servlets (though for instance Webware, a pre-WSGI framework did use the servlet model). But it owes much to the superior CGI model; replace processes with function calls and structure the response lightly and you have WSGI.


Ianb, may I inquire as to what is the difference you mean between servlets and cgi. A cgi written in perl for example, was essentially a perl servlet, I think. How is the CGI model fundamentally different from a servlet model, where your application code is given a request as a parameter to a function, and must return a response?


> How is the CGI model fundamentally different from a servlet model,

Since both are web server to application server interfaces, they aren't fundamentally different - the difference is cgi was language independent and hence defined for the common minimum. cgi couldn't have been defined in request and response objects - it would have caused trouble for languages which don't have objects.

> where your application code is given a request as a parameter to a function, and must return a response?

cgi is not given a request parameter - the request parameters are passed in the environment. And cgi doesn't return a response object - whatever it writes to stdout constitutes the response. cgi had to cater to all sorts of implementations - assuming request/response objects wasn't a possibility.
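
A sketch of that contract in Python 3: the request arrives through environment variables and the response is whatever gets written to stdout. The `cgi_main` name and its parameters are illustrative; passing the environment and stream as arguments only makes the function easy to exercise without a web server.

```python
import io
import os
import sys


def cgi_main(environ=os.environ, stdout=sys.stdout):
    # The request is described by environment variables such as
    # REQUEST_METHOD and QUERY_STRING; there is no request object.
    name = environ.get("QUERY_STRING", "") or "World"
    # The response is plain text on stdout: header lines, a blank line,
    # then the body.
    stdout.write("Content-Type: text/plain\r\n")
    stdout.write("\r\n")
    stdout.write("Hello %s\n" % name)


# Exercising it with a fake environment and an in-memory stream:
fake_out = io.StringIO()
cgi_main({"REQUEST_METHOD": "GET", "QUERY_STRING": "HN"}, fake_out)
```

Replace the process boundary with a function call and the stdout text with a status, header list, and iterable, and the result is close to WSGI.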

Servlets and cgi aren't fundamentally different, but I guess we can agree they are sufficiently different.


To add/agree: I think the biggest difference is that CGI dealt in data, and did not have objects/APIs/etc. In many ways it would have been reasonable to skip even that, and pass an HTTP request in on stdin, and get an HTTP response on stdout, with just some minimal sanitizing promises; but I don't think I've ever seen that approach. Coincidence of history I suppose. Anyway, WSGI also carefully avoided any objects, only using standard data structures (dicts/hashes, strings, ints, ordered-associative-arrays, and iterable response). The result is a functional API without opinions.


Yes, Pump was heavily inspired by Rack and especially Clojure's Ring (https://github.com/mmcgrana/ring).


I don't get the purpose. WSGI was meant to be low level, to cover the common minimum. If you want request/response semantics, use a higher-level library - I use werkzeug, which wraps wsgi, or flask, which is a small web framework built on werkzeug.

I don't think anyone other than wsgi library implementors code to WSGI. WSGI would be a problem if that's how python web programming was to be done - but that's not the case.


I like how the circle of inspiration loops back to Python: Python WSGI inspired Ruby Rack. Ruby Rack inspired Clojure Ring. Clojure Ring inspired Python Pump.


I never understand the mentality of modern wsgi/cgi design. Passing parameters using environ? Why not directly as a python request object?


For consistency. If there was a request object, there would need to be some sort of standard implementation that there was always access to so that the object could be properly instantiated, instance tested, etc. By using a plain old dict, you ensure compatibility at the cost of attribute access.

Also because WSGI is designed to be low-level, and if you want a request object you should really be using a library or framework.
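
A toy sketch of that layering in Python 3: `environ` stays a plain dict handed to each call, and a library can add attribute access on top. The `Request` class here is hypothetical and far smaller than what Werkzeug or WebOb provide.

```python
class Request:
    """Thin, illustrative wrapper giving attribute access over environ."""

    def __init__(self, environ):
        self.environ = environ  # the plain per-request dict from the server

    @property
    def method(self):
        return self.environ["REQUEST_METHOD"]

    @property
    def path(self):
        return (self.environ.get("SCRIPT_NAME", "")
                + self.environ.get("PATH_INFO", ""))


def app(environ, start_response):
    # Each call receives its own environ dict, so a new wrapper is built
    # per request and nothing is shared between requests.
    request = Request(environ)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("%s %s" % (request.method, request.path)).encode("utf-8")]
```

Since the wrapper is built from the dict, two libraries with different request classes can still sit in the same WSGI stack.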


But the old model sucks. If a process can handle more than one request, how does it change environ if there is only one process?


It is ironic that they mention "don't have to reinvent the wheel" in that page.


As far as web frameworks go, wsgi is pretty low level already. If you want to do a framework, move to a narrow vertical, it might get more traction.


http://xkcd.com/927/ Seems relevant here...


Rolling your own small web framework in some arbitrary language seems to be one of the easier ways to add "open source project author" to one's resume. Not that I'm knocking it. But after you've seen the wheel reinvented the Nth time, you get increasingly less impressed on N+1, N+2, etc.



