MongoDB 2.0 Should Have Been 1.0

rkalla · on Nov 7, 2011

I don't intend this comment to be an insightful deconstruction of the NoSQL space and/or Mongo... but does anyone else notice that the level of energy around a project (positive or negative) usually indicates its progress along the hype-dissolution cycle[1]?

I try not to take the actual comments or articles as law, but rather use them as a temperature reading to figure out where in the slope we are currently for a given tech or trend.

Given that Mongo-talk has absolutely dominated HN over the weekend as well as reddit/r/programming, I am interpreting this as being in the bottom of the dissolution curve right now.

It is this point where the community push-back and temporary "hating on" forces the team to go into overdrive, addressing whatever pain points the community has griped on the loudest, in this case:

  - Write locks
  - Durability / Replication consistency (already addressed)

This is like tempering steel by pounding on it... but instead it is the community pounding on the people over at 10gen. That sucks, but this rite of passage for them will see sweeter days on the other side.

I imagine Mongo 3.0 will represent the final climb out of the dissolution curve where all open complaints have been addressed and we actually get back to solving problems with the technology.

I don't know that Cassandra or Redis or CouchDB have completed their hype-cycles yet, because they haven't had the hyper-aggressive response from the community during the dissolution step. They are all popular and well-liked, but it seems like their popularity is still climbing.

It is all interesting nonetheless. Mongo is a huge success, regardless of how many of these articles are written.

I've not seen a team as dedicated and involved as 10gen in a long time; Eliot still answers hundreds of messages on the group every week (along with every new member of the team) -- which I find the biggest indicator of Mongo's future success. If the CTO is carving out that much time during the day to stay involved, while still bug fixing, replying to posts like these and testing bug reports... that's a lot of love right there.

[1] http://en.wikipedia.org/wiki/Hype_cycle

socratic · on Nov 7, 2011

This seems like an odd analysis if you mean that MongoDB is hitting the trough of disillusionment. MongoDB, Cassandra, HBase, and Redis all came out at roughly the same time (2008, 2009) according to Wikipedia and their project pages. Is there a reason they would be on totally different hype cycles?

As far as I can tell, no one has hated on Redis or HBase (except for the brief period when antirez tried to add VM to Redis) because they both (a) work and (b) solve real use cases. Has there been any suggestion that Redis or HBase lose data?

However, maybe you are right in a more general sense. Do you think that the idea of NOSQL itself is reaching the trough of disillusionment? Are we seeing a shake out of which of these data stores are actually designed by people who know what they are doing, both in (database) theory and in (systems coding) practice?

rkalla · on Nov 7, 2011

Oh I do mean that -- I don't think their time in existence affects the rate at which you move along the hype cycle, I think popularity and deployment do.

I would say Mongo is the most popular NoSQL data store at the moment, whether in mindshare or deployments, and that is what caused it to move along the cycle so much faster.

I don't mean to detract from any of the other NoSQL projects; they don't have the marketing or manpower budget that 10gen has so I wouldn't expect them to be at the same place in the cycle. MongoDB came on the scene with the only NoSQL solution that promised SQL-esque queries, an insane jump in performance AND a big commercial company behind it. To anyone trying to understand "NoSQL", it was the clearest and safest place to look.

Since then we've seen the cracks in that original argument (fast and unstable means terror in production), and 10gen has changed focus as needed and addressed those. During that time they weren't just coding like a lot of these other projects; they were putting on conference after conference, garnering mindshare and getting developers on board.

An open source Apache project just won't move along a hype path as quickly as a force like that.

(I am making no statement towards quality, performance or worthiness... just positions on the hype-cycle).

  > Do you think that the idea of NOSQL itself is reaching 
  > the trough of disillusionment? Are we seeing a shake out 
  > of which of these data stores are actually designed by 
  > people who know what they are doing, both in (database) 
  > theory and in (systems coding) practice?

I couldn't have phrased it better; yes I think this is exactly what is happening.

In the early days it was so exciting to see different ways to store/retrieve data. We had been with SQL for decade(s) and it was very exciting to see something new/fresh and fast pop up.

Then everyone started storing data every which way they could think of.

Then a few of us started solving problems with those new ideas... so far so good.

Then some of those projects and new projects built on those new techniques blew up in popularity, and suddenly the "real world" came knocking and we started to actually test the mettle of these things in production... with disk failures, network failures, power failures and administration failures.

Like shaking out a rug, the weakest approaches got shaken out and the strongest teams/products weathered the storm to grow stronger and more stable.

2011 was the year NoSQL "grew up"; I imagine 2012 and 2013 will be the years that NoSQL comes all the way out of the dissolution curve and, in a metaphysical sense, "goes into production".

I mean that in the most hand-wavy way, not literally... literally LOTS of people have it in production.

I mean it in the sense that you stop seeing articles like the ones that sparked all the Mongo hype recently, or articles about the horrible shortcomings or failures of XYZ datastore.

Early on the teams making the NoSQL solutions AND the users didn't really understand where this boat was going or how the puzzle pieces fit together... they just kept working and refining.

This entire year we've seen more and more specialization in the NoSQL community:

  - Antirez gave up on data-larger-than-ram approaches and 
    wants to focus Redis on what it is amazing at: being 
    fast, in memory.
  - CouchDB, building on its uniquely awesome m-m 
    replication, moves into the mobile space with data sync 
    solutions that are awesome.
  - MongoDB keeps replacing MySQL in production at many 
    large-scale startups in the valley; showing more and 
    more the exact migration path to take.
  - Cassandra becomes markedly easier to use with CQL and 
    combined with its CouchDB-esque replication behavior, 
    suddenly makes all sorts of sense in densely populated 
    deployments.

Back in 2010 I couldn't have told you which NoSQL solution was best for which job... closing in on the end of 2011 it is glaringly obvious to me when you would use Redis and when you would use CouchDB (for example).

This seems silly in hindsight, but I don't think we or the teams really honestly knew where this trip was taking the technology a year or more ago.

2012 will be a year of polish, stability and deployments.

2013 will be production deployments and replacing MySQL in more and more places.

2015, it all starts all over again as SSD-optimized data structures and data stores revamp our understanding of databases :) -- I am half-kidding.

That's my 2 cents anyway.

rbranson · on Nov 7, 2011

Cassandra has CouchDB-esque replication?

rkalla · on Nov 7, 2011

In the most general sense (master-master) yes, but in a more detailed sense... not really.

Cassandra and Riak have a similar replication model -- they are deployed into a "ring" and the data in the ring is distributed across some (or all) of the nodes depending on your ReplicationFactor (how many nodes to copy each piece of data to).

If you query for a piece of data that a node doesn't have, it hashes the query and routes you to the node that does have it.
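A toy sketch of that ring idea in Python, in case it helps. This is not how Cassandra or Riak actually implement it; the `Ring` class, the node names and the MD5-based placement are all made up for illustration, and real systems add virtual nodes, rack awareness and so on.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Map any string to a point on the ring (a large integer).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring: each key is stored on `replication_factor` nodes."""

    def __init__(self, nodes, replication_factor):
        # Never ask for more copies than there are nodes.
        self.replication_factor = min(replication_factor, len(nodes))
        # Sort nodes by their position on the ring.
        self._points = sorted((_hash(n), n) for n in nodes)

    def replicas_for(self, key):
        # Find the first node clockwise from the key's position, then
        # take the following nodes around the ring as the other replicas.
        positions = [p for p, _ in self._points]
        start = bisect.bisect_right(positions, _hash(key)) % len(self._points)
        return [
            self._points[(start + i) % len(self._points)][1]
            for i in range(self.replication_factor)
        ]


# Hypothetical four-node ring that keeps two copies of every key.
ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=2)
owners = ring.replicas_for("user:42")
```

Any node can run `replicas_for` on an incoming key, which is roughly how a node that doesn't hold the data knows where to route the request.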

CouchDB is a bit different, in that by default it treats every node as a master and replicates it in its entirety to any other nodes registered as a replication target.

You can shard with something like BigCouch, but that is 3rd party.

This is different than Mongo which is master-slave-slave-* or Redis which I believe is master-slave as well (I never got a clear answer on how "slave" nodes in Redis resolve or push changes back upstream to the master).

codyrobbins · on Nov 7, 2011

"MongoDB is on its way to becoming the default datastore for web apps."

In my experience there is no way that this is possibly true.

misterbwong · on Nov 7, 2011

I sense another "SQL is not dead. It's still used by 90% of the web." article being written somewhere on the internets.

This is not directed at my parent comment but, seriously, this is getting tiring. Comparing NoSQL to SQL is like comparing a rubber mallet to a hammer. Sure, both might be good for some of the same things, but each has its specific use case.

jeffdavis · on Nov 7, 2011

I'm not sure what your point is. Are you saying that MongoDB is on its way to becoming the default datastore for web apps? Or are you saying that it's not, and the comparison never should have been made in the first place?

misterbwong · on Nov 7, 2011

I'm saying that there isn't a default datastore for web apps in the same way there isn't a default language for programming. Different data stores do different things and web apps are so varied that certain apps will benefit from NoSQL, some from SQL, and yet others from straight text file storage. I find it tiring that everyone thinks their choice of datastore is the bestforeverythingontheweb datastore.

jeffdavis · on Nov 8, 2011

I mostly agree, but that mentality certainly came from somewhere.

I think it's pretty well established that MySQL was the default data storage system (I say to include a broad range of systems) for web applications in the open-source world for a good chunk of the last decade.

And there's at least some reason for a default to exist. There are many applications where the author(s) don't have particular data storage/management expertise, and they'll be looking to use the "best practice" or "default" system that everyone else is using. So it sounds entirely reasonable to me that there will, again, exist a default way to store and manage data.

And it also seems natural that various systems will vie for that title, because there are a huge number of potential users there. Others will avoid that title because they want only experienced users to be involved (which I think is misguided, but it seems there are always a few).

So, I agree with you in the strict sense that there's no bestforeverythingontheweb datastore, and it's way too early to assign that title to anyone right now, but striving for broad appeal is certainly a reasonable thing to do.

dhimes · on Nov 7, 2011

Agreed. I'm using MySQL to store relational user data and CouchDB to store docs- in the same application.

einhverfr · on Nov 7, 2011

This is just about exactly right. You have to look at your data, what it means, and how you want it to be able to be used, before you look at how to store it. In some cases NoSQL makes sense. In others a real solid RDBMS makes sense.

matthewcford · on Nov 7, 2011

I've been using MongoDB for well over a year now in around 6 apps (moved on from CouchDB) and I agree prior to 1.8 it should have been made more obvious that there were still some stability issues.

I have seen first hand some of the issues raised; we've had data disappear, recurring random crashes, etc. But I think the difference is 'everyone' knew that there were issues with MongoDB, you just needed to check in jira. Jumping to 1.0 too early is clearly part of the reason for this backlash, as not everyone thinks to check the issues because they've come to believe 1.0 means it's ready for mass adoption.

That being said, I love MongoDB and I would still use it in other apps, just got to decide if it's the right tool for the job.

dhimes · on Nov 7, 2011

Why did you move from couch? I'm considering couch for a project, and am not especially knowledgeable in the space. Couch has worked fine for a low-load, minimal-functioning prototype store (no replication needs, etc.). Its scary feature to me is dealing with compacting-- how and when to schedule it so a large db won't get bogged-down.

daleharvey · on Nov 7, 2011

Couch now has an inbuilt compaction daemon, so you can configure it to run automatically.

https://github.com/apache/couchdb/blob/trunk/etc/couchdb/def...

dhimes · on Nov 7, 2011

What I haven't tested, though, is how long compaction takes- that is, how it scales with db size and whether more frequent compaction means closer to constant scaling.

Once the prototype was up I started working on other parts of the system (and the business for that matter) and only half-paid-attention to the mailing list.

The mailing list for couch is quite good, btw.

rdtsc · on Nov 7, 2011

It is better to do it during downtime. You can basically provide scheduling rules such as 'compact when fragmentation % > X AND time-of-day window is Y'.
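A rough sketch of that kind of rule in Python. This is not CouchDB's actual configuration syntax; `should_compact` and its parameters are hypothetical names used only to show the "fragmentation above X AND inside time window Y" logic.

```python
from datetime import time


def should_compact(fragmentation_pct, now, threshold_pct, window_start, window_end):
    """Return True when fragmentation exceeds the threshold inside the window."""
    if window_start <= window_end:
        # Ordinary window within a single day, e.g. 01:00 to 05:00.
        in_window = window_start <= now <= window_end
    else:
        # Window wraps past midnight, e.g. 23:00 to 04:00.
        in_window = now >= window_start or now <= window_end
    return fragmentation_pct > threshold_pct and in_window


# Hypothetical rule: compact when more than 60% fragmented, between 23:00 and 04:00.
night_run = should_compact(70, time(2, 0), 60, time(23, 0), time(4, 0))
midday_run = should_compact(70, time(12, 0), 60, time(23, 0), time(4, 0))
```

With those made-up numbers, the 02:00 check passes and the midday one doesn't, so compaction stays out of peak hours.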

matthewcford · on Nov 7, 2011

At the time, I really liked the idea of couch doc versions, but considering that they go away after compaction, I was sad that it didn't give me versioning for free.

I think it just came down to preference. MongoDB has a nice way of doing queries without having to use views, and it's also a bit faster, although I've seen people speed up couch by using protocol buffers etc. Now that I haven't been following couch for a while, the fragmentation between couch versions doesn't help when evaluating whether I should try it again.

dhimes · on Nov 7, 2011

I find that strange, also. It just seems like, either compact continuously (and don't make me worry about enormous dbs because of hidden files, and the trouble that can happen when it gets too large (> 1/2 disk space, so I understand)), or give easy access to the versions. I'd prefer the first, but from a user perspective (although maybe not a db-designer perspective), either-or makes sense.

That's the fallout of eventual concurrency, though.

feralchimp · on Nov 7, 2011

Raise your hand if you're relying on package version numbers to tell you which packages to implement in your high-volume production environment against live customer data.

ltbarcly3 · on Nov 7, 2011

Your comment would be funny, if it didn't have so much truth to it.

From now on, for all my open source code, I'm going to version with this translation table:

Alpha: Beta

Beta: increment version by random number between .1 and .2, eg: 2.0 becomes 2.1.7

Release Candidate: add the word "Enterprise Edition" to the version

Release: Add the same letter Oracle added for that version: 8 -> 8i, 9 -> 9i, 10 -> 10g. The exception is anything that happens to be a BMW model, in which case I'll just make it the same as the BMW. 3.2.8 -> 3.2.8xi, 4 -> z4, &c.

cpeterso · on Nov 7, 2011

No "Technical Preview" releases?

latch · on Nov 7, 2011

I'm not a big fan of having version numbers have some type of special meaning. To me, your post implies that there's a line in the sand at some magic version number with respect to due diligence (on everyone's part). Gmail was in "beta" for years... it didn't mean anything. I understand that wiki and history disagree with me, but it's still how I feel.

Oh, and there's a chance that yesterday's drama was a hoax: http://news.ycombinator.com/item?id=3205573

Edit:

Associating special meaning with these things has always been "gamed". Access went from 2 to 7. Heck, office went from 97 to 2000!

leif · on Nov 7, 2011

The article appears to consider the version number issue symptomatic of a deeper mismanagement of expectations by MongoDB marketing.

drm237 · on Nov 7, 2011

There's a difference between product names for marketing and version numbers used in development.

andrewvc · on Nov 7, 2011

Any reasonable dev knows that version numbers are useless until you know the process and history behind them. You should have a healthy amount of fear before committing to a brand-new-to-the-market db, regardless of 1.0 status.

There are places to fuck around with your stack choices, but databases aren't one of them unless you absolutely need this new tech. Let's be honest and acknowledge that most sites using Mongo today could be using an SQL-based solution with no issue.

Some people are leveraging it for a reason, others... just for kicks. If you're doing it just for kicks, you'd better be comfortable with uncertainty.

jimbobimbo · on Nov 7, 2011

"There is no doubt that MongoDB has benefitted from an aggressive marketing push. There are more MongoDB conferences held (organized by 10gen) and MongoDB books written (mainly by 10gen employees) than for the other NoSQL datastores combined."

The latest developments around MongoDB remind me of the history of Oracle DB described in the book "The Difference Between God and Larry Ellison...": they were basically selling a DB that was far from ready for prime time. I'm not surprised at all that people may run into issues with bleeding-edge software - I figure that's the price that early adopters must pay anyway.

Zuzz · on Nov 7, 2011

"Being document-based datastores, Riak and CouchDB are the most direct competitors to MongoDB"

But Riak is a Key-Value store, not a document one. If that's the premise I wonder how illuminating the rest can be (I kept reading: it's not)

tsuraan · on Nov 7, 2011

Riak has secondary indices, and map/reduce for ad-hoc queries. You can store raw binary data in it, and it's happy with that, but if you store JSON then it can query it. From what I can tell, the biggest difference between Riak and Bigcouch from a data model POV is that Bigcouch has materialized views, while Riak's are ad-hoc. I'm not an expert in either though...

Zuzz · on Nov 7, 2011

there you go, from the horse's mouth and fresh off the press:

"For better or worse, many people consider MongoDB and Riak to be competitors. In reality, there are very few similarities between the products."

http://seancribbs.com/tech/2011/11/07/mongodb-and-riak-in-co...

Zuzz · on Nov 7, 2011

can they be used to do similar things? yes. are there overlapping use cases? indeed. Are they going to overlap more and more as time goes by and they try to expand out of their niches? absolutely.

But they fall squarely in different categories: Riak KV, MongoDB document (and HBase Column, Neo4J Graph for example)

kg · on Nov 7, 2011

"No open source project has received more criticism in recent years than MongoDB."

[citation needed]

Suggesting that the 1.0 version number should have been reserved until more recently makes me think that what the author of this post is really saying is something like this:

"MongoDB wasn't really production-ready until recently. People who wanted to test something bleeding-edge out in the real world should have still been free to do so, but branding the product as a beta and giving it a sub-1.0 version number would have helped set expectations correctly."

willvarfar · on Nov 7, 2011

What we need is a poll topic along the lines of "I trust MongoDB with my web-startup data" or such. Measure some criticism.

latch · on Nov 7, 2011

[citation needed] indeed. My vote would be Java

bobz · on Nov 7, 2011

"Java" is not an open source project.

As a Java programmer, I've not really heard a particular level of criticism of the OpenJDK project. Java the language sure does have its faults.

Although I suspect you were just Java bashing.

[ed] s/it's/its/g

trustfundbaby · on Nov 7, 2011

I remember being around when everyone was laying into PHP in much the same way as people are tearing into mongodb now and it makes me smile, because it means they're doing something right and they'll be around for quite a while if they're responsive to the feedback.

jeffdavis · on Nov 7, 2011

"At version 2.0, it is finally a stable product free of unexpected surprises."

Wow, that's a bold statement. I don't think I'd go out on a limb like that considering that 2.0 has only been out for a couple months and doesn't have a lot of production users.

dschoon · on Nov 7, 2011

Here's that search-interest graph he inexplicably screencaps in the article, rather than also linking to it:

http://www.google.com/insights/search/#cat=0-5&q=cassand...

I added Cassandra, as it's substantially more popular than Riak (whereas HBase is not, and you only get 5 slots).

on Nov 7, 2011

[deleted]

campnic · on Nov 7, 2011

If you spent months doing research, is there a way you could share your findings besides broad stroke sweeping statements? All these discussions get cluttered with people offering few details about their investigative process but offering their conclusions. It doesn't serve to move the discussion forward because there are anecdotal experiences on both sides.

feralchimp · on Nov 7, 2011

Exactly. There is no "he said, she said" to be had about software on deterministic turing machines. If you're attacking the product's QA as grossly inadequate, do some QA: post steps to repro a bug and get others to verify.

The rest is noise.

PhuFighter · on Nov 7, 2011

This is funny. I thought that the whole paradigm of startups is to first release the minimally useful featureset and work on the other items later? I mean - isn't the goal to first get revenue and then fix up what needs working on later?

willvarfar · on Nov 7, 2011

Is 2.0 rock solid?

Some say not yet: http://news.ycombinator.com/item?id=3202028

electic · on Nov 7, 2011

Shouldn't you have all your disclaimers on the top of the article?

calibraxis · on Nov 7, 2011

People should note that yesterday's anonymous pastebin "Don't use MongoDB" article was apparently a hoax, if you look through the comments. (At least so claims the troll who posted it.) (http://news.ycombinator.com/item?id=3202081)

I used MongoDB last year and it worked fine for me. (I maintained it for about 9 months.) But of course I can't generalize that to other people's experiences, so YMMV. ;) I just used it squarely in the use-case which everyone mentions — many "well-behaved" writes which occur when no one's reading. Multiple replicas. Leveraged its indices. It wasn't an authoritative source of data, so in principle I could repopulate it. Dealt with failure modes, so missing data wasn't catastrophic.

I think articles like this are useful; when evaluating software, one thing I do is assume it's buggy and its proponents are deceivers. (Whether or not they intend to deceive. I can imagine being corrupted by wanting something to be true.) So among other things, I hunt down criticisms. Had this article been around last year, I'm sure I would've found it a useful hub in doing this due diligence.

(The actual thesis, of what the version number should be, is not so important to me... Version numbers are in some sense arbitrary.)

rdtsc · on Nov 7, 2011

A hoax or a double-hoax?

Original post was this by nomoremongo:

http://news.ycombinator.com/item?id=3201772

Post actually discussed was:

http://news.ycombinator.com/item?id=3202081

Very clever.

The way I understand it, apparently nomoremongo wrote it but it was reposted quickly by nmongo (http://news.ycombinator.com/threads?id=nmongo) in hopes that it would then become the top post, so later on they could yell in all caps how it was a hoax and thus discredit the original post.

You know nmongo, if you are trying to help MongoDB you just did the opposite. I've said this many times here before, but the best marketing is your competitor's stupid marketing. You are providing stupid marketing for MongoDB and you are hurting it.

> People should note that yesterday's anonymous pastebin "Don't use MongoDB" article was apparently a hoax,

Now another question is: do you know that it wasn't a hoax but continue in the same vein in hopes of still saving the day, or do you actually believe it was a hoax?

latch · on Nov 7, 2011

There was always something suspicious about an anonymous post lacking any verifiable facts. Then, 10gen's CTO states that none of it resonates with any support issues they've had. Then, add this.

I know exactly what it will take for me to believe the pastebin story.

I'm curious, what will it take for you to not believe it?

rdtsc · on Nov 7, 2011

I agree this is a shady post. Even "how" it was posted is shady. So I am not standing 100% behind it. It is just more of a gut instinct.

At the same time, it got on the front page because the story resonated and made sense to others.

There were quite a few people who commented how "oh yeah I've had problems with lost data". And I think that is what pushed the post's popularity more than the original pastebin. So the discussion got a life of its own after a while. Followed by response posts and response posts to those and so on.

calibraxis · on Nov 7, 2011

Well, it's outlier info from an unreliable source. And possibly outdated too. So if I were evaluating MongoDB, I'd probably just discard this info. (Or skim it for whatever tiny scraps of technical info came out of the discussions.)

willvarfar · on Nov 7, 2011

Don't skim the last paragraph, which was very much the point of the whole article, and the take wasn't so technical.