
MongoDB - so you want a fast NoSQL database which you can grok

I have been using CouchDB for some time now, mostly as audit storage for PXElator. Audit data stores are most useful for ad-hoc queries (hmm, when did I last see that host?), and CouchDB map/reduce runs took half an hour or more. I wrote a small script, couchdb2mongodb.pl, to migrate my data over to MongoDB (in 26 minutes) and ran the first query I could write after reading the MongoDB documentation about advanced queries. It took only 30 seconds, compared to 30 minutes or more in CouchDB. I was amazed.

This was a NoSQL database which I could understand and tune. MongoDB has indexes and a profiler, so tuning the query down to three seconds was a simple matter of adding an index. All my RDBMS knowledge was reusable here, so I decided to take a look at why it is so much faster than CouchDB for the same data...
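To make the index speedup concrete, here is a minimal Python sketch of the "when did I last see that host?" query, once as a full scan and once against an ordered index. It is only an illustration of the idea, not MongoDB's implementation: the field names `host` and `time`, the sample records, and the helper functions are all hypothetical, and a sorted list with binary search stands in for a real index structure.

```python
import bisect

# Hypothetical audit records; the field names "host" and "time" are
# illustrative and not taken from PXElator's real schema.
audit = [
    {"host": "alpha", "time": 100},
    {"host": "beta", "time": 120},
    {"host": "alpha", "time": 250},
    {"host": "gamma", "time": 300},
    {"host": "beta", "time": 410},
]

def last_seen_scan(records, host):
    """Full scan: inspect every record, as a query without an index must."""
    times = [r["time"] for r in records if r["host"] == host]
    return max(times) if times else None

def build_index(records):
    """Sorted list of (host, time) keys, standing in for an ordered index."""
    return sorted((r["host"], r["time"]) for r in records)

def last_seen_indexed(index, host):
    """Binary search for the last key that belongs to `host`."""
    # Every real key (host, t) sorts before (host, +infinity).
    pos = bisect.bisect_left(index, (host, float("inf")))
    if pos == 0 or index[pos - 1][0] != host:
        return None
    return index[pos - 1][1]

index = build_index(audit)
print(last_seen_scan(audit, "alpha"), last_seen_indexed(index, "alpha"))
```

Building the index costs one sort up front, much like the one-time index creation in the database, after which each lookup touches only a handful of keys instead of every record.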

To be honest, MongoDB, High-Performance SQL-Free Database by Dwight Merriman, CEO of 10gen, won me over to finally try MongoDB. It was technical enough to make me think about MongoDB architecture and benefits. It's clearly pragmatic: let's re-think the horizontally scalable hash storage model with ad-hoc queries, but with a funny twist of close coupling with language types, all encoded in BSON format, which is very similar to Google's protocol buffers.
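To show what "language types encoded in BSON" looks like on the wire, here is a rough Python sketch of an encoder for a small subset of BSON (double, string, embedded document, boolean and 32-bit integer), written from the public BSON specification. It is a toy for illustration, not the encoder any MongoDB driver uses, and the function names are my own.

```python
import struct

def cstring(text):
    """Element names are stored as NUL-terminated UTF-8 strings."""
    return text.encode("utf-8") + b"\x00"

def encode_element(name, value):
    """Encode one key/value pair as: type byte, name, payload."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return b"\x08" + cstring(name) + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        # int32 only; struct.pack raises struct.error for larger values
        return b"\x10" + cstring(name) + struct.pack("<i", value)
    if isinstance(value, float):
        return b"\x01" + cstring(name) + struct.pack("<d", value)
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # the length prefix counts the trailing NUL byte
        return b"\x02" + cstring(name) + struct.pack("<i", len(data)) + data
    if isinstance(value, dict):
        return b"\x03" + cstring(name) + encode_document(value)
    raise TypeError("type not covered by this sketch: %r" % type(value))

def encode_document(doc):
    """A document is: int32 total size, the elements, a closing NUL byte."""
    body = b"".join(encode_element(k, v) for k, v in doc.items()) + b"\x00"
    return struct.pack("<i", len(body) + 4) + body

encoded = encode_document({"hello": "world"})
print(len(encoded), encoded)
```

Every value carries its type tag and, where needed, its length, which is why a reader can skip over fields it does not care about without parsing them.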

First, let's have a look at the raw size of data on disk. At some level, it will translate to the number of IO operations involving rotating platters and usage of the buffer cache.

root@opr:~# du -hc /var/lib/couchdb/0.9.0/.pxelator* /var/lib/couchdb/0.9.0/pxelator.couch
655M    /var/lib/couchdb/0.9.0/.pxelator_design
23M     /var/lib/couchdb/0.9.0/.pxelator_temp
7.8G    /var/lib/couchdb/0.9.0/pxelator.couch
8.4G    total

root@opr:~# du -hc /var/lib/mongodb/pxelator.*
65M     /var/lib/mongodb/pxelator.0
129M    /var/lib/mongodb/pxelator.1
257M    /var/lib/mongodb/pxelator.2
513M    /var/lib/mongodb/pxelator.3
513M    /var/lib/mongodb/pxelator.4
513M    /var/lib/mongodb/pxelator.5
17M     /var/lib/mongodb/pxelator.ns
2.0G    total

Here is the first hint about performance: MongoDB's 2G of data (which is mmap-ed into memory directly, leaving flushes and caching to the OS layer) is almost a perfect fit for the 3G of RAM I have in this machine.
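The arithmetic behind that hint can be checked with a few lines of Python. The sizes are copied from the `du` listings in this post; treating `M` and `G` as MiB and GiB, and the variable names, are my assumptions for the sketch.

```python
# Sizes in MiB, copied from the `du -hc /var/lib/mongodb/pxelator.*` output.
mongo_files = {
    "pxelator.0": 65,
    "pxelator.1": 129,
    "pxelator.2": 257,
    "pxelator.3": 513,
    "pxelator.4": 513,
    "pxelator.5": 513,
    "pxelator.ns": 17,
}

total_mib = sum(mongo_files.values())
ram_mib = 3 * 1024          # 3G of RAM in this machine
couch_mib = 8.4 * 1024      # CouchDB total reported by du

print("MongoDB files: %d MiB" % total_mib)
print("share of RAM:  %.0f%%" % (100.0 * total_mib / ram_mib))
print("CouchDB is %.1fx larger on disk" % (couch_mib / total_mib))
```

Note that `du -h` rounds each figure, so the result is approximate, but it agrees with the 2.0G total reported above and shows that the whole MongoDB data set can sit in memory while the CouchDB files cannot.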

MongoDB has a mongodump utility which dumps BSON for backup, and it's even smaller:

root@opr:~# du -hcs dump/pxelator/*
1.1G    dump/pxelator/audit.bson
4.0K    dump/pxelator/system.indexes.bson
76K     dump/pxelator/system.profile.bson
1.1G    total

So I switched PXElator to use MongoDB as storage. I had never pushed anything into production after just one day of testing, but the first query speedup from 30 min to 30 sec, and the ability to cut it down to 3 sec by adding an index (which took about 13 sec to create), gives me a powerful analytical tool I didn't have before.

About

This page contains a single entry from the blog posted on January 24, 2010 12:56 PM.


Creative Commons License
This weblog is licensed under a Creative Commons License.