As you might know by now, I have been really struck by simplicity of CouchDB at last year's OSCON. From then, we got couchapp which is great idea of hosting CouchDB views on file-system for easy maintenance.
So good in fact, that I re-wrote couchapp in perl called design-couch.pl. I needed to deviate a bit from original design (one _design document per application) because PXElator data which I store in CouchDB is... data...
I have been introduced to Relational databases back at university and since then I have been using PostgreSQL for all my database needs. But, this time, I have dumps from commands, SOAP interfaces, syslog messages and arbitrary audit events generated all over the code. I didn't want to think about structure up-front, and View Cookbook for SQL Jockeys convinced me I don't have to, but I decided to make few simple rules to get me started:
- Create URLs using humanly readable timestamps (yyyy-mm-dd.HH:MM:SS.package.ident) which allows easy parsing in JavaScript (if needed), and ensures that all entries are sorted by time-stamp
- Augment each document with single new key package (perl keyword on top of each module). It will have sub-keys time decimal time-stamp, name of package, caller sub which called CouchDB::audit and line from which it's called
- Single _design document for output from one package (which is directory on file-system) just because it easy browsable in Futon.
So, are those rules enough to forget about relational algebra and relax on the couch? Let's take a look at _design/amt/ip,SystemPowerState-count. I applied here almost SQL-ish naming convention - column names, separated by commas then dash - and output column(s).
Map |
---|
function(doc) { if ( doc.package.name == 'amt' && doc.package.caller == 'power_state') emit([doc.ip,doc.SystemPowerState],1); } |
Reduce |
function (k,v) { return sum(v) } |
When run, this map/reduce queries produce result like this:
Key | Value |
---|---|
["172.16.10.200", null] | 4 |
["172.16.10.16", "5"] | 21 |
["172.16.10.16", "0"] | 9 |
["172.16.10.16", null] | 8 |
["10.60.0.196", "0"] | 6 |
["10.60.0.195", "256"] | 8 |
["10.60.0.194", "256"] | 11 |
["10.60.0.193", "256"] | 3 |
So far, so good. But what if I wanted to average all ping round trip times for each ip?
If you where using SQL, answer would be:
select ip,avg(rtt) from ping group by ipHowever, evil rereduce roars it's head here:
Map |
---|
function(doc) { if ( doc.package.name == 'ping' ) emit(doc.ip, doc.rtt) } |
Reduce |
function (k,values,rereduce) { if (rereduce) { var total_sum = 0; var total_length = 0; for (var i = 0; i < values.length; i++) { total_sum += values[i][0]; total_length += values[i][1]; } return [total_sum, total_length]; } else { return [sum(values), values.length]; } } |
Since we are called incrementally, we can't average averages. We need to collect total sum and number of elements and perform final computation on client:
Key | Value |
---|---|
"193.198.212.4" | [779.0038585662847, 9] |
"193.198.212.228" | [902.6305675506585, 10] |
"192.168.1.61" | [906.698703765869, 11] |
"192.168.1.34" | [995.9852695465088, 11] |
"192.168.1.3" | [316.55669212341303, 6] |
"192.168.1.20" | [506.162643432617, 8] |
"192.168.1.2" | [473.91605377197277, 11] |
"192.168.1.13" | [649.2500305175784, 11] |
"172.16.10.10" | [49.9579906463623, 1] |
"172.16.10.1" | [250.78511238098127, 15] |
"127.0.0.1" | [62.57653236389161, 16] |
"10.60.0.94" | [81.6218852996826, 2] |
"10.60.0.93" | [186.49005889892578, 6] |
"10.60.0.92" | [386.7535591125485, 5] |
"10.60.0.91" | [1070.863485336304, 9] |
"10.60.0.90" | [428.4689426422117, 10] |
If you manage to wrap your head around this, you are ready to dive into CouchDB.