Database design with CouchDB

As you might know by now, I have been really struck by simplicity of CouchDB at last year's OSCON. From then, we got couchapp which is great idea of hosting CouchDB views on file-system for easy maintenance.

So good in fact, that I re-wrote couchapp in perl called I needed to deviate a bit from original design (one _design document per application) because PXElator data which I store in CouchDB is... data...

I have been introduced to Relational databases back at university and since then I have been using PostgreSQL for all my database needs. But, this time, I have dumps from commands, SOAP interfaces, syslog messages and arbitrary audit events generated all over the code. I didn't want to think about structure up-front, and View Cookbook for SQL Jockeys convinced me I don't have to, but I decided to make few simple rules to get me started:

  • Create URLs using humanly readable timestamps (yyyy-mm-dd.HH:MM:SS.package.ident) which allows easy parsing in JavaScript (if needed), and ensures that all entries are sorted by time-stamp
  • Augment each document with single new key package (perl keyword on top of each module). It will have sub-keys time decimal time-stamp, name of package, caller sub which called CouchDB::audit and line from which it's called
  • Single _design document for output from one package (which is directory on file-system) just because it easy browsable in Futon.

So, are those rules enough to forget about relational algebra and relax on the couch? Let's take a look at _design/amt/ip,SystemPowerState-count. I applied here almost SQL-ish naming convention - column names, separated by commas then dash - and output column(s).


function(doc) {
  if ( == 'amt'
  && doc.package.caller == 'power_state')


function (k,v) {
  return sum(v)

When run, this map/reduce queries produce result like this:

["", null]4
["", "5"]21
["", "0"]9
["", null]8
["", "0"]6
["", "256"]8
["", "256"]11
["", "256"]3

So far, so good. But what if I wanted to average all ping round trip times for each ip?

If you where using SQL, answer would be:

select ip,avg(rtt) from ping group by ip
However, evil rereduce roars it's head here:


function(doc) {
  if ( == 'ping' )
   emit(doc.ip, doc.rtt)


function (k,values,rereduce) {
  if (rereduce) {
    var total_sum = 0;
    var total_length = 0;
    for (var i = 0; i < values.length; i++) {
      total_sum += values[i][0];
      total_length += values[i][1];
    return [total_sum, total_length];
  } else {
    return [sum(values), values.length];

Since we are called incrementally, we can't average averages. We need to collect total sum and number of elements and perform final computation on client:

""[779.0038585662847, 9]
""[902.6305675506585, 10]
""[906.698703765869, 11]
""[995.9852695465088, 11]
""[316.55669212341303, 6]
""[506.162643432617, 8]
""[473.91605377197277, 11]
""[649.2500305175784, 11]
""[49.9579906463623, 1]
""[250.78511238098127, 15]
""[62.57653236389161, 16]
""[81.6218852996826, 2]
""[186.49005889892578, 6]
""[386.7535591125485, 5]
""[1070.863485336304, 9]
""[428.4689426422117, 10]

If you manage to wrap your head around this, you are ready to dive into CouchDB.