« PXElator introduction | Main | Sharing terminal session to every PC in classroom »

Database design with CouchDB

As you might know by now, I have been really struck by simplicity of CouchDB at last year's OSCON. From then, we got couchapp which is great idea of hosting CouchDB views on file-system for easy maintenance.

So good in fact, that I re-wrote couchapp in perl called design-couch.pl. I needed to deviate a bit from original design (one _design document per application) because PXElator data which I store in CouchDB is... data...

I have been introduced to Relational databases back at university and since then I have been using PostgreSQL for all my database needs. But, this time, I have dumps from commands, SOAP interfaces, syslog messages and arbitrary audit events generated all over the code. I didn't want to think about structure up-front, and View Cookbook for SQL Jockeys convinced me I don't have to, but I decided to make few simple rules to get me started:

  • Create URLs using humanly readable timestamps (yyyy-mm-dd.HH:MM:SS.package.ident) which allows easy parsing in JavaScript (if needed), and ensures that all entries are sorted by time-stamp
  • Augment each document with single new key package (perl keyword on top of each module). It will have sub-keys time decimal time-stamp, name of package, caller sub which called CouchDB::audit and line from which it's called
  • Single _design document for output from one package (which is directory on file-system) just because it easy browsable in Futon.
CouchDB_select_view.png

So, are those rules enough to forget about relational algebra and relax on the couch? Let's take a look at _design/amt/ip,SystemPowerState-count. I applied here almost SQL-ish naming convention - column names, separated by commas then dash - and output column(s).

Map

function(doc) {
  if ( doc.package.name == 'amt'
  && doc.package.caller == 'power_state')
  emit([doc.ip,doc.SystemPowerState],1);
}

Reduce

function (k,v) {
  return sum(v)
}

When run, this map/reduce queries produce result like this:

KeyValue
["172.16.10.200", null]4
["172.16.10.16", "5"]21
["172.16.10.16", "0"]9
["172.16.10.16", null]8
["10.60.0.196", "0"]6
["10.60.0.195", "256"]8
["10.60.0.194", "256"]11
["10.60.0.193", "256"]3

So far, so good. But what if I wanted to average all ping round trip times for each ip?

If you where using SQL, answer would be:

select ip,avg(rtt) from ping group by ip
However, evil rereduce roars it's head here:

Map

function(doc) {
  if ( doc.package.name == 'ping' )
   emit(doc.ip, doc.rtt)
}

Reduce

function (k,values,rereduce) {
  if (rereduce) {
    var total_sum = 0;
    var total_length = 0;
    for (var i = 0; i < values.length; i++) {
      total_sum += values[i][0];
      total_length += values[i][1];
    }
    return [total_sum, total_length];
  } else {
    return [sum(values), values.length];
  }
}

Since we are called incrementally, we can't average averages. We need to collect total sum and number of elements and perform final computation on client:

KeyValue
"193.198.212.4"[779.0038585662847, 9]
"193.198.212.228"[902.6305675506585, 10]
"192.168.1.61"[906.698703765869, 11]
"192.168.1.34"[995.9852695465088, 11]
"192.168.1.3"[316.55669212341303, 6]
"192.168.1.20"[506.162643432617, 8]
"192.168.1.2"[473.91605377197277, 11]
"192.168.1.13"[649.2500305175784, 11]
"172.16.10.10"[49.9579906463623, 1]
"172.16.10.1"[250.78511238098127, 15]
"127.0.0.1"[62.57653236389161, 16]
"10.60.0.94"[81.6218852996826, 2]
"10.60.0.93"[186.49005889892578, 6]
"10.60.0.92"[386.7535591125485, 5]
"10.60.0.91"[1070.863485336304, 9]
"10.60.0.90"[428.4689426422117, 10]

If you manage to wrap your head around this, you are ready to dive into CouchDB.

TrackBack

TrackBack URL for this entry:
http://blog.rot13.org/mt/mt-tb.cgi/649

Post a comment

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)

About

This page contains a single entry from the blog posted on August 31, 2009 3:23 PM.

The previous post in this blog was PXElator introduction.

The next post in this blog is Sharing terminal session to every PC in classroom.

Many more can be found on the main index page or by looking through the archives.

Creative Commons License
This weblog is licensed under a Creative Commons License.