Recently in CouchDB Category

I returned from Varaždin, where the first Razmjena Vještina u Varaždinu was held. I decided to present on a topic which can be summarized as the right problems for a non-relational stack on the web.

I fully understand that the presentation is packed with information, and I did skip some angular-mojolicious examples, which are documented in the README, so take a look there as well.

I have had my first CouchDB application in production since the beginning of this week. This also means that I have new challenges, and this time the task is simple: I want to send a confirmation e-mail when someone enters data into my CouchDB.
Let's see how that can be done. The easiest solution would be to send the e-mail from the web server which I have in the middle, but that would add latency to every request which requires sending an e-mail, so this isn't the best solution. It would be much better to somehow extend CouchDB with the ability to run custom code.

There are several ways to extend CouchDB in any language:

  • View servers are useful if you want to write custom code to massage your data
    Since we don't really want to map over all the data in our database, this approach isn't applicable here
  • External processes allow you to write pieces of code which are executed on an HTTP request, similar to CGI, but using JSON for passing data
    This might work for sending e-mails, but it would require a special CouchDB installation with the external process installed, and I would need to make two requests (one to save the data, and another to trigger sending of the e-mail)
  • The HTTP database API has a _changes feed which allows an external process to connect to CouchDB and monitor all changes
    This seems like a perfect fit. I can have another process (possibly running on another machine) which will recognize documents which need an e-mail notification.

I had known about the _changes feed in CouchDB for a while, but the webcast Asynchronous architectures with the CouchDB _changes feed pushed me in the right direction. There are several things in that webcast which will be useful for this example:

  • _changes?feed=continuous allows us to get instant notifications about document changes (with optional server-side filters). The only requirement is an HTTP client which can understand CouchDB's chunked JSON responses.
  • Documents in CouchDB can be used as a state machine. In our example, we need to somehow note that we are working on sending the e-mail (so that other workers won't pick it up while we are sending it) and note when sending is finished.
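
The first point can be sketched in JavaScript. The continuous feed sends one JSON object per line, with blank lines as heartbeats, and HTTP chunks may end in the middle of a line. This is only a minimal sketch: createChangesParser is a hypothetical helper name, and it assumes the chunks have already been decoded to strings by whatever HTTP client is used.

```javascript
// Incremental parser for the line-delimited JSON sent by
// _changes?feed=continuous. createChangesParser is a hypothetical
// helper for this sketch; chunks are assumed to be decoded strings.
function createChangesParser(onChange) {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    // The last element is an incomplete line, or '' if the chunk
    // ended exactly on a newline; keep it for the next chunk.
    buffer = lines.pop();
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === '') continue; // heartbeat, nothing to parse
      onChange(JSON.parse(trimmed));
    }
  };
}

// Usage: two chunks which split the second change in the middle.
const seen = [];
const feed = createChangesParser((change) => seen.push(change));
feed('{"seq":1,"id":"free","changes":[{"rev":"1-abc"}]}\n\n{"seq":2,"id":"fr');
feed('ee","changes":[{"rev":"2-def"}]}\n');
console.log(seen.map((c) => c.seq + ' ' + c.id));
```

The revision strings in the usage example are made up for illustration.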

So, I decided to write couchdb-triggers.pl, which implements a queue using a trigger field within the document. In the current implementation, you can put a shell command in trigger.command; it will be executed on the back-end and its result will be returned in trigger.output. This requires three document updates, which are tracked using the trigger.active array, which contains the time-stamp of each step (with 0 indicating a finished trigger). So, what does it look like?

We can submit a document with a trigger.command field to CouchDB:

$ curl -X PUT http://localhost:5984/monitor/free -d '{"trigger":{"command":"free"}}'
{"ok":true,"id":"free","rev":"1-511080b02d975944ea493eb5e9ef8783"}

If we try to get the document a moment later, we will see that it contains trigger.output with the results from our command:

$ curl http://localhost:5984/monitor/free
{"_id":"free","_rev":"3-a486faee4d416cd6ecc94183fe88a3bb","trigger":{"active":[1290521633.16487,1290521633.5196,0],"output":"             total       used       free     shared    buffers     cached\nMem:       3076500    3036388      40112          0     289572    1550120\n-/+ buffers/cache:    1196696    1879804\nSwap:      1949688     330296    1619392\n","command":"free"}}

However, as you can see from the _rev field, the document was updated two more times. We also see that trigger.active has three elements, the last one being zero. This means that the trigger was started (first time-stamp), executed (second time-stamp) and successfully stored its result back to CouchDB (final zero).
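
Reading trigger.active as a state machine could look roughly like this. It is a sketch under assumptions: triggerState is a hypothetical helper (not part of couchdb-triggers.pl), and the state names, as well as which intermediate states actually get stored as separate revisions, are my guesses based on the description above.

```javascript
// Interpret trigger.active as described above: each step appends a
// time-stamp, and a trailing 0 marks a finished trigger.
// triggerState and its state names are hypothetical.
function triggerState(doc) {
  if (!doc.trigger || !doc.trigger.command) return 'none';
  const active = doc.trigger.active || [];
  if (active.length === 0) return 'pending';            // submitted, no worker yet
  if (active[active.length - 1] === 0) return 'finished'; // result stored
  if (active.length === 1) return 'started';            // a worker claimed it
  return 'executed';                                    // command ran, result not stored yet
}

console.log(triggerState({ trigger: { command: 'free' } }));
// → pending
console.log(triggerState({
  trigger: { command: 'free', active: [1290521633.16487, 1290521633.5196, 0] },
}));
// → finished
```

A worker would only pick up documents in the pending state, which is what keeps two workers from running the same command.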

Separating trigger functionality from CouchDB itself also has benefits with replication: we can insert documents into any CouchDB, replicate them to one which has an Internet connection and couchdb-trigger.pl running, and then just run replication again to get the results.

The scheme with zero as the last time-stamp also allows me to write a server-side filter function which returns the document for a particular key only if trigger.active has zero as its last element (meaning that the trigger finished successfully). This can be used to implement synchronous triggers: first PUT the document, and then call _changes?feed=continuous using the filter on just that document until the trigger fires (to refresh the user interface or start replication back to a remote database).
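
Such a filter might look like the sketch below. CouchDB filter functions receive the document and the request, and are stored as anonymous functions under the filters key of a design document; the function is named here only so that it can be tried locally. The id query parameter is an assumed name chosen for this sketch.

```javascript
// Sketch of a CouchDB filter function. It passes a change only for
// the requested document, and only once its trigger has finished
// (trailing 0 in trigger.active). The "id" query parameter is an
// assumption of this sketch, not something CouchDB prescribes.
function finishedTrigger(doc, req) {
  if (doc._id !== req.query.id) return false;
  var active = doc.trigger && doc.trigger.active;
  return !!active && active.length > 0 && active[active.length - 1] === 0;
}

// Local check with hand-made arguments:
var req = { query: { id: 'free' } };
console.log(finishedTrigger({ _id: 'free', trigger: { active: [1290521633.16487] } }, req));
// → false
console.log(finishedTrigger({ _id: 'free', trigger: { active: [1290521633.16487, 1290521633.5196, 0] } }, req));
// → true
```

The client would then pass the filter as designdoc/filtername in the filter parameter of the _changes request, together with the id of the document it just saved.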

It seems that this trigger pattern will be useful for various tasks in the future, and my relational brain is slowly beginning to move into the NoSQL land of replicated, schema-free databases, but this time with external triggers :-)

As you might know by now, I was really struck by the simplicity of CouchDB at last year's OSCON. Since then, we got couchapp, which is a great idea: hosting CouchDB views on the file-system for easy maintenance.

So good, in fact, that I re-wrote couchapp in Perl as design-couch.pl. I needed to deviate a bit from the original design (one _design document per application) because the PXElator data which I store in CouchDB is... data...

I was introduced to relational databases back at university, and since then I have been using PostgreSQL for all my database needs. But this time I have dumps from commands, SOAP interfaces, syslog messages and arbitrary audit events generated all over the code. I didn't want to think about structure up-front, and View Cookbook for SQL Jockeys convinced me I don't have to, but I decided to make a few simple rules to get me started:

  • Create URLs using human-readable timestamps (yyyy-mm-dd.HH:MM:SS.package.ident), which allows easy parsing in JavaScript (if needed) and ensures that all entries are sorted by time-stamp
  • Augment each document with a single new key, package (the Perl keyword at the top of each module). It has sub-keys: time (decimal time-stamp), name (name of the package), caller (the sub which called CouchDB::audit) and line (the line from which it was called)
  • Use a single _design document for the output from one package (which is a directory on the file-system), just because it is easily browsable in Futon.
[Image: CouchDB_select_view.png]
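
The first two rules can be sketched as follows. This is an illustration in JavaScript, not the Perl code of CouchDB::audit: auditId and auditDoc are hypothetical names, UTC is assumed for the timestamp, and the ident value in the usage example is made up.

```javascript
// Hypothetical helpers illustrating the document rules above.
function pad(n) {
  return String(n).padStart(2, '0');
}

// Build an id of the form yyyy-mm-dd.HH:MM:SS.package.ident (UTC assumed).
function auditId(date, pkg, ident) {
  const day = [date.getUTCFullYear(), pad(date.getUTCMonth() + 1), pad(date.getUTCDate())].join('-');
  const time = [pad(date.getUTCHours()), pad(date.getUTCMinutes()), pad(date.getUTCSeconds())].join(':');
  return [day, time, pkg, ident].join('.');
}

// Augment arbitrary data with the single "package" key.
function auditDoc(data, date, pkg, caller, line) {
  return Object.assign({}, data, {
    package: { time: date.getTime() / 1000, name: pkg, caller: caller, line: line },
  });
}

const when = new Date(Date.UTC(2009, 7, 1, 12, 34, 56));
console.log(auditId(when, 'amt', 'power_state'));
// → 2009-08-01.12:34:56.amt.power_state
const doc = auditDoc({ ip: '172.16.10.16', SystemPowerState: '5' }, when, 'amt', 'power_state', 42);
console.log(doc.package.name, doc.package.caller);
// → amt power_state
```

Because every field of the timestamp is zero-padded, plain string comparison of two ids gives the same order as comparing their times.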

So, are those rules enough to forget about relational algebra and relax on the couch? Let's take a look at _design/amt/ip,SystemPowerState-count. Here I applied an almost SQL-ish naming convention: key column names separated by commas, then a dash, then the output column(s).

Map

function(doc) {
  // Skip documents which have no package key at all.
  if ( doc.package
  && doc.package.name == 'amt'
  && doc.package.caller == 'power_state') {
    emit([doc.ip,doc.SystemPowerState],1);
  }
}

Reduce

function (k,v) {
  return sum(v)
}

When run, this map/reduce query produces a result like this:

Key                          Value
["172.16.10.200", null]      4
["172.16.10.16", "5"]        21
["172.16.10.16", "0"]        9
["172.16.10.16", null]       8
["10.60.0.196", "0"]         6
["10.60.0.195", "256"]       8
["10.60.0.194", "256"]       11
["10.60.0.193", "256"]       3
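
To see where rows like these come from, the view can be simulated outside CouchDB. The sketch below is an assumption-laden approximation: emit is passed in as a parameter instead of being a global, rows are grouped by their full key (as a grouped view query would do), rows are not sorted the way CouchDB would sort them, and the sample documents are made up.

```javascript
// Local simulation of the count view, grouped by full key.
// Unlike in CouchDB, emit is passed in explicitly here.
function map(doc, emit) {
  if (doc.package && doc.package.name == 'amt' && doc.package.caller == 'power_state') {
    emit([doc.ip, doc.SystemPowerState], 1);
  }
}

function reduce(keys, values) {
  return values.reduce((a, b) => a + b, 0);
}

function runView(docs) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      // A missing SystemPowerState is undefined, which JSON turns into null.
      const k = JSON.stringify(key);
      if (!groups.has(k)) groups.set(k, []);
      groups.get(k).push(value);
    });
  }
  return [...groups].map(([k, values]) => ({ key: JSON.parse(k), value: reduce(null, values) }));
}

// Made-up sample documents:
const amt = { name: 'amt', caller: 'power_state' };
const rows = runView([
  { ip: '10.0.0.1', SystemPowerState: '0', package: amt },
  { ip: '10.0.0.1', SystemPowerState: '0', package: amt },
  { ip: '10.0.0.1', package: amt },
  { ip: '10.0.0.2', rtt: 1.5, package: { name: 'ping' } },
]);
console.log(JSON.stringify(rows));
// → [{"key":["10.0.0.1","0"],"value":2},{"key":["10.0.0.1",null],"value":1}]
```

This also shows why some keys in the table have null as their second element: those documents simply had no SystemPowerState field.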

So far, so good. But what if I wanted to average all ping round trip times for each ip?

If you were using SQL, the answer would be:

select ip,avg(rtt) from ping group by ip

However, the evil rereduce rears its head here:

Map

function(doc) {
  // Skip documents which have no package key at all.
  if ( doc.package && doc.package.name == 'ping' ) {
    emit(doc.ip, doc.rtt);
  }
}

Reduce

function (k,values,rereduce) {
  if (rereduce) {
    var total_sum = 0;
    var total_length = 0;
    for (var i = 0; i < values.length; i++) {
      total_sum += values[i][0];
      total_length += values[i][1];
    }
    return [total_sum, total_length];
  } else {
    return [sum(values), values.length];
  }
}

Since we are called incrementally, we can't average averages. We need to collect the total sum and the number of elements, and perform the final computation on the client:

Key                  Value
"193.198.212.4"      [779.0038585662847, 9]
"193.198.212.228"    [902.6305675506585, 10]
"192.168.1.61"       [906.698703765869, 11]
"192.168.1.34"       [995.9852695465088, 11]
"192.168.1.3"        [316.55669212341303, 6]
"192.168.1.20"       [506.162643432617, 8]
"192.168.1.2"        [473.91605377197277, 11]
"192.168.1.13"       [649.2500305175784, 11]
"172.16.10.10"       [49.9579906463623, 1]
"172.16.10.1"        [250.78511238098127, 15]
"127.0.0.1"          [62.57653236389161, 16]
"10.60.0.94"         [81.6218852996826, 2]
"10.60.0.93"         [186.49005889892578, 6]
"10.60.0.92"         [386.7535591125485, 5]
"10.60.0.91"         [1070.863485336304, 9]
"10.60.0.90"         [428.4689426422117, 10]

If you manage to wrap your head around this, you are ready to dive into CouchDB.
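
If it helps with the head-wrapping, here is a small local simulation of the average view. It is a sketch, not what CouchDB does internally: sum is defined by hand because it is only built in on the view server, the way the values are split into two batches is arbitrary, the rtt numbers are made up, and averageRows is a hypothetical client-side helper.

```javascript
// Stand-in for the sum() helper which CouchDB provides to reduce functions.
const sum = (xs) => xs.reduce((a, b) => a + b, 0);

// Same logic as the reduce function above.
function reduce(keys, values, rereduce) {
  if (rereduce) {
    let totalSum = 0;
    let totalLength = 0;
    for (const [s, n] of values) {
      totalSum += s;
      totalLength += n;
    }
    return [totalSum, totalLength];
  }
  return [sum(values), values.length];
}

// Made-up round trip times, reduced in two arbitrary batches.
const rtts = [10, 20, 30, 100];
const partA = reduce(null, rtts.slice(0, 3), false); // [60, 3]
const partB = reduce(null, rtts.slice(3), false);    // [100, 1]
const [total, count] = reduce(null, [partA, partB], true);

const average = total / count;
const averageOfAverages = (partA[0] / partA[1] + partB[0] / partB[1]) / 2;
console.log(average, averageOfAverages);
// → 40 60

// Final computation on the client, from rows shaped like the table above.
function averageRows(rows) {
  return rows.map(({ key, value: [s, n] }) => ({ ip: key, avg: s / n }));
}
console.log(averageRows([{ key: '172.16.10.10', value: [49.9579906463623, 1] }]));
```

The two batches hold different numbers of values, so averaging their averages gives 60 instead of the correct 40. Carrying the sum and the count through rereduce avoids that.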

