Triggers in CouchDB: from queue to external command execution

My first CouchDB application has been in production since the beginning of this week. That also brings new challenges, and this time the task is simple: I want to send a confirmation e-mail when someone enters data into my CouchDB.
Let's see how that can be done. The easiest solution would be to send the e-mail from the web server which I have in the middle, but that would add latency to every request which needs to send an e-mail, so it isn't the best solution. It would be much better to somehow extend CouchDB with the ability to run custom code.

There are several ways to extend CouchDB in any language:

  • View servers are useful if you want to write custom code to massage your data
    Since we don't really want to map over all the data in our database, this approach isn't applicable here
  • External processes allow you to write pieces of code which are executed on an HTTP request, similar to CGI, but using JSON for passing data
    This might work for sending e-mails, but it would require a special CouchDB installation with the external process installed, and I would need to make two requests (one to save the data, and another to trigger sending of the e-mail)
  • The HTTP database API has a _changes feed which allows an external process to connect to CouchDB and monitor all changes
    This seems like a perfect fit. I can have another process (possibly running on another machine) which will recognize documents which need an e-mail notification.

I had known about the _changes feed in CouchDB for a while, but the "Asynchronous architectures with the CouchDB _changes feed" webcast pushed me in the right direction. Several things in that webcast are useful for this example:

  • _changes?feed=continuous gives us instant notifications about document changes (with optional server-side filters). The only requirement is an HTTP client which can understand CouchDB's chunked JSON responses.
  • Documents in CouchDB can be used as a state machine. In our example, we need to somehow note that we are working on sending the e-mail (so that other workers won't pick it up while we are sending it) and note when sending is finished.
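To make the first point concrete, here is a minimal Python sketch of what "understanding" the continuous feed amounts to. It assumes the feed delivers one JSON object per line, with empty lines used as heartbeats. The function name iter_changes and the sample ids and revisions are made up for illustration; this is not how couchdb-triggers.pl does it.

```python
import json


def iter_changes(lines):
    """Yield one change dict per non-empty line of a continuous _changes feed."""
    for raw in lines:
        # Accept bytes (as read from an HTTP response) as well as str.
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            # Empty lines are heartbeats which only keep the connection open.
            continue
        yield json.loads(line)


# Sample lines in the shape CouchDB sends; ids and revisions are invented.
sample = [
    b'{"seq":1,"id":"free","changes":[{"rev":"1-abc"}]}\n',
    b'\n',
    b'{"seq":2,"id":"free","changes":[{"rev":"2-def"}]}\n',
]
changes = list(iter_changes(sample))
print([change["id"] for change in changes])
```

Against a live server, the response object returned by urllib.request.urlopen() for a _changes?feed=continuous URL can be iterated line by line, so it should be possible to pass it to iter_changes directly.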

So, I decided to write couchdb-triggers.pl, which implements a queue using a trigger field within the document. In the current implementation, you can put a shell command in trigger.command; it will be executed on the back-end and its result will be returned in trigger.output. This requires three document updates, which are tracked using the trigger.active array, which contains a timestamp for each step (with 0 indicating a finished trigger). So, what does it look like?

We can submit a document with a trigger.command field to CouchDB:

$ curl -X PUT http://localhost:5984/monitor/free -d '{"trigger":{"command":"free"}}'
{"ok":true,"id":"free","rev":"1-511080b02d975944ea493eb5e9ef8783"}

If we fetch the document a moment later, we will see that it contains trigger.output with the results of our command:

$ curl http://localhost:5984/monitor/free
{"_id":"free","_rev":"3-a486faee4d416cd6ecc94183fe88a3bb","trigger":{"active":[1290521633.16487,1290521633.5196,0],"output":"             total       used       free     shared    buffers     cached\nMem:       3076500    3036388      40112          0     289572    1550120\n-/+ buffers/cache:    1196696    1879804\nSwap:      1949688     330296    1619392\n","command":"free"}}

As the _rev field shows, the document was updated two more times. We also see that trigger.active has three elements, the last one being zero. This means that the trigger was started (first timestamp), executed (second timestamp), and that its result was successfully stored back to CouchDB (final zero).
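The life cycle of trigger.active can be sketched in a few lines of Python. This is only an illustration of the state machine under my own assumptions, not the code of the Perl script: the function names claim and execute are hypothetical, the document lives in memory, and the comments mark where a real worker would save it back to CouchDB.

```python
import subprocess
import time


def claim(doc, now=None):
    """Mark the trigger as started; return False if it was already picked up."""
    trigger = doc["trigger"]
    if trigger.get("active"):
        return False  # another worker started (or already finished) this one
    trigger["active"] = [time.time() if now is None else now]
    return True


def execute(doc, now=None):
    """Run trigger.command, store its output, and append the final 0 marker."""
    trigger = doc["trigger"]
    result = subprocess.run(
        trigger["command"], shell=True, capture_output=True, text=True
    )
    trigger["output"] = result.stdout
    trigger["active"] += [time.time() if now is None else now, 0]
    return doc


doc = {"_id": "hello", "trigger": {"command": "echo hello"}}
if claim(doc, now=1.0):
    # A real worker would save the document here (second revision) ...
    execute(doc, now=2.0)
    # ... and once more here (third revision).
print(doc["trigger"]["active"])
```

When the claimed document is saved, CouchDB rejects a write carrying a stale _rev with a 409 Conflict, so a second worker trying to claim the same revision would fail at that point and could simply skip the document.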

Separating the trigger functionality from CouchDB itself also has benefits with replication: we can insert documents into any CouchDB, replicate it to one which has an Internet connection and couchdb-triggers.pl running, and then just run replication again to get the results.

The scheme with zero as the last timestamp also allows me to write a server-side filter function which returns the document for a particular key only if trigger.active has zero as its last element (meaning that the trigger finished successfully). This can be used to implement synchronous triggers: first PUT the document, and then call _changes?feed=continuous with a filter on just that document and wait until the trigger fires (to refresh the user interface or start replication back to a remote database).
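Here is a rough Python sketch of the client side of such a synchronous trigger. The real filter would be a JavaScript function stored in a design document; trigger_finished below only mirrors the check it would make. The filter name triggers/finished and the id query parameter are assumptions of mine (custom query parameters are handed to the filter function in its request argument).

```python
from urllib.parse import urlencode


def trigger_finished(doc):
    """True if trigger.active exists and its last element is the 0 marker."""
    active = doc.get("trigger", {}).get("active") or []
    return len(active) > 0 and active[-1] == 0


def changes_url(base, db, filter_name, doc_id):
    """Build a filtered continuous _changes URL for a single document."""
    query = urlencode(
        {"feed": "continuous", "filter": filter_name, "id": doc_id},
        safe="/",  # keep the designdoc/filter separator readable
    )
    return f"{base}/{db}/_changes?{query}"


# "triggers/finished" is a hypothetical design document and filter name.
print(changes_url("http://localhost:5984", "monitor", "triggers/finished", "free"))
print(trigger_finished({"trigger": {"active": [1290521633.16487, 1290521633.5196, 0]}}))
```

The client would PUT the document, open this URL, and block until the first non-empty line arrives, which is the moment the filter lets the finished document through.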

It seems that this trigger pattern will be useful for various tasks in the future, and my relational brain is slowly beginning to move into the NoSQL land of replicated, schema-free databases, but this time with external triggers :-)