I have spent the last few weeks with my head down, adding persistence and change tracking to Mojo Facets, turning it into a much more interesting alternative to web-based data stores like DabbleDB. The idea was simple: I had all the data in memory, so I should be able to edit it, right?
Well, as it always turns out, it was about three weeks of development, but it moved Mojo Facets into a much more interesting use case: making small tweaks to your input data.
The problem is how to keep those changes. Mojo Facets is never the master provider of data, so saving some kind of audit log which can be applied back to the master data is of paramount importance. After all, if you edit that data, you might want to apply those changes again when you re-generate the source file or pull a new version from some other system.
The first idea was to add a simple audit log which records all requests in Mojo. I decided to call requests with parameters actions and store them on disk under /tmp. All I had to add was a re-submit form in the browser and a bit of interface around it. The same form with all parameters can turn Mojo Facets into a peer-to-peer application: I just added a checkbox which changes the destination URL of an action to another Mojo Facets installation, and I got nice replication of actions to another instance.
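As a rough sketch of what recording an action could look like (an illustration, not the actual Mojo Facets code), it's enough to dump the request path and parameters, together with a timestamp, into a file under /tmp:

# illustrative action log entry; the real on-disk format in Mojo Facets may differ
use Time::HiRes;
use Data::Dumper;

sub save_action {
    my ( $path, $params ) = @_;    # request path and hashref of form parameters
    my $dir = '/tmp/actions';
    mkdir $dir unless -d $dir;
    my $file = sprintf '%s/%.5f.action', $dir, Time::HiRes::time();
    open( my $fh, '>', $file ) or die "can't write $file: $!";
    print $fh Dumper( { path => $path, params => $params, time => time() } );
    close $fh;
}

Replaying an action is then just a matter of re-submitting the same parameters to the same path, either on the local instance or on the remote one selected by that checkbox.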
But all was not well. Editing data in the browser generates an update to a specific entry in your dataset, so I also decided to record changes, which include the old and new field values and all unique keys for that dataset.
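Something like the following record (the field names here are my assumption, not the exact Mojo Facets format) is enough to identify the row and re-apply the edit later:

# hypothetical change record for a single edited field
my $change = {
    dataset => 'isi-cro2.js',    # dataset in which the edit happened
    unique  => { ID => 42 },     # values of all unique keys for this row (assumed key name)
    field   => 'autor',          # column which was edited
    old     => 'old value',
    new     => 'new value',
    time    => time(),
};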
This seems like such a small statement, but getting to the point where you can load some data, edit it in the browser, and then apply those changes back to the original data (after a reboot) or to a different dataset with the same unique field is where most of those three weeks went.
Even better, it should be possible to apply the change log to the master data. I prefer to think of it as a replication log to another system.
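Applying such a log is then straightforward (a minimal sketch, assuming the hypothetical change structure above): find the row by its unique keys, check that the old value still matches, and write the new one.

# hypothetical replay of one change record against rows loaded from master data
sub apply_change {
    my ( $rows, $change ) = @_;    # $rows is an arrayref of hashrefs
    foreach my $row (@$rows) {
        # the row has to match every unique key recorded with the change
        next if grep { ( $row->{$_} // '' ) ne $change->{unique}{$_} } keys %{ $change->{unique} };
        warn "old value changed in the meantime\n"
            if $row->{ $change->{field} } ne $change->{old};
        $row->{ $change->{field} } = $change->{new};
        return 1;
    }
    return 0;    # no row with those unique keys in this dataset
}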
To integrate better with other systems, filters got an export (and import) option which dumps them into a simple, one-line-per-entry text file accessible over HTTP. It's a perfect format if you want to quickly xargs that data into another script, for example to generate more source data with something as simple as:
cat /srv/mojo_facets/public/export/isi-cro2.js/filter.autor.119 | \
xargs -i ./bin/isi-download-results.pl 'CA={}'
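Generating that file on the server side is equally simple (a sketch with an assumed helper and directory layout, mirroring the path above): one filter value per line, written somewhere under public/ so Mojolicious serves it as a static file.

# hypothetical export of a filter into the public/ tree, one value per line
use File::Path qw(make_path);

sub export_filter {
    my ( $dataset, $column, $nr, $values ) = @_;    # $values is an arrayref
    my $dir = "/srv/mojo_facets/public/export/$dataset";
    make_path($dir) unless -d $dir;
    my $file = "$dir/filter.$column.$nr";
    open( my $fh, '>', $file ) or die "can't write $file: $!";
    print $fh "$_\n" foreach @$values;
    close $fh;
    return $file;
}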
Speaking of more dataset sources, I also added parsing of HTML tables, which should allow users at some point to just drop multiple tables from a results page into a single directory and load them as a dataset. Your telebanking doesn't have the export you need? No problem! Just save all the pages to disk and you are ready to go.
Right now, table parsing needs a bit of heuristics to be really useful. It searches the page for a table with the correct number of columns, and has support for extracting either a header row or a repeating first row (of normal <td> cells) for column names.
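A simplified version of that heuristic, sketched here with HTML::TableExtract (I'm not claiming that's the module Mojo Facets actually uses): pick the first table whose rows have the expected number of columns and use its first row for column names.

# hypothetical column-count heuristic for picking the right table out of a saved page
use HTML::TableExtract;

sub parse_saved_page {
    my ( $html, $expected_cols ) = @_;
    my $te = HTML::TableExtract->new;
    $te->parse($html);
    foreach my $table ( $te->tables ) {
        my @rows = $table->rows;
        next unless @rows && @{ $rows[0] } == $expected_cols;
        my $columns = shift @rows;    # first row becomes column names
        return ( $columns, \@rows );
    }
    return;    # no table with the expected number of columns on this page
}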
All that would be unusable without profiling to make it really snappy. This is the first time I have used Google Chrome for serious development, and while I still dislike its inspector (Firebug's DOM inspector is much nicer to me), the Speed Tracer extension was very useful for the front-end part, including network throughput and HTML/JavaScript overhead. On the server side, I used Devel::NYTProf, and I will talk about it at Slobodni Festival 4 in Čakovec, so see you there...