Results tagged “perl”

Last year, I had the good fortune to get acquainted with the great work Open Library does. It's part of the Internet Archive, which is itself a library. So, it seems libraries are not (yet) dead. Brewster Kahle's Long Now Talk explains it much better than I can, so take 90 minutes to listen to it.

The most interesting part of Open Library (IMHO) is the Internet Archive BookReader, a JavaScript application which allows users to browse scanned books on-line. For quite some time, I wanted to install something similar to provide web access to our collection of scanned documents. I found instructions for serving IA-like books from your own cluster, but I didn't have a cluster, and converting all documents to the IA book format seemed like overhead I would like to avoid.

Instead, I decided to write an image server for the JavaScript front-end using plack. I mean, it's basically a directory with images, right? Oh, how wrong could I be? :-)

It turns out that we have pictures in multiple formats (so sorting them required removing the common prefix and sorting by the number only to get the correct order; see the sketch after the list below), and most of them are scanned images in pdf documents. Here are all the types of documents which can be automatically collected into a book for on-line browsing:

  • images of scanned pages
  • multiple pdf files with a single image per page
  • a single pdf file with one image for each page
  • a single pdf file with more than one (usually 4) horizontal bitmap strips per page
  • normal pdf documents which contain text and need rendering to bitmap
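The ordering problem mentioned above deserves a tiny illustration: a minimal sketch of sorting filenames by their numeric part only (the filenames here are made up):

use strict;
use warnings;

# hypothetical scan filenames sharing a common prefix
my @files = qw( scan_2.jpg scan_10.jpg scan_1.jpg );

# sort by the first number in the name, so scan_10 ends up after scan_2
my @pages =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { [ ( /(\d+)/ )[0] || 0, $_ ] } @files;

print "$_\n" for @pages;    # scan_1.jpg scan_2.jpg scan_10.jpg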

The source code of my plack server for the Internet Archive BookReader is on github, so if you want to take a look, hop over there...
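To give a rough idea how small such a server can be, here is a stripped-down PSGI sketch of the same approach (the directory layout, URL scheme and JSON page list are my assumptions here, not the actual code from github); save it as app.psgi and run it with plackup:

use strict;
use warnings;
use Plack::Request;
use JSON;

my $root = '/srv/scanned-documents';    # assumed location of scanned images

my $app = sub {
    my $req = Plack::Request->new(shift);

    # /book/<name> returns a JSON list of page images for the front-end
    if ( my ($book) = $req->path_info =~ m{^/book/([^/]+)$} ) {
        opendir my $dir, "$root/$book"
            or return [ 404, [ 'Content-Type' => 'text/plain' ], ['no such book'] ];
        my @pages = sort grep { /\.(jpe?g|png)$/i } readdir $dir;
        return [ 200, [ 'Content-Type' => 'application/json' ], [ encode_json( \@pages ) ] ];
    }

    # /image/<name>/<file> streams a single scanned page from disk
    if ( my ( $book, $page ) = $req->path_info =~ m{^/image/([^/]+)/([^/]+)$} ) {
        open my $fh, '<', "$root/$book/$page"
            or return [ 404, [ 'Content-Type' => 'text/plain' ], ['no such page'] ];
        return [ 200, [ 'Content-Type' => 'image/jpeg' ], $fh ];
    }

    return [ 404, [ 'Content-Type' => 'text/plain' ], ['not found'] ];
};

$app;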

I have been using ZFS on Linux for some time to provide a backup appliance using zfs-fuse. Since then, we got a native ZFS implementation on Linux, so I decided to move my backup pool from zfs-fuse to in-kernel ZFS.

An additional reason to move the pool over to a new machine was to change the pool's RAID level. In current ZFS implementation(s) you can't change a mirror to RAIDZ1 without re-creating the pool and then transferring the data over using zfs send and zfs receive. However, when you have been creating snapshots for years, and expiring them using a script, you will have hundreds of snapshots which you need to transfer.

This is where the zfs-pool-replicate.pl script comes in handy. It uses Net::OpenSSH to connect to two machines (source and destination), list all snapshots on the source and transfer them to the destination. If you have a filesystem without snapshots, it will create one @send snapshot which will be transferred. It can also optionally use compression for the transfer of snapshots over the network. I am using LZO, a fast compression which nicely transfers 150Mb/s or more over a normal 1Gbit/s network without much CPU overhead (and we all have multi-core machines anyway, right?). The current implementation allows you to re-run the replication script to transfer only new snapshots, creating a handy disaster recovery solution.
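The core of the approach can be sketched like this (a simplification, not the actual zfs-pool-replicate.pl; host names, pool name and the lzop compression command are assumptions):

use strict;
use warnings;
use Net::OpenSSH;

my $src = Net::OpenSSH->new('backup-source');
my $dst = Net::OpenSSH->new('backup-destination');

# list snapshots on the source, oldest first
my @snapshots = $src->capture('zfs list -H -t snapshot -o name -s creation -r backup');
chomp @snapshots;

# send each snapshot, incrementally after the first one, compressing on the wire
my $previous;
for my $snap (@snapshots) {
    my $send = $previous ? "zfs send -i '$previous' '$snap'" : "zfs send '$snap'";

    my ( $out, $out_pid ) = $src->pipe_out("$send | lzop -c")                 or die $src->error;
    my ( $in,  $in_pid  ) = $dst->pipe_in('lzop -dc | zfs receive -F backup') or die $dst->error;

    local $/ = \65536;    # copy the stream between the two ssh connections in 64k chunks
    print {$in} $_ while <$out>;
    close $out;
    close $in;

    $previous = $snap;
}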

The current implementation is designed to run from a third (management) machine, so I can envision a central storage administration tool which would also allow you to transfer LVM snapshots into ZFS snapshots. For now, I'm using a shell script for that, but rewriting it in perl would improve error recovery and reporting.

Almost two years ago, I began my experiments with RFID, writing support for the 3M 810 RFID Reader. Then I tried to make a web interface for it in Koha.

Comet: ETOOMUCH. This worked, but having a separate Comet server was too much complexity for me, so I decided to implement JSONP directly in the RFID driver code. Heck, I already had an HTTP server and a local JavaScript interface, so why not?

The move to JSONP allowed me to use the original Koha web interface, and just overlay RFID information and form submission code as a simple jQuery snippet.
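The JSONP part itself is trivial; the whole idea is just to wrap the JSON payload into the callback name supplied by the page, roughly like this (a sketch, not the actual driver code; the tag data is made up):

use strict;
use warnings;
use JSON;

# given the 'callback' parameter from the browser and a hash of RFID data,
# produce a body which the Koha page can load via a <script> element
sub jsonp_body {
    my ( $callback, $data ) = @_;
    return $callback . '(' . encode_json($data) . ');';
}

print jsonp_body( 'rfid_update', { tags => ['04D6A7B2'] } );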

But two years of development and trying out different approaches produced not-quite-production-quality code. So, I began a rewrite called Biblio::RFID. It splits RFID reader support from the HTTP and JSONP servers and couples this with documentation and tests. I have a production use for it this summer, involving programming of RFID cards as they are printed out, so expect it to change over the next few weeks. After that I will push it to CPAN, but I would love to get feedback and comments before that.

Update: It's also available on CPAN.

I have spent the last few weeks with my head down, adding persistence and change tracking to Mojo Facets, turning it into a much more interesting alternative to web-based data stores like DabbleDB. The idea was simple: I had all the data in memory, so I should be able to edit it, right?

Well, as it always turns out, it was about three weeks of development, but it moved Mojo Facets into the much more interesting use case of making small tweaks to your input data.
The problem is how to keep those changes. Mojo Facets is never the master provider for data, so saving some kind of audit log which can be applied back on the master data is of paramount importance. After all, if you edit that data, you might want to apply those changes back when you re-generate the source file or pull a new version from some other system.

The first idea was to add a simple audit log which records all requests in Mojo. I decided to call requests with parameters actions and store them on disk under /tmp. All I had to add was a re-submit form in the browser and a bit of interface around it. The same form with all parameters can turn Mojo Facets into a peer-to-peer application: I just added a checkbox which changes the destination URL in an action to another Mojo Facets installation, and I got nice replication of actions to another instance.
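Recording an action can be as simple as dumping the request path and parameters into a timestamped file; a minimal sketch of that idea (the directory and file format are assumptions, not the actual Mojo Facets storage):

use strict;
use warnings;
use Storable qw(store);
use Time::HiRes qw(time);
use File::Path qw(make_path);

# save one action (request path + parameters) so it can be re-submitted later
sub save_action {
    my ( $path, $params ) = @_;
    make_path('/tmp/actions');
    my $file = sprintf '/tmp/actions/%f.storable', time;
    store { path => $path, params => $params }, $file;
    return $file;
}

save_action( '/filter', { dataset => 'isi-cro2.js', column => 'autor' } );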

But all was not well. Editing data in the browser generates an update to a specific entry in your dataset, so I decided to also record changes, which include the old and new field values, and all unique keys for this dataset.

This seems like such a small statement, but getting it to the point where you can load some data, edit it in the browser and then apply those changes back on the original data (after a restart) or on a different dataset with the same unique field took most of those three weeks.
Even better, it should be possible to apply the change log to the master data. I prefer to think of it as a replication log to another system.
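For illustration, a change record along those lines and the code to re-apply it might look something like this (the exact format in Mojo Facets may differ; a single unique key and made-up field values are assumed):

use strict;
use warnings;

# one recorded edit: which row (found by its unique key), which column,
# and both old and new values so the change can be verified before applying
my $change = {
    unique => { id => '42' },
    column => 'autor',
    old    => 'Old Name',
    new    => 'New Name',
};

sub apply_change {
    my ( $items, $c ) = @_;    # $items is the dataset: an array of hashes
    my ( $key, $value ) = %{ $c->{unique} };
    for my $item (@$items) {
        next unless defined $item->{$key} && $item->{$key} eq $value;
        warn "old value mismatch for $key=$value"
            unless ( $item->{ $c->{column} } || '' ) eq $c->{old};
        $item->{ $c->{column} } = $c->{new};
    }
}

apply_change( [ { id => '42', autor => 'Old Name' } ], $change );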

To integrate better with other systems, filters got an export (and import) option which dumps them into a simple, one line per entry text file which is accessible over http. It's a perfect format if you want to quickly xargs that data into another script, for example to generate more source data with something as simple as:

 cat /srv/mojo_facets/public/export/isi-cro2.js/filter.autor.119 | \
 xargs -i ./bin/isi-download-results.pl 'CA={}'

Speaking of more dataset sources, I also added parsing of html tables, which should allow users at some point to just drop multiple tables from result pages into a single directory and load them as a dataset. Your telebanking doesn't have the export you need? No problem! Just save all pages to disk and you are ready to go.
Right now, table parsing needs a bit of heuristics to be really useful. It searches for a table on the page with the correct number of columns, and has support for extracting the header or repeating first row (normal <td>) for column names.
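That heuristic can be approximated with HTML::TableExtract; something like this sketch (the expected column count and file name are assumptions):

use strict;
use warnings;
use HTML::TableExtract;

my $wanted_columns = 5;    # expected number of columns in the dataset
my $html = do { local $/; open my $fh, '<', 'saved-page.html' or die $!; <$fh> };

my $te = HTML::TableExtract->new;
$te->parse($html);

# pick the first table with the right number of columns, use its first
# row as column names and turn the remaining rows into items
for my $table ( $te->tables ) {
    my @rows = $table->rows;
    next unless @rows and @{ $rows[0] } == $wanted_columns;
    my @columns = @{ shift @rows };
    my @items = map { my %item; @item{@columns} = @$_; \%item } @rows;
    printf "loaded %d items with columns %s\n", scalar @items, join ',', @columns;
    last;
}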

All that would be unusable without profiling to make it really snappy. This is the first time I used Google Chrome for serious development, and while I still dislike its inspector (firebug's dom inspector is much nicer to me), the Speed Tracer extension was very useful for the front-end part, including network throughput and html/javascript overhead. On the server side, I used Devel::NYTProf, and I will talk about it at Slobodni Fastival 4 in Čakovec, so see you there...

My server-side faceted browser just got a bit better. In fact, it became 10 times better. But, let's try to explain this story step by step...

This week I will try to introduce faceted joins. The primary motivation is the great Plants For A Future database, which consists of more than one text file.

The use case is something like the following:
I would like to know all plants which can have medical use, are edible and are perennial (so I don't have to re-plant them every year).

And you can watch the video to see how easily this can be done:

But this still doesn't make MojoFacets 10 times better than before. That is a quite small dataset (still about 10 times bigger than Exhibit could handle), but I had a new problem: a 100Mb source file with a bit less than 30000 items. To make it scale further I implemented pre-calculated filters and sorts. They serve the same purpose as indexes do in relational databases, but they are calculated on demand and stored in memory.
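There is nothing exotic behind those pre-calculated sorts: just an array of item positions, sorted once per column and reused on every later request. A rough sketch of the idea (not the actual MojoFacets code; the column name is made up):

use strict;
use warnings;

my @items;         # the in-memory dataset: an array of hashes
my %sort_index;    # column name => arrayref of item positions, sorted once

sub sorted_positions {
    my ($column) = @_;
    $sort_index{$column} ||= [
        sort { ( $items[$a]{$column} || '' ) cmp ( $items[$b]{$column} || '' ) } 0 .. $#items
    ];
    return $sort_index{$column};
}

# later requests just walk the cached index instead of re-sorting 30000 items
@items = ( { autor => 'B' }, { autor => 'A' } );
my $by_author = sorted_positions('autor');    # [ 1, 0 ]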

Let's see how it works in action with ~30000 items:

In this video, we saw:

  • starting memory usage of ~13Mb
  • 100Mb dataset with 29869 items
  • filter by autor with 45644 taking ~10s
  • use of the regex filter ic
  • godina_izdavanja as a numeric facet
  • jezik filter using cro slo ser regexps, toggled on and off
  • showing popup titles on filters
  • turning off filters to show an under-4s load time
  • at the end, we consumed ~260Mb of memory
Ok, 4s might not seem blazingly fast, but keep in mind that all this is implemented in pure perl (so deployment is lightweight) using the Mojolicious web framework. But it has its overhead. Other than 260Mb of RAM for the browser, it will also take 600Mb of RAM for the server side. But if you can live with a 6x file size factor on the server side, this might be a very interesting faceted browsing tool for the web.

I am a huge fan of Exhibit faceted browsing of data. However, Exhibit is implemented in JavaScript within your browser, and that makes it unusable for larger amounts of data (more than 300 items or so). In my case, 3800 elements is unusably slow even in the latest Chrome or Firefox.

Something had to be done. If JavaScript inside the browser isn't up to the task, you might wonder what would happen if you moved processing back to the server side, and used the browser just for the task it's good at: displaying generated HTML pages.

So, let me introduce MojoFacets - server-side facet browser based on Mojolicious with a bit of jQuery UI.

Although it seems strange to promote server-side solutions in 2010, this approach still makes sense. For just 40Mb of memory usage on the server side (including Mojo and the dataset) you get fast and usable facet browsing.

It's about system, stupid!

lib-architecture-v2.png When you are working as a system architect or systems librarian, your job is to design systems. My initial idea was to create a small Google out of 12 machines which are dedicated to be web kiosks. I decided to strictly follow the loosely coupled principle, mostly to provide horizontal scaling for my data processing needs. I wanted to be able to add a machine or two if my query is too slow... This easily translates into "how long will I have to wait for my page to generate results"...

I decided to split my system into three logical parts: network booting, data store, and quick reporting. So, let's take a look at each component separately:

  • PXElator
    • supported protocols: bootp, dhcp, tftp, http, amt, wol, syslog
    • boot kiosks using Webconverger (Debian Live based kiosk distribution)
    • provides web user interface for overview of network segment for audit
    • configuration is stored as files on disk, suitable for management with git or other source control management
  • MongoDB
    • NoSQL storage component which supports ad-hoc queries, indexes and other goodies
    • simple store for perl hashes from PXElator, generated every time we see a network packet from one of the clients using one of the supported protocols
  • Sack
    • fastest possible way to execute a snippet of perl code over multiple machines
    • this involves sharing information to nodes, executing code on all of them and collecting results back, all within the 3 second mark!
    • web user interface for cloud overview and graph generation using gnuplot

When I started implementing this system last summer, I decided to use CouchDB for the storage layer. This wasn't really a good choice, since I didn't need transactions, MVCC or replication. Heck, I even implemented forking for documents stored in CouchDB to provide faster responses to clients in PXElator.

Moving to the much faster MongoDB I got ad-hoc queries which are usable (as in: I can wait for them to finish), and if that's too slow, I can move the data to Sack and query it directly from memory. As a happy side effect, making shards from MongoDB is much faster than using CouchDB's bulk HTTP API, and it will allow me to feed shards directly from MongoDB to Sack nodes, without first creating shards on disk.

I'm quite happy with how it all turned out. I can configure any host using a small snippet of perl code in PXElator, issue ad-hoc queries on its audit data in MongoDB, or move the data to Sack if I want to do data munging using perl.

As you noticed by now, I'm using a live distribution for the kiosks, and the machines do have hard drives in them. The idea is to use those disks as storage with something like Sheepdog, which seems like a perfect fit. With it in place, I will have a real distributed, building-sized computer :-).

You have your new shiny application, and an LDAP server on the other side. Easy as pie. What can go wrong?

  • you use e-mail as the login, and the application assumes that logins don't have a domain in them and allows embedding of the whole login into the DN
  • the application can import various interesting fields using LDAP, but you have data somewhere else, and it doesn't really belong in your LDAP
  • you need to provide a subset of data in your database as an LDAP server to the application

I have already written about my saga with LDAP: augmenting LDAP search responses and exposing RDBMS data as an LDAP server. But today, I added a rewrite of bind, so now I can use unmodified Koha, and all the quirks needed for my AAI@EduHr schema are outside the application.

This made my attempt to virtualize LDAP almost complete, so I created a project page on Ohloh. I will write small status updates there, so if any of this is interesting to you, hop over there.

Last weekend we had the Nothing will happen event and I did a small presentation about HTML5TV, which is:

  • a video annotation interface based on mplayer with slide overview html5tv-editing.png
  • a web interface using the HTML5 <video> tag and some JavaScript html5tv-web.png
I have collected some presentations at http://html5tv.rot13.org/, so hop over there and have a look. If you have more interesting presentations, you can easily fork HTML5TV at github, annotate a presentation and submit it for inclusion (or make your own archive if you prefer that). Have fun and let me know what you think about it.

sack-onion-logo.png The main design goal is to have an interactive environment for querying perl hashes which are bigger than the memory of a single machine.

Implementation uses TCP sockets (over ssh if needed) between perl processes. This allows horizontal scalability both on multi-core machines as well as across the network to additional machines.

Reading data into the hash is done using any perl module which returns a perl hash and supports offset and limit to select just a subset of the data (this is required to create disjoint shards). Parsing of the source file is done on the master node (called lorry), which then splits it into shards and sends the data to sack nodes.

Views are small perl snippets which are called for each record on each shard, with the record available in $rec. Views create data in the $out hash, which is automatically merged on the master node.

You can influence the default shard merge by adding + (plus sign) to the name of your key to indicate that the key => value pairs below it should have their values summed when combining shards on the master node.

If a view operation generates a huge amount of long field names, you might run out of memory on the master node when merging results. The solution is to add # to the name of the key, which will turn key names into integers which use less memory.
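Based on that description, a view might look something like this (a sketch of the convention only, with made-up record fields; $rec and $out are provided by Sack itself):

# called for every record on every shard; whatever ends up in $out
# is merged back on the master node
$out->{'+records'}++;                       # '+' : values below are summed across shards
$out->{'+by_author'}{ $rec->{author} }++;   # per-author counts, also summed
$out->{'#title_words'}{$_}++
    for split /\s+/, $rec->{title} || '';   # '#' : key names become integers to save memory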

So, how does it look? Below is a small video showing 121887 records spread over 18 cores on 9 machines, running a few short views first, and then the largest one on this dataset.

If your browser doesn't have support for the <video> tag, watch the Sack video on YouTube or use the ttyrec player written in JavaScript.

Source code for Sack is available in my subversion repository, and this is currently the second iteration, which brings a much simpler network protocol (based only on perl objects serialized directly to the socket using Storable) and better support for starting and controlling the cluster (which used to be a shell script).
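That protocol really is just Storable structures written straight to a socket; in essence something like this sketch from the master side (the node name, port and message format are made up):

use strict;
use warnings;
use IO::Socket::INET;
use Storable qw(nstore_fd fd_retrieve);

# connect to one node and send a command as a plain perl structure
my $sock = IO::Socket::INET->new( PeerAddr => 'node1', PeerPort => 4444 )
    or die "can't connect: $!";

nstore_fd( { command => 'view', code => q($out->{'+count'}++) }, $sock );

# ... and read back whatever perl structure the node returns
my $result = fd_retrieve($sock);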

Update: Sack now has a proper home page at Ohloh and even a playlist on YouTube (which doesn't really like my Theora encoded videos and doesn't have an rss feed natively).

The following video shows improvements in version 0.11 on a 22 node cloud, hopefully better than the video above.

A few days ago, I was writing a wiki page with a user management description. In it, I wanted to have current data about users. Since I'm using a wiki which can include RSS feeds, I decided to hack on SQL2XLS until it produced an RSS feed which I can include within the page.

Usage is very similar to SQL2XLS: you just create a bunch of SQL files, and each of them will create one item in the RSS feed. Each item's link in the RSS feed returns the results of just that SQL query.

If you need something similar, and have a CGI-enabled server, get sql2rss.cgi, create SQL files and you are ready to go...
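The core of such a script is just a loop over *.sql files turning each result set into one RSS item; roughly like this sketch (not the actual sql2rss.cgi; database credentials and the item link format are assumptions):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use XML::RSS;

my $dbh = DBI->connect( 'DBI:mysql:dbname=koha', 'user', 'password', { RaiseError => 1 } );

my $rss = XML::RSS->new( version => '2.0' );
$rss->channel(
    title       => 'SQL reports',
    link        => 'http://example.com/cgi-bin/sql2rss.cgi',
    description => 'one item per SQL file',
);

# every .sql file in the current directory becomes one item in the feed
for my $file ( sort glob '*.sql' ) {
    my $sql  = do { local $/; open my $fh, '<', $file or die $!; <$fh> };
    my $rows = $dbh->selectall_arrayref($sql);
    $rss->add_item(
        title       => $file,
        link        => "http://example.com/cgi-bin/sql2rss.cgi?sql=$file",
        description => join( '<br/>', map { join ' | ', map { defined $_ ? $_ : '' } @$_ } @$rows ),
    );
}

print "Content-Type: application/rss+xml\r\n\r\n", $rss->as_string;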

Roughly six months ago, I wrote the first version of Redis perl bindings. For some strange reason, I didn't push it to CPAN at the same time, but a recent discussion on the Redis list (and two pending e-mails about it) forced me to finally make a push.

And it was worth it! Just a day later we have KiokuDB::Backend::Redis. Did I mention it was just one day later? Remember: release often, release early, but also release in the right place :-)

PXElator introduction

This weekend we were in Split at the Ništa se neće dogoditi event and I did a presentation about the first three weeks of PXElator development, which can be used as a gentle introduction to this project. So, here we go...

Introduction

PXElator is just a piece of the puzzle which aims to replace system administration with nice declarative programs in perl. It's an experiment in replacing my work with reusable perl snippets.

It tries to solve the following problems:

  • support deployment of new physical or virtual machines (ip, hostname, common configuration)

  • maintain documentation about changes on systems, good enough to be used for disaster recovery (or deployment of similar system)

  • configure systems in small chunks (virtual machines or containers) for better management and resource tracking, using normal system administration tools (but track those changes)

  • provide overview and monitoring of network segment and services on it with alerting and trending

Deployment of new machines

What is a machine, really? For PXElator, it's a MAC and IP address and some optional parameters (like hostname). It's stored on the file-system, under conf/server.ip/machine.ip/hostname, and can be tracked using source control if needed.

This is also shared state between all daemons implementing network protocols:

  • DHCP (with PXE support)

  • TFTP (to deliver initial kernel and initrd using pxelinux)

  • HTTP (to provide alternative way to fetch files and user interface)

  • DNS (we already have data)

  • syslog

  • AMT for remote management

Having all those protocols written in the same language enables incredible flexibility in automatic configuration. I can issue commands from an installation which has only ping, because I can have special DNS names which issue commands.

But, to get real power, we need to aggregate that data. I'm currently using CouchDB from http://couchdb.apache.org/ to store all audit data from all services in a single database.

I wanted a simple way to write ad-hoc queries without worrying about data structure too much. In the end, I opted for the audit role of the data, and used 1 second granularity as the key when storing it. The result is that 133 syslog messages from the kernel right after boot will create a single document with 133 revisions instead of flooding your database.

It would be logical to plug RRDtool http://oss.oetiker.ch/rrdtool/ in somewhere here to provide nice graphs, but that is still on the TODO list.

End user scenarios:

  • Take a new machine, plug it into the network, boot it from the network and configure it for kiosk style deployment with Webconverger, available at http://webconverger.com/. The kiosk should automatically turn on every morning at 7:30 and turn off at 20:30.

  • Boot a virtual machine (with a new ip and hostname) from a backup snapshot for easy recovery or testing

  • Boot a machine from the network into a fully configurable (writable) system for quick recovery or as a dedicated machine. This is implemented using an NFS server with an aufs read-write overlay on top of a debootstrap base machine.

Disaster recovery documentation for me, two years later

I have been trying to write useful documentation snippets for years. My best effort so far is the Sysadmin Cookbook at https://sysadmin-cookbook.rot13.org/, a set of semi-structured shell scripts which can be executed directly on machines.

This part isn't yet integrated into PXElator, but most of the recipes will become some kind of rule which you can enforce on a managed machine.

End user scenario:

  • Install that something also on this other machine

Configure system like you normally would but track changes

This is basically the requirement to track configuration changes. Currently, this feature falls out of the writable snapshot over a read-only base system. The overlay data is exactly the custom configuration that I did!

Tracking changes on existing machines will be implemented using scp to copy files to the server into a hostname/path/to/local/file directory structure. This structure will be tracked using source control (probably git, as opposed to subversion which the PXElator source uses), and a cron job will pull those files at some interval (daily, hourly) to create an rsync+git equivalent of BackupPC http://backuppc.sourceforge.net for this setup.

It's interesting to take a look at how it's different from Puppet and similar to cfengine3:

  • All data is kept in normal configuration files on system -- you don't need to learn new administration tools or somehow maintain two sources of configuration (in configuration management and on the system)

  • It introspects the live system and just tries to apply corrections if needed, which is similar to the cfengine3 approach.

End user scenario:

  • Turn useful how-to into workable configuration without much effort

Provide overview and monitoring

This falls out of the HTTP interface and the collecting of data into CouchDB. For now, PXElator tries to manage the development environment for you, opening xterms (with screen inside for logging and easy scrollback) in different colors, and enabling you to start Wireshark on active network interfaces for debugging.

First of all, happy sysadmin day 2009-07-31! So, it seems logical that I'm announcing my project PXElator, which aims to replace me with a perl script. It's basically my take on the cloud hype. Currently it supports bringing up machines (virtual or physical) from boot onwards. It implements bootp, dhcp, tftp and http servers to enable single action boot of a new machine.

It all started when I watched Practical Puppet: Systems Building Systems and decided that the real power is in expressing system administration as code. I also liked the DSL approach which Puppet took in ruby and tried to apply the same principle (a declarative DSL) in perl. If you take a look at the source code, it seems to work quite well.

In the spirit of release early, release often, this code will be in flux until the end of this summer, when I plan to deploy it to create a web kiosk environment for browsing our library catalog.

For the last few weeks I have been struggling with memory usage on one of the machines which runs several OpenVZ containers. It was eating up all memory in just a few days:

koha-hw-memory-week.png

I was always fond of graphing system counters, and since reboots are never a good thing, something had to be done. One of the first things that jumps out is that weekends are quiet, and don't generate 1Gb of additional memory usage. So it had something to do with our workload when the library was open. But what?

Even worse, it all started only two weeks ago!

koha-hw-memory-month.png

An occasional look at ps axvv wasn't really useful in debugging this problem, and I needed more precise information. So, I opted for the simplest possible solution: record memory usage using vzps from crontab with the following shell script:

#!/bin/sh

# dump a per-process snapshot (with per-container veid and rss) into /srv/ps-trend/YYYY-MM-DD/HHMM
cd /srv/ps-trend
dir=`date +%Y-%m-%d`
test -d $dir || mkdir $dir
COLUMNS=256 vzps -eo veid,pid,ppid,ni,user,rss,sz,vsz,time,stat,f,command > $dir/`date +%H%M`

After collecting several hours of traces, I decided to group them by container, and by all processes which use more than 64Mb of memory. Sometimes it's amazing how the solution jumps out by itself if you describe your problem well enough to a computer (and draw a graph :-)
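The grouping itself was a small one-off script; something along these lines, using the column layout from the vzps invocation above (the 64Mb threshold and paths are from this post, the rest is a sketch):

use strict;
use warnings;

# for every collected snapshot, sum rss per container (veid) and
# print any process using more than 64Mb
for my $snapshot ( sort glob '/srv/ps-trend/*/*' ) {
    my %rss_per_veid;
    open my $fh, '<', $snapshot or die "$snapshot: $!";
    <$fh>;    # skip the header line with column names
    while (<$fh>) {
        chomp;
        my ( $veid, $pid, $ppid, $ni, $user, $rss, $sz, $vsz, $time, $stat, $f, $command )
            = split ' ', $_, 12;
        next unless defined $rss and $rss =~ /^\d+$/;
        $rss_per_veid{$veid} += $rss;
        printf "%s veid %s pid %s rss %d Mb %s\n", $snapshot, $veid, $pid, $rss / 1024, $command
            if $rss > 64 * 1024;    # vzps reports rss in Kb
    }
    print "$snapshot ", join( ' ', map {"$_=$rss_per_veid{$_}"} sort keys %rss_per_veid ), "\n";
}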

ps-hourly.png

After I identified that two Apache instances were eating memory like crazy, I remembered a fellow sysadmin who complained about a threaded Apache installation where some Apache child processes would mysteriously take 256Mb of RAM each. Just some, not all. Of course, I had several of them.

My solution to the problem was also simple:

# sudo apt-get install apache2-mpm-prefork

It seems that the threaded model in current Apache 2 just isn't good for me. Which is strange, because the application is basically a bunch of CGI scripts.

The result is not bad: 1Gb of additional free memory (which will be used for the file-system cache). Not leaking anymore will also save us from hitting swap, which was so bad that the first reboot was needed. If nothing else, remember that tracking system counters and graphing them is always a good investment of time, because pictures can tell a different story than raw data. Especially if you have a lot of raw data (20k per minute in this case).

It would be nice to turn this into a full monitoring solution. I really like the idea of minimal client deployment for monitoring, so something like ps and curl to push data directly to a graphing server, triggered from crontab (with the possibility to parse e-mail and insert it later, for recovery from network interruptions), might just be a solution which I would recommend. Remember, Linux is an operating system. It can do a lot of things by itself :-)

If you just want light-weight graphs for your machines, the RRD::Simple Monitoring server has a light-weight perl client (a single perl script which submits to a CGI script on the server) which is triggered from cron. This project was the inspiration to give RRD::Simple a try.

I'm not a big Facebook fan. In fact, I haven't used it long enough to form any opinion about it other than the objection that it's a silo from which I can't get my data back out. But, since most of my users in the library are using it, I decided to take a look at how hard it would be to expose part of our library catalog on Facebook.

The easiest thing to do was to make a Facebook application which would fetch an RSS feed with results from Koha and present it inside Facebook.

Facebook applications (in canvas mode, which I'm using) are in fact simple web pages with a bit of custom Facebook markup. My initial gripe about applications was that they were slow. Now that I know that they are running somewhere else and not on Facebook, I understand why they are slow.

So, to sum it up: if you know how to make a simple CGI script, you will be fine with Facebook applications. They will be even slower than your application on your own server, and if you do something popular you might have problems with server load.

The Internet is not a single network. Some parts of it are hidden behind firewalls, and some services allow access only from a specific range of IP addresses. To solve that, we use proxy servers, but what do you do when you want to allow your users easy access to resources which are not directly accessible?

For a long time, I was a fan of CGIProxy: a single CGI script which allows you to access all web resources which are visible from the machine on which CGIProxy is installed. However, modern web pages have many, many elements, and soon enough the overhead of CGI execution for each element proved to be too much for our users' patience. It was slow...

I decided to take a look at mod_perl2 as a solution, since it provides a long-living perl interpreter inside the Apache 2 server. I was on the right track: Apache2::ModProxyPerlHtml provides an easy to configure html rewriter using Apache 2 and mod_perl2. I tested it and immediately saw a speedup compared to the previous CGIProxy based solution.

But this was only half of the problem. I also needed to solve user authorization somehow. With the old system, we had an LDAP server as the login method, but this time I needed to somehow check user passwords in the Koha database, which are base64 encoded md5 hashes of the password. Base64 is a somewhat unfortunate choice because MySQL doesn't have built-in base64 encoding. If it did, I could just use Apache::AuthDBI, craft SQL queries and I would be ready to go.

The first idea was to write an Apache2 auth module which would connect to Koha directly. That would work, but it would also require a secure connection between the proxy and Koha (we are transferring passwords) and the proxy would need credentials to access the Koha database. None of that seemed very clean or secure, so I decided to split it into two parts:

  • an Apache auth module which requests credential verification from the Koha server over https
  • a CGI script on Koha which verifies the user and returns a status
With this approach, plain-text passwords never travel across the network (even the md5 hash of the password is transferred over ssl) and the proxy server doesn't have to have any Koha specific configuration.

Here is a small Apache authorization module which transfers the userid and base64 encoded password hash to a cgi script on the Koha server over https:

package Apache2::AuthKoha;
  
use strict;
use warnings;
  
use Apache2::Access ();
use Apache2::RequestUtil ();
  
use Apache2::Const -compile => qw(OK DECLINED HTTP_UNAUTHORIZED);

use Digest::MD5 qw/md5_base64/;
use LWP::Simple qw/get/;

sub handler {
        my $r = shift;
  
        my ($status, $password) = $r->get_basic_auth_pw;
        return $status unless $status == Apache2::Const::OK;

        return Apache2::Const::OK if get(
                'https://koha.example.com/koha-auth?userid=' . $r->user .
                ';password=' . md5_base64($password)
        );
 
        $r->note_basic_auth_failure;
        #return Apache2::Const::DECLINED; # allow other authentication
        return Apache2::Const::HTTP_UNAUTHORIZED;
}

1;
And this is the small CGI script on the Koha server's side which checks the userid and password hash and returns the appropriate status:
#!/usr/bin/perl

# ScriptAlias /koha-auth /srv/koha-auth/auth.cgi

use warnings;
use strict;

use CGI;
use DBI;

our $dsn      = 'DBI:mysql:dbname=koha';
our $user     = 'koha-database-user';
our $passwd   = 'koha-database-password';

my $q = CGI->new;

my $status = 200;

sub out {
        my ($status,$text) = @_;
        print $q->header( -status => $status ), "$text\r\n";
        exit;
}

out( 500, "NO PARAMS" ) unless $q->param;

my $dbh = DBI->connect($dsn, $user,$passwd, { RaiseError => 1, AutoCommit => 0 }) || die $DBI::errstr;

my $sth = $dbh->prepare(q{
        select 1 from borrowers where userid = ? and password = ?
});

my ( $userid, $password ) = ( $q->param('userid'), $q->param('password') );
# CGI decodes '+' in the query string as a space; put the '+' back into the base64 hash
$password =~ s{ }{+}g;

$sth->execute( $userid, $password );

if ( $sth->rows == 1 ) {
        out( 200, "OK" );
} else {
        out( 404, "ERROR" );
}
To complete this setup, we also have to define a virtual host on the proxy server which ties our components together:
<VirtualHost *:443>
        SSLEngine on

        SSLCertificateFile    /etc/apache2/proxy.example.org.pem
        SSLCertificateKeyFile /etc/apache2/proxy.example.org.pem

        ServerName proxy.example.org

        ProxyRequests Off
        ProxyPreserveHost Off

        PerlInputFilterHandler Apache2::ModProxyPerlHtml
        PerlOutputFilterHandler Apache2::ModProxyPerlHtml
        SetHandler perl-script
        PerlSetVar ProxyHTMLVerbose "On"

        <Proxy *>
                Order deny,allow
                Allow from all
        </Proxy>

        PerlAuthenHandler Apache2::AuthenDBMCache Apache2::AuthKoha
        PerlSetVar AuthenDBMCache_File  /tmp/auth-cache
        PerlSetVar AuthenDBMCache_TTL   3600
        PerlSetVar AuthenDBMCache_Debug On


        ProxyPass /secure/ http://secure.example.com/
        <Location /secure/>
                ProxyPassReverse /
                PerlAddVar ProxyHTMLURLMap "/ /secure/"
                PerlAddVar ProxyHTMLURLMap "http://secure.example.com /secure"

                AuthName Proxy
                AuthType Basic
                require valid-user
        </Location>
</VirtualHost>
This will enable you to access http://proxy.example.org/secure/ and get access to http://secure.example.com/

You will also notice that I'm using Apache2::AuthenDBMCache to prevent the proxy from checking user credentials for every page element (which would be slow). At first, this setup didn't work well. I would get No access to /tmp/auth-cache at -e line 0 because the client browser was opening multiple connections at the same time and perl's dbmopen didn't like that. Fortunately, it was easy to fix: I just added use DB_File; in Apache2::AuthenDBMCache, which forced dbmopen to use Berkeley DB (which allows multiple readers) instead of the default GDBM.

Once again, perl proved to be the duct tape of the Internet. With a few lines of code and some configuration you can make wonderful things. So, why don't you? :-)

Last weekend, we had a monthly meeting of LangGang where I saw CouchDB's Futon interface running on top of git (and shell!)

This idea stuck with me for a couple of days, but yesterday it just had to get out. So, I'm working on a CouchDB HTTP API interface on top of Mojo.

My goal is a bit different: I really want to make Futon work on my Storable dumps on disk from WebPAC, so I can finally get that HTTP interface which is described in the architecture documentation...

If you want to know what's happening with CouchDB lately, subscribe to the CouchDB Podcast, which features the latest news from CouchDB development.

I have already blogged about RFID readers, but now I really need to replace the antique 3M software with something better. So, basically, I started with a specification:

The user should put an RFID tag on the reader and get output from Koha
Nice and simple. However, the readers are connected to a Windows machine. And we need some way for the reader to push data about new tags to the browser. Hmmm... push? Sounds like Comet, doesn't it?

Let's see what we need:

  1. a Comet server to display the page in the web browser and push updates
  2. an RFID reader program installable on Windows to communicate with the serial port and push messages to the Comet server
  3. a user with a browser
Again, it seems somewhat reasonable. Can perl do it? It seems it can:
  1. The Meteor server provides a perl comet server, which required one additional file to provide integration with Koha.
    I needed the comet server to deliver Koha data because of the same-origin security policy in browsers, which doesn't allow me to make AJAX requests to Koha directly. On the other hand, this gave me an excuse to try out HTML::Query to extract the part of the page which I needed on the server.
    I liked this split between Koha and the Comet server (even on different boxes and/or network segments), but it came with a price: every page generated in Koha took more than 2 seconds, which was just too slow for a nice interactive demo. So, I implemented on-disk caching, which is also really nice for debugging. In the production version, requests for Koha data might be implemented as forkers (since the browser uses an AJAX request for it, that makes sense) or I will issue queries directly against Koha's database.
  2. The next step was to add a push from my perl RFID reader to Meteor.
    This also turned out to be simple: one socket connection to Meteor, and a few revisions later I had more or less the following protocol, which the reader pushes to the comet server as pipe | encoded strings:
    meteor( 'info', "Found reader hardware $hw_ver" );
    meteor( 'info-none-in-range' );
    meteor( 'info-in-range', join(' ',@tags));
    meteor( 'in-range', $tag );
    meteor( 'read', $tag ); 
    meteor( 'removed', $tag ); 
    
  3. I also needed a web interface in the browser.
    When I need to do something quickly I often turn to jQuery, as long time readers of my blog already know. One additional javascript file in Meteor's public_html directory and we have a working interface.
    At first, I created a JavaScript object (like a hash in perl :-) to keep track of the visible tags on screen. But, since each tag is a div whose id is set to the SID of the tag, it was easier (and shorter) to just use jQuery to ask the DOM if the element is on the page. KISS wins again...

So how does it look, you might ask? (You might want to turn the volume down, because the AudioSwap feature of YouTube really should be considered harmful :-)

This might help answer the question of whether perl is dead. Only if that means dead easy: it's 4k of perl code, 4k of JavaScript and 4k of CSS.

Let's assume that we have two systems: one which supports LDAP user accounts, and another system, based on a relational database (Koha, in this example), which has user information that you would like to expose over LDAP to the first system.

We could export all the required data into the institutional LDAP, but that would push a lot of junk which is really required only by one application. Worse yet, some of that data is somewhat sensitive, because it includes serial numbers (SID) and data from RFID cards.

But, since the application supports LDAP and we have Net::LDAP::Server, it should be easy! And it really is.

The LDAP server accessing the Koha database using DBI is really a converter of the LDAP search query into SQL where syntax, which is then appended to an initial select that returns the attribute values for the LDAP entry returned for the search request.
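A heavily trimmed version of that idea, using Net::LDAP::Server, might look like the sketch below; only a single equality filter is handled, and the attribute-to-column mapping, base DN and database credentials are simplified assumptions:

package KohaLDAP;
use strict;
use warnings;
use base 'Net::LDAP::Server';
use Net::LDAP::Constant qw(LDAP_SUCCESS);
use Net::LDAP::Entry;
use DBI;

use constant RESULT_OK => { matchedDN => '', errorMessage => '', resultCode => LDAP_SUCCESS };

my $dbh = DBI->connect( 'DBI:mysql:dbname=koha', 'user', 'password', { RaiseError => 1 } );

sub bind { return RESULT_OK }

sub search {
    my ( $self, $req ) = @_;

    # turn a simple (attribute=value) LDAP filter into a SQL where clause
    my $eq     = $req->{filter}{equalityMatch} or return RESULT_OK;
    my $column = { uid => 'userid', mail => 'email' }->{ $eq->{attributeDesc} }
        or return RESULT_OK;

    my $sth = $dbh->prepare("select userid, surname, email from borrowers where $column = ?");
    $sth->execute( $eq->{assertionValue} );

    my @entries;
    while ( my $row = $sth->fetchrow_hashref ) {
        my $entry = Net::LDAP::Entry->new;
        $entry->dn("uid=$row->{userid},dc=example,dc=com");
        $entry->add( uid => $row->{userid}, sn => $row->{surname}, mail => $row->{email} );
        push @entries, $entry;
    }
    return RESULT_OK, @entries;
}

1;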

This small example will hopefully be useful to other people who would like to integrate two systems by exposing user data via the LDAP protocol. Just remember that pushing real, valid, usable data back into LDAP always makes more sense if that data is useful for current or future systems...

PerlMonks has a node with this announcement, so you might want to comment there if it's of interest to the general perl community.