Recently in hack-of-the-week Category

NSND2011Moravicewifi.png

We had another gathering called Nothing will happen in Moravice, and this time we wanted to be prepared for it. So we got two WRT and two TP-Link devices in good faith that we would have stable Internet there. However, our telecom provider screwed up our order to increase bandwidth from 4Mbit/s to 10Mbit/s for the event and instead downgraded our access to 512Kbit/s. So we opted to create the following simple network architecture, which involved multiple hops to an alternative 4Mbit/s upstream and a batman-adv mesh setup.

batman-adv is quite a cool layer-2 mesh network which operates in ad-hoc mode and allows adaptive routing over the mesh and multiple upstream providers (the upstream is chosen at DHCP request or renew time, so we lowered our DHCP lease time to 5 minutes). Joining the network is quite simple:

iwconfig wlan0 mode ad-hoc essid nsnd-batman
ifconfig wlan0 mtu 1528
modprobe batman_adv
batctl if add wlan0

ifconfig wlan0 up
ifconfig bat0 up
dhclient bat0

This will bring any laptop with ad-hoc support (not all of them have it) and a relatively recent kernel (2.6.32 from Debian stable is a bit too old; batman-adv was included in the 2.6.33 kernel) up and running on the mesh network.
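The join steps above can be wrapped in a small bash script. This is a minimal sketch, not something we ran at the event: the function names are hypothetical, the kernel check relies on GNU sort -V, and it reuses the interface name, ESSID and 1528-byte MTU from above. Setting DRY_RUN=1 only prints the commands.

```shell
#!/bin/bash
# Sketch: join the batman-adv mesh with the same commands as above.
# Function names are hypothetical; run join_mesh as root.

# Succeeds if version $1 >= version $2 (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# batman-adv was merged into the mainline kernel in 2.6.33.
kernel_has_batman() {
    version_ge "${1:-$(uname -r)}" 2.6.33
}

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: join_mesh [interface] [essid]
join_mesh() {
    local dev=${1:-wlan0} essid=${2:-nsnd-batman}
    if ! kernel_has_batman; then
        echo "kernel $(uname -r) is older than 2.6.33, no batman-adv" >&2
        return 1
    fi
    run iwconfig "$dev" mode ad-hoc essid "$essid"
    run ifconfig "$dev" mtu 1528
    run modprobe batman_adv
    run batctl if add "$dev"
    run ifconfig "$dev" up
    run ifconfig bat0 up
    run dhclient bat0
}
```

For example, DRY_RUN=1 join_mesh wlan0 nsnd-batman shows what would run, and running join_mesh as root performs the join.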

My own experience with this experiment is very positive, mostly because we had only around 5 clients in the mesh at any time, compared with the overloaded WRTs which handled the remaining 40 wifi-enabled devices. But I hope that by the next event more people will upgrade to 2.6.33 or newer kernels, so we can have even better mesh connectivity.

Sometimes, you need to connect two networks in some way. My usual motivation is the ability to access machines behind multiple NATs for easy system administration. So far, I have used a combination of OpenVPN and DynamicForward in ssh, with clever use of ProxyCommand and nc and a sprinkle of proxy.pac for Firefox, to make everything seemingly work. However, I never managed to tunnel the various Java Web Start based remote consoles which want to connect directly from your machine to a remote IP using sockets (for which you have to disable all proxy settings using jcontrol and select direct connection).

So this got me thinking. I could configure another OpenVPN for this, but it has many steps and I was lazy. Wouldn't it be great if there were some kind of P2P network like Skype or Hamachi for Linux? Something like this:

n2n_network.png

n2n: a Layer Two Peer-to-Peer VPN is exactly what I was looking for. It allows you to construct an IP network over nodes behind NAT. But is it really easier to configure for the specific example of accessing a private network on another LAN behind NAT? Let's find out.

Steps are simple:

  • Install n2n (you will have to do this on the supernode and both nodes)
    all$ sudo apt-get install n2n
    
  • Start the supernode on a public address with the DNS name super.example.com
    internet$ supernode -l 1234
    
  • Start the first client
    local# edge -c community -d community  -k secret \
      -l super.example.com:1234 -a 10.1.2.1
    
  • Start the remote end-point somewhere within the LAN
    remote# edge -c community -d community  -k secret \
      -l super.example.com:1234 -a 10.1.2.2 -r
    
    Note the changed IP address and the -r flag, which allows us to route through this node.
    remote# sysctl -w net.ipv4.ip_forward=1
    remote# iptables -t nat -A POSTROUTING -s 10.1.2.1 -o tun0 -j MASQUERADE
    
    This turns on forwarding and NAT for our packets coming from the community tap interface and going out through tun0 to the LAN. We also need to set up a route on the local side for the remote LAN:
    local# ip route add 172.18.0.0/16 via 10.1.2.2
    
And we are done. In just six commands we routed the remote LAN 172.18.0.0/16 over our 10.1.2.0/24 n2n network to our local machine. And you don't have to stop there. By installing an additional edge in some other local network, you can get instant connectivity to your internal administrative network. This is very useful if you want to access your private repositories on your local machine or need to open arbitrary sockets between machines.
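The two gateway steps on the remote node and the route on the local side can be made safe to re-run. This is a minimal sketch with hypothetical function names; it uses iptables -C (which needs an iptables version recent enough to support it) to avoid appending the same MASQUERADE rule twice, and ip route replace, which adds the route or updates an existing one. DRY_RUN=1 only prints the commands.

```shell
#!/bin/bash
# Sketch: idempotent version of the forwarding/NAT and route steps above.
# Function names are hypothetical; addresses and interfaces are the example's.

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# On the remote node. Usage: n2n_gateway <n2n peer address> <LAN-facing interface>
n2n_gateway() {
    local peer=$1 out_if=$2
    local -a rule=(-s "$peer" -o "$out_if" -j MASQUERADE)
    run sysctl -w net.ipv4.ip_forward=1
    # iptables -C succeeds if the rule already exists, so only append when missing
    if [ -n "${DRY_RUN:-}" ] || ! iptables -t nat -C POSTROUTING "${rule[@]}" 2>/dev/null; then
        run iptables -t nat -A POSTROUTING "${rule[@]}"
    fi
}

# On the local node. Usage: n2n_route <remote LAN prefix> <remote n2n address>
n2n_route() {
    run ip route replace "$1" via "$2"
}
```

With the example addresses this would be n2n_gateway 10.1.2.1 tun0 on the remote node and n2n_route 172.18.0.0/16 10.1.2.2 locally.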

For the last few weeks, I was configuring a huge ZFS pool of 50 disks over three machines. Aside from benchmarking, I wanted to set up monitoring of this disk pool. smartctl comes as a natural candidate for getting SMART data, but where should I keep it? I recently learned of the git log -p output format, which nicely shows changes in your source files, so the natural question was: can I use git to track SMART disk statistics?

As it turns out, getting an overview of the disk layout is really easy under Linux if you know where to look. /proc/partitions first comes to mind, but it lacks one really important piece of information: the disk serial number. It's the only piece of information which won't change between reboots when you have to spin up 30+ disks, so you really want to use it to identify disks instead of, for example, the device name (which I tried first, and learned that disks move around).

Good naming of dump files is as important as always. In the end, I opted to use smart.id, where the id part comes from /dev/disk/by-id/scsi-something. Paths in /dev/disk/by-id/ are especially useful when creating storage pools because they also don't change between reboots.

Now that we know where to look for disk identification and serial numbers, we are ready to start collecting SMART data. However, this data is much more useful when coupled with information from the controllers, so the final version of the smart-dump.sh script also supports dumping controller status for LSI Logic / Symbios Logic and 3ware controllers. Keep in mind that collecting SMART info from disks does interrupt data transfers, so if you have a huge pool you might want to spread those requests out (or even issue them in parallel if you want one huge interruption as opposed to several smaller ones).
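A minimal sketch of the disk part of such a dump (not smart-dump.sh itself, and without the controller support): it assumes disks show up as /dev/disk/by-id/scsi-*, that the target directory is already a git repository, and the helper names are hypothetical. Each disk gets one smart.&lt;id&gt; file, and every run becomes one git commit.

```shell
#!/bin/bash
# Sketch: dump SMART data for every whole disk into a git repository,
# one smart.<id> file per disk, so `git log -p` shows changes over time.
# Assumes disks appear as /dev/disk/by-id/scsi-*; helper names are hypothetical.

# /dev/disk/by-id/scsi-SATA_X_Z1F0 -> smart.scsi-SATA_X_Z1F0
smart_name() {
    printf 'smart.%s\n' "${1##*/}"
}

# by-id also contains links for partitions (...-part1); skip those
is_whole_disk() {
    case "$1" in
        *-part[0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage: dump_smart <git repository directory>
dump_smart() {
    local repo=$1 dev
    for dev in /dev/disk/by-id/scsi-*; do
        [ -e "$dev" ] || continue            # glob matched nothing
        is_whole_disk "$dev" || continue
        # smartctl reports problems as exit status bits, so keep going on non-zero
        smartctl -a "$dev" > "$repo/$(smart_name "$dev")" || true
    done
    git -C "$repo" add -A
    # commit fails harmlessly when nothing changed
    git -C "$repo" commit -q -m "smart dump $(date '+%Y-%m-%d %H:%M')" || true
}
```

Running dump_smart from cron and then git log -p on one smart file shows how each attribute changed. Since smartctl -a also prints the current time, every dump will show at least that line as changed. Adding a sleep between disks would be one way to spread the interruptions mentioned above.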

So was all this worth the effort? In fact, it was! In our sample of 50 3TB disks, one disk reported errors after just 192 hours of lifetime. It would probably have reported them earlier, but this was only the second time that I ran smartctl -t long on it; it passed the long test on the first run, at 8 hours of lifetime. Even if you read the Failure Trends in a Large Disk Drive Population paper from Google and concluded that SMART is lying to you and you could ignore it, please monitor your drives!

I have been using ZFS on Linux for some time to provide a backup appliance using zfs-fuse. Since then, we got a native ZFS implementation on Linux, so I decided to move my backup pool from zfs-fuse to in-kernel ZFS.

An additional reason to move the pool over to a new machine was to change the pool's RAID level. In current ZFS implementation(s) you can't change a mirror to RAIDZ1 without re-creating the pool and then transferring the data over using zfs send and zfs receive. However, when you have been creating snapshots for years and expiring them using a script, you will have hundreds of snapshots which you need to transfer.

This is where the zfs-pool-replicate.pl script comes in handy. It uses Net::OpenSSH to connect to two machines (source and destination), lists all snapshots on the source and transfers them to the destination. If a filesystem has no snapshots, it will create one @send snapshot which will be transferred. It can also optionally compress snapshots for transfer over the network. I am using LZO, a fast compression which nicely transfers 150Mb/s or more over a normal 1Gbit/s network without much CPU overhead (and we all have multi-core machines anyway, right?). The current implementation allows you to re-run the replication script to transfer only new snapshots, creating a handy disaster recovery solution.
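The replication logic can be sketched in bash. This is not zfs-pool-replicate.pl (which is perl with Net::OpenSSH) but a rough equivalent under stated assumptions: passwordless ssh from the management host to both machines, lzop installed on both, a destination filesystem that does not exist before the first run, and hypothetical function names. In this sketch the data stream passes through the management host.

```shell
#!/bin/bash
# Sketch of incremental snapshot replication driven from a third host.
# NOT zfs-pool-replicate.pl; helper names are hypothetical.

# Newest snapshot name present in both lists.
# $1, $2: snapshot names (the part after '@'), one per line, oldest first.
latest_common_snapshot() {
    local snap last=
    while IFS= read -r snap; do
        [ -n "$snap" ] || continue
        # -F fixed string, -x whole line: "daily-1" must not match "daily-10"
        if printf '%s\n' "$2" | grep -qFx -- "$snap"; then
            last=$snap
        fi
    done <<EOF
$1
EOF
    if [ -n "$last" ]; then printf '%s\n' "$last"; fi
}

# Snapshot names (after '@') of one dataset, oldest first.
list_snapshots() {
    ssh "$1" zfs list -H -t snapshot -o name -s creation -d 1 "$2" 2>/dev/null | sed 's/.*@//'
}

# Usage: send_stream <src_host> <dst_host> <dst_fs> <zfs send arguments...>
send_stream() {
    local src_host=$1 dst_host=$2 dst_fs=$3
    local -a rc
    shift 3
    ssh "$src_host" "zfs send $* | lzop -c" | ssh "$dst_host" "lzop -dc | zfs receive -F $dst_fs"
    rc=("${PIPESTATUS[@]}")
    # only the two ssh exit codes are visible here
    [ "${rc[0]}" -eq 0 ] && [ "${rc[1]}" -eq 0 ]
}

# Usage: replicate <src_host> <src_fs> <dst_host> <dst_fs>
replicate() {
    local src_host=$1 src_fs=$2 dst_host=$3 dst_fs=$4
    local src_snaps dst_snaps first latest common
    src_snaps=$(list_snapshots "$src_host" "$src_fs")
    if [ -z "$src_snaps" ]; then
        # filesystem without snapshots: create one to send
        ssh "$src_host" zfs snapshot "$src_fs@send" || return 1
        src_snaps=send
    fi
    dst_snaps=$(list_snapshots "$dst_host" "$dst_fs")
    first=$(printf '%s\n' "$src_snaps" | head -n 1)
    latest=$(printf '%s\n' "$src_snaps" | tail -n 1)
    common=$(latest_common_snapshot "$src_snaps" "$dst_snaps")
    if [ -z "$common" ]; then
        # nothing on the destination yet: full stream of the oldest snapshot
        send_stream "$src_host" "$dst_host" "$dst_fs" "$src_fs@$first" || return 1
        common=$first
    fi
    [ "$common" = "$latest" ] && return 0     # already up to date
    # -I also sends every intermediate snapshot between common and latest
    send_stream "$src_host" "$dst_host" "$dst_fs" -I "$src_fs@$common" "$src_fs@$latest"
}
```

For example, replicate backup1 tank/home backup2 tank/home sends the oldest snapshot in full on the first run and all later ones with zfs send -I; re-running it sends only snapshots newer than the last common one. Because only the ssh exit codes are checked, a failing zfs send on the remote side may go unnoticed; a real tool would need better error reporting, which is exactly the argument for perl made below.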

The current implementation is designed to run from a third (management) machine, so I can envision a central storage administration tool which would also allow you to transfer LVM snapshots into ZFS snapshots. For now, I'm using a shell script for that, but rewriting it in perl would improve error recovery and reporting.

By default, the MySQL installation on Debian comes without the innodb_file_per_table option, which stores each InnoDB table in its own file. Depending on your usage patterns or backup strategy, this might be a better filesystem organization than one big /var/lib/mysql/ibdata1 file. I first heard about it in OurSQL Episode 36: It's Not Our (De)fault!. It's a great podcast, but to be honest, with each new episode I wish I had only PostgreSQL servers to maintain...

To enable this option you need to create a configuration file and restart the MySQL server:

koha:/etc/mysql/conf.d# cat > file-per-table.cnf 
[mysqld]
innodb_file_per_table
CTRL+D
koha:/etc/mysql/conf.d# /etc/init.d/mysql restart

This alone won't change anything, because only new tables will be created in separate files. But we can run ALTER TABLE table ENGINE=InnoDB on each table to force InnoDB to rebuild it and create a separate file:

mysqlshow koha --status | grep InnoDB | cut -d'|' -f2 | sed -e 's/^/alter table/' -e 's/$/ engine=InnoDB;/' | mysql -v koha

If you replace grep InnoDB with grep MyISAM, you can use the same snippet to convert MyISAM tables into InnoDB (if you still have them and don't use fulltext search).
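Instead of parsing mysqlshow output, the table list can also come from information_schema, which avoids padding and header lines. A minimal sketch, assuming credentials in ~/.my.cnf and hypothetical helper names:

```shell
#!/bin/bash
# Sketch: rebuild all InnoDB tables of one database so that, with
# innodb_file_per_table enabled, each one gets its own .ibd file.
# Helper names are hypothetical; credentials are read from ~/.my.cnf.

# Read table names on stdin, print one ALTER TABLE statement per table.
alter_statements() {
    local t
    while IFS= read -r t; do
        [ -n "$t" ] || continue
        # backquotes protect table names with unusual characters
        printf 'ALTER TABLE `%s` ENGINE=InnoDB;\n' "$t"
    done
}

# Usage: rebuild_innodb <database>
rebuild_innodb() {
    local db=$1
    # should print ON, otherwise the rebuild just rewrites ibdata1
    mysql -N -B -e "SHOW VARIABLES LIKE 'innodb_file_per_table'"
    mysql -N -B -e "SELECT table_name FROM information_schema.tables
                    WHERE table_schema = '$db' AND engine = 'InnoDB'" |
        alter_statements | mysql -v "$db"
}
```

Changing engine = 'InnoDB' to engine = 'MyISAM' gives the conversion variant. Note that the shared ibdata1 file does not shrink when tables move out of it; reclaiming that space requires dumping and reloading all InnoDB data.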

I think that system administration is like gardening. I don't know anything about gardening, but it seems to involve a lot of care here and there, seemingly without much pattern. In that sense, it's similar to wiki editing: you start somewhere and you really don't know where it will lead you.

You have to start reading by singing Lady Gaga with the words: S, s, s, ss... SAML, SAML2! It will help, really.

SAML 2 is the latest in a long line of different SSO implementations you will have to deal with sooner or later if you want to be part of the larger web. Google and others seem to be using it, so it must be good, right?

It has two endpoints: the identity provider (IdP), which has the user accounts, and the service provider (SP), which is usually your application. But of course, it's more complicated than that. For a start, you will need https on your host. I will assume that you already have a domain, and you can get free SSL certificates at StartSSL, so hop over there if you need one.

First, install SimpleSAMLphp. It's the simplest possible way to get a working SAML2 implementation of both IdP and SP. You will want to follow SimpleSAMLphp Installation and Configuration first, and then the SimpleSAMLphp Identity Provider QuickStart, to configure a simple IdP with static accounts so you can test your application against it. You will need both IdP and SP under your control to do development. It will also help if your remote IdP (the identity provider which you intend to use) is also simpleSAMLphp (as AAI@EduHr is).

Installation is rather easy:

dpavlin@lib:/srv$ sudo apt-get install memcached php5-memcache

dpavlin@lib:/srv$ wget http://simplesamlphp.googlecode.com/files/simplesamlphp-1.8.0.tar.gz

dpavlin@lib:/srv$ tar xf simplesamlphp-1.8.0.tar.gz
dpavlin@lib:/srv$ cd simplesamlphp-1.8.0/
dpavlin@lib:/srv/simplesamlphp-1.8.0$ cp config-templates/* config/
dpavlin@lib:/srv/simplesamlphp-1.8.0$ vi config/config.php
You will want to edit the following options:
  • auth.adminpassword
  • secretsalt
  • enable.authmemcookie
dpavlin@lib:/srv/simplesamlphp-1.8.0$ php5 -l config/config.php 
No syntax errors detected in config/config.php
The interesting part here is the enable.authmemcookie option. It allows us to use the SP side of simpleSAMLphp, store the resulting authentication in memcache, and send the browser a cookie which we can later read to fetch data about the current user from memcache.
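The three option edits in config/config.php can also be scripted instead of done in vi. This is a minimal sketch under assumptions: each option sits on a single line as 'name' => value, (as in the shipped template), GNU sed -i is available, values contain no commas, and set_ssp_option and random_salt are hypothetical helpers.

```shell
#!/bin/bash
# Sketch: set simpleSAMLphp config.php options without an editor.
# Assumes one-line  'name' => value,  entries; helper names are hypothetical.
#   set_ssp_option config/config.php enable.authmemcookie TRUE
#   set_ssp_option config/config.php secretsalt "'$(random_salt)'"

# Usage: set_ssp_option <config.php> <option> <PHP literal value>
set_ssp_option() {
    local file=$1 key value
    key=$(printf '%s' "$2" | sed 's/\./\\./g')          # dots in option names are literal
    value=$(printf '%s' "$3" | sed 's/[\\&|]/\\&/g')     # escape characters special in sed replacements
    sed -i "s|^\([[:space:]]*'$key'[[:space:]]*=>[[:space:]]*\)[^,]*,|\1$value,|" "$file"
}

# 32 random lowercase letters and digits, e.g. for secretsalt
random_salt() {
    LC_ALL=C tr -dc 'a-z0-9' < /dev/urandom | head -c 32
}
```

Running php5 -l config/config.php afterwards, as above, catches any line the substitution mangled.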

To configure the Apache side you need Auth MemCookie, but it isn't available as a Debian package, so I opted for Apache::Auth::AuthMemCookie, which lets me flexibly modify the IdP response before passing it on as environment variables.

dpavlin@lib:~$ cat /etc/apache2/conf.d/authmemcookie.conf 
Alias /simplesaml /srv/simplesamlphp-1.8.0/www
PerlModule Apache::Auth::AuthMemCookie
<Location /cgi-bin>
        # get redirected here when not authorised
        ErrorDocument 401 "/simplesaml/authmemcookie.php"
        PerlAuthenHandler Apache::Auth::AuthMemCookie::authen_handler
        PerlSetVar AuthMemCookie "AuthMemCookie"
        PerlSetVar AuthMemServers "127.0.0.1:11211"
        PerlSetVar AuthMemDebug 1
        PerlSetVar AuthMemAttrsInHeaders 0
        AuthType Cookie
        AuthName "Koha SAML"
        Require valid-user
</Location>

To test it, the easiest method is to create an account in Feide OpenIdP and test against it. After all, it's easiest to start with the same SAML2 implementation on both sides, just to prevent the following scenario:

On the perl side I first tried Net::SAML2 and found out that it doesn't work with an IdP unless HTTP-Artifact support is added to the IdP. However, even after that I didn't manage to make it work with the simpleSAMLphp IdP implementation, mostly because of my impatience with its SSL configuration.

On the bright side, for the first test I didn't need to modify Koha (the library management software which I'm configuring SAML2 for) at all, because it already supports HTTP authorization.

We just got our conference booklet, and we need to publish it on the web. But it is 152 MB. This seemed excessive, so I googled a bit and found the following:

$ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

$ ls -al input.pdf output.pdf 
-r-xr-xr-x 1 dpavlin dpavlin 158511430 May 11 16:23 input.pdf
-rw-r--r-- 1 dpavlin dpavlin   1646309 May 11 16:24 output.pdf
That's a reduction to about 1% of the original size.

-dPDFSETTINGS=configuration presets the "distiller parameters" to one of the following predefined settings:

  • /screen selects low-resolution output similar to the Acrobat Distiller "Screen Optimized" setting.
  • /ebook selects medium-resolution output similar to the Acrobat Distiller "eBook" setting.
  • /printer selects output similar to the Acrobat Distiller "Print Optimized" setting.
  • /prepress selects output similar to Acrobat Distiller "Prepress Optimized" setting.
  • /default selects output intended to be useful across a wide variety of uses, possibly at the expense of a larger output file.
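To see how the presets trade size for quality on your own file, the command above can be wrapped in a small function. A sketch with hypothetical helper names; output files are named after the preset, and the preset is checked against the list above before Ghostscript runs.

```shell
#!/bin/bash
# Sketch: the gs command above with a selectable preset, plus a size comparison.
# compress_pdf and compare_presets are hypothetical helpers.

# Usage: compress_pdf <input.pdf> <output.pdf> [preset]
compress_pdf() {
    local in=$1 out=$2 preset=${3:-/screen}
    case "$preset" in
        /screen|/ebook|/printer|/prepress|/default) ;;
        *) echo "unknown preset: $preset" >&2; return 2 ;;
    esac
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS="$preset" \
       -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$out" "$in"
}

# Write input-<preset>.pdf for every preset and print "<bytes> <file>".
compare_presets() {
    local in=$1 p out
    for p in /screen /ebook /printer /prepress /default; do
        out="${in%.pdf}-${p#/}.pdf"
        compress_pdf "$in" "$out" "$p" && printf '%s %s\n' "$(wc -c < "$out")" "$out"
    done
}
```

For a booklet meant for reading on screen, /ebook may be a reasonable middle ground if /screen images look too coarse.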

akred1-85.60x53.98-var.png

I had an interesting problem with conference name tags this week. I wanted to use a free software stack to produce them, and it turned out that this required patching bash code to properly parse CSV files and learning new PDF tricks to print multiple PDF pages on a single sheet of paper, so I will try to document them here so I won't have to rediscover them in two years for the next one...

For the design, we decided to use Inkscape, a great vector drawing program. Even better, there is already an Inkscape Generator extension for just that purpose.

We chose the ISO 7810 ID-1 card size of 85.60 × 53.98 mm and included the template variables required by the extension, as you can see in the picture.

Data for conference participants ended up in Gnumeric and was exported to a CSV file. That's where we hit the first roadblock: the current version of the ink-generate extension doesn't support more than one comma inside a quoted field. However, the Inkscape Generator extension home page included a useful pointer to correct bash code for parsing CSV files by Chris F.A. Johnson, so I decided to import it into the ink-generator git repository and replace the CSV parser. A few patches later I had a working extension which produced 600+ PDF files on disk.
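To see why commas inside quoted fields are a problem, here is a deliberately naive bash split (an illustration, not the extension's parser) which treats every comma as a separator:

```shell
#!/bin/bash
# Illustration only: splitting a CSV line on every comma, ignoring quotes.
# naive_fields is a hypothetical name; it prints one field per line.
naive_fields() {
    local IFS=, line=$1
    local -a fields
    read -r -a fields <<< "$line"
    printf '%s\n' "${fields[@]}"
}
```

For the line "Doe, John",FFZG,Zagreb it prints four fields ("Doe, then  John", then FFZG and Zagreb) instead of three. A correct parser has to track whether it is inside quotes and treat doubled quotes as an escaped quote, which is the kind of bookkeeping a proper CSV library does for you.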

In the process, I learned that you can invoke Inkscape extensions from the command line, which is nice for generating previews while you edit code:

./generator.sh --var-type=name --data-file=test.csv --format=pdf --dpi=90 --output='$HOME/generator-output/%VAR_id%.pdf' --extra-vars=" " --preview=true akred1-var.svg
If --preview=true is removed, it will generate all files without GUI interaction, which is nice for automation.
To sort the output by last name, we created a fake id column padded with zeros.
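The zero padding matters because file names such as 10.pdf and 9.pdf sort as text, so a glob like ~/generator-output/*.pdf would put 10 before 9. A minimal sketch of building such an id column, assuming plain Last,First lines without quoted commas and a hypothetical add_ids helper:

```shell
#!/bin/bash
# Sketch: sort participants by last name, then first name, and prefix a
# zero-padded id so %VAR_id%.pdf files glob in the same order.
# Assumes "Last,First,..." lines without quoted commas; add_ids is hypothetical.
add_ids() {
    # collation of names follows the current locale
    sort -t, -k1,1 -k2,2 | awk '{ printf "%04d,%s\n", NR, $0 }'
}
```

Four digits leave room for up to 9999 participants, comfortably above our 600+.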

pdfnup-2x5.png

Now we had to print them. While some printer drivers have an option to print multiple pages per sheet, the one we were using centered each name tag on the page, which would have required cutting every side of each name tag manually. This was obviously inefficient. I knew about the psnup utility for PostScript, but I didn't know that there is pdfnup, which is part of PDFjam (and packaged in Debian). Getting the layout just right involved reading the pdfpages package documentation, and this is what I ended up with:

pdfnup --suffix nup --nup '2x5' --paper a4paper --no-landscape --noautoscale true --frame true --outfile nup.pdf -- ~/generator-output/*.pdf

This produces the layout which you can see in the picture, nicely centered on the page (this is why I included a fake grain background: to show the centering).

In the end, it didn't really work out. Parsing CSV correctly (and supporting quotes inside quoted values) is a hard task in bash, and I had to admit that I didn't really know how to fix it. With only a day to the start of the conference and no time to waste, I took my language of choice, perl, and wrote a 60-line script which does the same thing but uses the Text::CSV perl module to parse data files.

There is something to be learned here. First, the language and good supporting libraries do matter. Second, sometimes you are better off starting from scratch. But that decision should be made only when you have exhausted other options, since fixing the original extension would have benefited the wider community. There is a tricky balance between scratching my own itch and the common good.

I had an interesting problem at hand today: a directory with an Informix dump in UNL format from which I had to extract data for migration into a new non-relational system (into MARC format, not into NoSQL, btw). The idea was simple: import the dump back into a relational database, write SQL queries which produce the data, and use that. However, the SQL standard doesn't really allow us to relax and expect everything to work. In fact...

Step 1: import into PostgreSQL

My first idea was to use my favorite database, PostgreSQL, and import the data into it. The first problem was the schema file, which used DATETIME HOUR TO MINUTE, which I decided to convert into TEXT. There was another column with only the date, so I will have to munge this using SQL anyway.

But then I hit several roadblocks:
ERROR:  insert or update on table "xxx" violates foreign key constraint "xxx_fkey"
ERROR:  invalid input syntax for integer: ""
ERROR:  invalid input syntax for type date: ""
ERROR:  value too long for type character(xx)
ERROR:  date/time field value out of range: "13.02.1997"
These are all somewhat worrying for a system which maintains your data, but I couldn't really influence the quality of data in the dump files from Informix, so I decided to try something more relaxed about errors like these...

Step 2: relax import using MySQL

Well, most of the invalid input syntax errors should be ignored by MySQL, however:

ERROR 1074 (42000) at line 55: Column length too big for column 'xxx' (max = 255); use BLOB or TEXT instead
was a showstopper. I really don't want to hand-tune the schema just to create throw-away queries to export data.

Step 3: SQLite - it takes anything!

In the process, I learned that I can't really import data blindly, and that the format uses a backslash at the end of a line for multi-line values, so I decided to write a small perl script which imports Informix UNL dumps directly into SQLite.

I'm generating INSERT INTO table VALUES (...) SQL directly, so you could easily modify this to run on some other database or just produce SQL dumps. For import speed, I'm creating a temporary database in /dev/shm. This keeps sqlite3 CPU-bound instead of disk-bound during the import, and the whole database is just 50MB (the UNL dumps are 44MB, so it's very reasonable).
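The same idea can be sketched with awk inside a bash function. This is not the perl loader from this post, and it rests on assumptions about the UNL format: the default | delimiter with a trailing | on each record, backslash escapes (\| and \\, plus backslash-newline for embedded newlines), and empty fields meaning NULL. unl_to_sql is a hypothetical name.

```shell
#!/bin/bash
# Sketch: turn an Informix UNL dump into INSERT statements (not the post's
# perl script). Assumes "|" delimiters, backslash escapes and empty = NULL.

# Usage: unl_to_sql <table> < table.unl
unl_to_sql() {
    awk -v table="$1" '
    BEGIN { q = "\047" }                      # single quote character
    function quote(s) {
        if (s == "") return "NULL"
        gsub(q, q q, s)                       # SQL escaping: double every single quote
        return q s q
    }
    function emit(rec,   i, c, n, len, field, vals, sep) {
        field = ""; vals = ""; n = 0; len = length(rec)
        for (i = 1; i <= len; i++) {
            c = substr(rec, i, 1)
            if (c == "\\" && i < len) {      # escaped character: take the next one as is
                i++; field = field substr(rec, i, 1)
            } else if (c == "|") {           # end of field
                sep = n++ ? ", " : ""
                vals = vals sep quote(field); field = ""
            } else {
                field = field c
            }
        }
        print "INSERT INTO " table " VALUES (" vals ");"
    }
    {
        buf = pending ? (buf "\n" $0) : $0
        # odd number of trailing backslashes: the newline itself is escaped
        n = 0
        for (i = length(buf); i > 0 && substr(buf, i, 1) == "\\"; i--) n++
        if (n % 2) { buf = substr(buf, 1, length(buf) - 1); pending = 1; next }
        pending = 0
        emit(buf)
    }
    END { if (pending) emit(buf) }
    '
}
```

With the schema already created, something like { echo 'BEGIN;'; unl_to_sql items < items.unl; echo 'COMMIT;'; } | sqlite3 /dev/shm/informix.db loads one table; wrapping the inserts in a single transaction keeps SQLite from syncing after every row.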

Not bad for less than 100 lines of perl code: a working Informix UNL loader into SQLite!

I spent the last few days migrating our EPrints repository to version 3. This seems like a dull and easy job, but it did have its own challenges:

  • the mysql database had latin1 encoding, which didn't play well with utf-8 encoded characters from EPrints 2, in effect producing utf-8 strings which were encoded multiple times (and differently for different data)
  • we also had a table with additional works collected after our EPrints 2 installation died, so it had to be imported somehow

In essence, EPrints is a set of perl scripts which convert an XML archive into a database and a web interface. So, how hard can it be?

For a start, take a look at the utf8-fix.pl script, which tries to convert all combinations of Croatian characters back to utf-8. Creating the mapping was not easy. If you look at the end, you will see that the script has a verification step which tries to find utf-8 strings not covered by the mapping and dumps them out. To make it work, I used a test-driven methodology (sic!) with fix.sh as a small runner script which does one conversion, shows the diff from the last one (lines removed from the error log are good) and opens vi to edit the files directly.
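A cruder alternative to an explicit mapping is to peel encoding layers with iconv. This sketch is a heuristic, not utf8-fix.pl: it assumes each extra layer came from UTF-8 bytes being read as Windows-1252 (which is what MySQL's latin1 character set really is) and re-encoded, and it keeps converting back while the result is still valid UTF-8. A few byte values (for example 0x8D, the second byte of č) have no Windows-1252 character, so iconv may refuse those strings; for them an explicit mapping like utf8-fix.pl's is still needed. fix_multi_utf8 is a hypothetical name.

```shell
#!/bin/bash
# Sketch: undo repeated "UTF-8 read as Windows-1252, then re-encoded" layers.
# Heuristic only; fix_multi_utf8 is a hypothetical name. Single-line strings.
fix_multi_utf8() {
    local cur=$1 prev
    while :; do
        prev=$cur
        # peel one layer; fails if some character has no Windows-1252 byte
        cur=$(printf '%s' "$prev" | iconv -f UTF-8 -t CP1252 2>/dev/null) || { cur=$prev; break; }
        # keep the peeled version only if it changed and is still valid UTF-8
        if [ "$cur" = "$prev" ] || ! printf '%s' "$cur" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            cur=$prev
            break
        fi
    done
    printf '%s\n' "$cur"
}
```

Correct text passes through unchanged because its last conversion either fails or stops being valid UTF-8. Like the original workflow, the output is worth checking with a diff, since rare legitimate character pairs can look like an encoding layer.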

Re-read the last sentence once more. I spent two days before I streamlined this workflow to the point where I could really finish the conversion, so it's useful to keep that in mind if you are writing some kind of data munging software.

In the process, I also stripped Croatian characters from pdf filenames, creating symlinks to unaccented versions and passing the generated xml through unaccent-file.pl from fix.sh.

The second part of the problem was converting a tab-delimited file into EPrints XML to import new documents. However, it's (again) not as easy as it seems, since the data had only partial filenames which had to be matched to real files on a share somewhere. So, I decided to split this problem in the following way:

  • files.txt is a list of available files generated by find /mnt/share -print
  • ep-xml.xml is a template for a single document which uses <!-- "variable" --> to denote places where I need to insert custom data
  • finally, tsv2xp-xml.pl (which should really be named tsv2ep-xml.pl, but I made a typo) is the script which reads both files (together with the TSV) and emits XML for EPrints
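The two lookups in that split can be sketched in bash. This is not tsv2xp-xml.pl; it assumes files.txt has one path per line and that placeholders look exactly like <!-- "name" -->, and match_file and fill_template are hypothetical helpers. match_file refuses ambiguous partial names instead of guessing, and fill_template escapes &, < and > so values stay well-formed XML.

```shell
#!/bin/bash
# Sketch of the filename matching and template filling described above.
# Helper names are hypothetical; files.txt comes from `find /mnt/share -print`.

# Print the single path containing the partial name; fail on 0 or 2+ matches.
# Usage: match_file <partial name> <files.txt>
match_file() {
    local matches count
    matches=$(grep -F -- "$1" "$2")
    count=$(printf '%s' "$matches" | grep -c '')
    if [ "$count" -ne 1 ]; then
        echo "$1: $count matches" >&2
        return 1
    fi
    printf '%s\n' "$matches"
}

# Replace every <!-- "name" --> in the template with the XML-escaped value.
# Usage: fill_template <template.xml> name=value ...
fill_template() {
    awk '
    BEGIN {
        for (i = 2; i < ARGC; i++) {
            p = index(ARGV[i], "=")
            v = substr(ARGV[i], p + 1)
            gsub(/&/, "\\&amp;", v); gsub(/</, "\\&lt;", v); gsub(/>/, "\\&gt;", v)
            val[substr(ARGV[i], 1, p - 1)] = v
            ARGV[i] = ""                      # not a file to read
        }
    }
    {
        for (k in val) {
            tag = "<!-- \"" k "\" -->"; out = ""; rest = $0
            while ((p = index(rest, tag)) > 0) {
                out = out substr(rest, 1, p - 1) val[k]
                rest = substr(rest, p + length(tag))
            }
            $0 = out rest
        }
        print
    }' "$@"
}
```

Partial names reported as ambiguous by match_file are the ones worth fixing by hand in the TSV.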

Finally, there is the eprints-ffzg-rebuild.sh script, which does the import into EPrints (and creates an admin user :-)

About this Archive

This page is an archive of recent entries in the hack-of-the-week category.

conferences is the previous category.

howto is the next category.

Powered by Movable Type 5.04