May 22, 2008

Group by data in shell pipes

My mind is just too accustomed to RDBMS engines to accept that I can't have GROUP BY in my shell pipes. So I wrote one: groupby.pl.


Aside from the fact that it looks somewhat like perl golfing (which I'm somewhat proud of), let's see how it looks:


dpavlin@llin:~/private/perl$ ps axv | ./groupby.pl 'sum:($6+$7+$8),10,count:10,min:($6+$7+$8),max:($6+$7+$8)' | sort -k1 -nr | head -10 | align
440947 /usr/lib/iceweasel/firefox-bin 1 440947 440947
390913 /usr/sbin/apache2 11 22207 39875
180943 /usr/bin/X 1 180943 180943
135279 /usr/bin/pinot-dbus-daemon 1 135279 135279
122254 mocp 2 25131 97123
84887 pinot 1 84887 84887
78279 postgres: 5 10723 21971
70030 /usr/bin/perl 6 6959 15615
50213 /bin/bash 7 6351 7343
49266 /usr/lib/postgresql/8.2/bin/postgres 2 24631 24635

This will display total memory usage for each process name, the name itself, the number of such processes, and the range (minimum and maximum) of their memory usage. We can then use our old friend sum.pl to produce a console graph, but I already wrote about that.
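
If you don't have groupby.pl at hand, the same kind of aggregation can be approximated with plain awk. This is only a minimal sketch of the idea, not how groupby.pl is implemented: group_sum is a hypothetical helper, and the sample rows are made up.

```shell
#!/bin/sh
# group_sum KEYCOL VALCOL: hypothetical helper, not part of groupby.pl.
# Reads whitespace-separated rows on stdin and prints, per distinct key:
#   sum key count min max
group_sum() {
    awk -v k="$1" -v v="$2" '
    {
        key = $k; val = $v + 0
        # first time we see this key: seed min and max with its value
        if (!(key in count)) { min[key] = val; max[key] = val }
        sum[key] += val
        count[key]++
        if (val < min[key]) min[key] = val
        if (val > max[key]) max[key] = val
    }
    END {
        for (key in sum)
            print sum[key], key, count[key], min[key], max[key]
    }'
}

# Made-up data: column 1 is memory, column 2 is the command name.
printf '%s\n' '100 apache2' '300 apache2' '50 bash' |
    group_sum 2 1 | sort -k1,1 -nr
```

With the made-up rows above this prints "400 apache2 2 100 300" followed by "50 bash 1 50 50". The sort is needed because awk does not guarantee any order when looping over array keys.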


So, let's move to another example, this time for OpenVZ. Let's see how much memory each virtual machine is using (and get the number of processes for free):



$ vzps -E axv --no-headers | ./groupby.pl 'sum:($7+$8+$9*1024),1,count:1'
2209504127 0 265
611768242 212024 38
162484775 212037 19
170797534 212052 38
104853258 212226 26
712007227 212253 21

But wouldn't it be nice to display hostnames instead of VEID numbers? We can, using --join and --on options (which are really backticks on steroids):

$ vzps -E axv --no-headers | ./groupby.pl 'sum:($7+$8+$9*1024),1,count:1' --join 'sudo vzlist -H -o veid,hostname' --on 2
2146263206 0 259
675835528 saturn.ffzg.hr 40
162484775 arh.rot13.org 19
170797534 koha-dev.rot13.org 38
104853258 koha.ffzg.hr 26
712011323 zemlja.ffzg.hr 21
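
The idea behind --join and --on can be approximated in plain shell as a lookup table applied to one column. This is a sketch under assumptions, not the real groupby.pl code: join_names is a hypothetical helper, and the lookup file stands in for the output of the joined command.

```shell
#!/bin/sh
# join_names COL LOOKUP_FILE: hypothetical helper, not the real --join/--on code.
# LOOKUP_FILE holds "id name" pairs; column COL of stdin is replaced by the
# matching name, and rows without a match are passed through unchanged.
join_names() {
    awk -v col="$1" -v lookup="$2" '
    BEGIN {
        # load the id -> name table before any input row is processed
        while ((getline line < lookup) > 0) {
            split(line, f, " ")
            name[f[1]] = f[2]
        }
    }
    ($col in name) { $col = name[$col] }   # swap id for name when known
    { print }'
}

# Made-up lookup data, standing in for something like the vzlist output.
tmp=$(mktemp)
printf '%s\n' '212024 saturn.ffzg.hr' '212037 arh.rot13.org' > "$tmp"
printf '%s\n' '10 0 3' '20 212024 5' | join_names 2 "$tmp"
rm -f "$tmp"
```

Rows whose id is missing from the lookup (like 0 here) keep their number, which matches the output above where VEID 0 stays as it is.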

Which brings us to final result:

$ vzps -E axv --no-headers | ./groupby.pl 'sum:($7+$8+$9*1024),1,count:1' --join 'sudo vzlist -H -o veid,hostname' --on 2 | sort -rn | align | ./sum.pl -h
0 260 2105M OOOOOOOOOOOOOOOOOOO 2105M
zemlja.ffzg.hr 21 679M OOOOOO------------------- 2784M
saturn.ffzg.hr 35 512M OOOO-------------------------- 3296M
koha-dev.rot13.org 38 162M O------------------------------ 3459M
arh.rot13.org 19 154M O-------------------------------- 3614M
koha.ffzg.hr 26 99M ---------------------------------- 3714M

So, there you have it: an SQL-like query language for your shell pipes.

May 15, 2008

Bag of useful scripts

Why does console refuse to die?
- It's because of pipes!
This post is a result of my long addiction to console applications. Somehow, when I want to get a quick view of things on my system, I always turn to pipes and do something with them. In the process, I developed a few useful scripts for use within shell pipes, and I would like to introduce my readership to them.

PostgreSQL database size

When I want to see size of all databases on my system or size of tables in one database I turn to pg_size. It's a short and sweet script which will do a little shell magic (take a look in it) and display size of all databases on system (without any options) or size of all tables in database and number of rows (when used like pg_size database_name) like this:
dpavlin@llin:~$ pg_size dellstore2 | grep -v sql_
4890624 customers 20000
3153920 orderlines 60350
2678784 cust_hist 60350
991232 products 10000
966656 orders 12000
450560 inventory 10000
8192 categories 16
0 reorder 0
This is all well and good, but it doesn't really give us the right overview, so move along for...

Nice console graphs

First, a caveat: this tool assumes that it will get a number, a space, and an optional description. The output above seems to fit this description, so let's try it:
dpavlin@llin:~$ COLUMNS=80 pg_size dellstore2 | grep -v sql_ | sum.pl -h
customers 20000    4776k OOOOOOOOOOOOOOOO                                 4776k
orderlines 60350   3080k OOOOOOOOOO-----------------                      7856k
cust_hist 60350    2616k OOOOOOOOO---------------------------               10M
products 10000      968k OOO-------------------------------------           11M
orders 12000        944k OOO----------------------------------------        12M
inventory 10000     440k O-------------------------------------------       12M
categories 16      8192b ---------------------------------------------      12M
reorder 0              0 ---------------------------------------------      12M
This gives us nice output: description (followed by number of rows from above output), and running total of size in human readable form (if you don't like it, remove -h flag and you will get raw numbers).
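
To make the bar format concrete, here is a minimal awk sketch of the same idea. It is a simplified stand-in and not the sum.pl source: running_total is a hypothetical helper, it prints raw numbers only (no -h style units), and it does no column alignment.

```shell
#!/bin/sh
# running_total [WIDTH]: hypothetical helper, a simplified stand-in for sum.pl.
# Input lines: "number description...". Output: description, value, a bar and
# the running total. "O" marks this row, "-" what earlier rows contributed.
running_total() {
    awk -v width="${1:-40}" '
    function rep(ch, n,   s) { s = ""; while (n-- > 0) s = s ch; return s }
    {
        val[NR] = $1 + 0
        $1 = ""; sub(/^ +/, "")          # the rest of the line is the label
        label[NR] = $0
        total += val[NR]
    }
    END {
        # the bar is scaled so that the grand total fills WIDTH characters
        for (i = 1; i <= NR; i++) {
            before = total ? int(run * width / total) : 0
            run += val[i]
            after = total ? int(run * width / total) : 0
            printf "%s %d %s%s %d\n", label[i], val[i],
                   rep("O", after - before), rep("-", before), run
        }
    }'
}

# Made-up data, scaled to a 4 character wide bar.
printf '%s\n' '30 a' '10 b' | running_total 4
```

For the two made-up rows this prints "a 30 OOO 30" and then "b 10 O--- 40". All rows have to be read before anything is printed, because the scale of the bar depends on the grand total.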

Let's take another example (if you are still reading this and are not interested in PostgreSQL database size). Let's see how much traffic pppd transferred over a very slow GPRS link on an 8-day vacation:

dpavlin@llin:~$ grep 'pppd.*: Sent' /var/log/messages | awk '{ print $7 + $10 " " $1 " " $2 }' | sum.pl -h
May 5         0                                                               0
May 5       39k                                                             39k
May 5     7512k OO                                                        7551k
May 6     6352b --                                                        7558k
May 6       20k --                                                        7579k
May 6     1183k ---                                                       8762k
May 8     6869k OO---                                                       15M
May 8       70k -----                                                       15M
May 9     3596k O------                                                     18M
May 9     1998k -------                                                     20M
May 10      32M OOOOOOOOOOOO--------                                        53M
May 10      13k --------------------                                        53M
May 11      44M OOOOOOOOOOOOOOOOO--------------------                       98M
May 12      12M OOOO--------------------------------------                 111M
May 13    7120k OO-------------------------------------------              118M
May 13      20M OOOOOOO----------------------------------------------      139M
Much more interesting! A long time ago, I had a bunch of quick one-liners which used sum.pl to produce output from various other system counters, but somehow they got lost.

As I get only few comments on my blog, if you find this useful, leave one. I have few other examples, like the one which shows top 5 memory eaters on my system:

dpavlin@llin:~$ ps v | awk '{ print $8 " " $9 " " $10 }' | sort -rn | ~/private/perl/sum.pl | head -5
# RSS %MEM COMMAND
10.6 /usr/lib/iceweasel/firefox-bin 165092 OOOOOOOOOOOOOO                165092
4.3 perl                             67240 OOOOOO---------------         232332
0.5 awesome                           8504 ---------------------         240836
0.4 irssi                             6632 ----------------------        247468
0.3 vi                                5888 ----------------------        253356
but, if this is not interesting to my readership, tell me so, and I will stop spamming your already full RSS reader with console output! :-)

April 30, 2008

State of linux wifi (first week with OLPC)

It has been a week since a borrowed OLPC entered my family of computers. I have a Thinkpad T60 with Atheros AR5212 (which works with the ath5k driver from 2.6.25, nice work!) and an Eee PC with Atheros (which works with a special madwifi patch).

Since 802.11s just landed in the upstream kernel git, I was eager to take a look at this mesh network thing. Oh, how ignorant I was. OLPC uses an 802.11s protocol which is different from the official implementation of 802.11s, and with good reason: they are using the embedded processor in the wifi card to do the mesh protocol for them (saving power and enabling the mesh to work when the laptop is suspended). I could have installed olsr on the OLPC, but I'm really trying to have a bigger mesh which is compatible with unmodified OLPCs.

Because my time is limited, I would like to work in user-land if at all possible, and since wpa_supplicant can work on unmodified kernels, it would be nice to have that level of support for the OLPC mesh also. After a lot of browsing (and reading a few really great wifi hacking sites), I concluded that the only hope is radiotap, which is more-or-less supported on every pcmcia wifi card that I have (a prism based 802.11b card and an rt2500). I also found the simplest possible code which uses radiotap to start with.

Now, I would just need another OLPC to save some network traces and start experimenting :-)

Aside from that, I switched totally to the OLPC for this week, and amazingly enough, I didn't miss my Eee PC one tiny bit. Although a bit slower than the Eee, the OLPC screen is bigger (and better in black and white mode in sunlight) which helps a lot with web pages. Browser performance is amazing, so I have little doubt that we will be able to support most web sites on the OLPC without much problem. On the other hand, I did notice a couple of excessive round-trips on one of my web sites while surfing on it, but that's for the best anyway :-)

Update: According to message on libertas-dev mail list there is effort to use kernel's 802.11s implementation which makes my effort in supporting OLPC variant obsolete.

April 23, 2008

First day with OLPC

Part of my day job (and a reason why I love it) is working with the guys from OLPC Croatia on spreading the word about this project and its intentions. Yesterday we had a great presentation in Varaždin and they are interested in a pilot project.

With news like this it seems that we are fighting windmills. I hope that we are not wrong to invest time and effort into this since my first day of experience with little green thing is very positive.

For a start, let me clear up some doubts: it is a slow machine. The keyboard is really different (much worse than the Eee PC's, for example), but if you don't press it harder than needed (which was my problem) it's actually O.K. It's also heavier than the Eee PC (which I somehow noticed while carrying it around), but other than that it's a great machine.

Having said that, I had problems with associating to my AP (so I used scripts for that), but as a real Linux user, I decided to take a look at it. From that point on, things improved: first I noticed a very nice upgrade mechanism which allowed me to upgrade to build 703, which works great. This build is not final, so it comes without any Activities (think of them as applications installed on the OLPC), but installation of an Activity pack from a USB stick is really easy (it's also possible to upgrade the system using a USB stick if you don't have network).

The only show-stopper problem that I now have is how to buy OLPC? I don't even know if we can order just 2000 units for pilot program, but I would really love to have ability to buy device for me and other interested developers in CARNet. Having said that, if you do have OLPC which you are not using, it can find new home with interested developer.

Aside from that, the hi-res black and white screen is just beautiful, and the machine performs extremely well if you keep in mind that it's basically a 430 MHz pentium-class machine with 256 MB of RAM and 1 GB of flash. It seems like too little, but if you use the OLPC as a network connected digital notebook (with sharing!) it works beautifully. It's the best mobile device I have had since the Palm back in the last century.

Every little thing on this machine is designed great: kudos to the development team. One small thing is that you can rotate the screen in both directions (Toshiba, take note of this!). Hardware keys on both sides of the display actually make usage a joy (and decrease the need to touch the keyboard).

If we move on for a moment from hardware to Sugar, it's a really great user interface which I would love to see on other laptops (before you ask, I did try it on Debian and it does work; I will take another look after I have to return the laptop).

I really hope that this project will live on and enable us to deploy units to elementary schools and hopefully achieve some benefit for students (which I'm somehow sure we'll have :-)

April 15, 2008

First Croatian Perl Workshop

First of all, we had first ever Croatian perl workshop. Thanks to all the people who showed up, we had attendance of about ten.

Organizing a workshop turned out to be much more work than I anticipated, and various other tasks stopped me from preparing for it as well as I should have. Also, the small number of people forces me to reconsider my lectures about perl. On one hand, I really, really tried to spread perl (and had the good fortune of being at the right place at the right time to get Zagreb.pm off the ground), but with such low attendance, I must conclude that perl is used by only about 20 people in Zagreb. This seems somehow disturbing. Comparing the size of Zagreb with Moscow turned out to show about the same proportion, so I was just overly optimistic.

I also gave half an hour presentation about Jifty, based on Building a Jifty app in a jiffy by Kevin Falcone and showed some examples of my jifty apps (I actually didn't talk about last one, just mentioned it as integration of external javascript -- CodePress in this example).

I also have to thank Andrew Shitov from Moscow.pm, who managed to prepare several very interesting topics which, in my opinion, made this event worthwhile. If it wasn't free I would ask for my money back :-\

March 23, 2008

VRač - virtualno računalo

Really funny name for a post, isn't it? It should mean something to people who understand Croatian: it's a virtual computer. This fun toy began its existence as an Orao emulator. In the process, I wrote 6502 and Z80 emulators (actually only perl xs bindings for existing cpu emulators) and implemented Orao, Galaksija and Galeb using them.

The machines are in different stages of usability. Orao emulation is working (without tape support), Galaksija is too slow to be useful (and also doesn't have tape support), and Galeb doesn't have keyboard or tape support.

I would be very grateful for hardware information about those machines if you can spare some time to write it down. I somewhat hope that the source code of these emulators might serve as a historic reference for how these machines were constructed.

Lately, I have also found the MESS project, but soon after that I noticed that I need about six times as much time to implement changes in it. C is just not my language, so the only real hope for preservation of those computers is good documentation (which is something the MESS community is also striving for).

On the other hand, currently CPU emulation library is not license compatible with GPL so I won't probably push this to CPAN any time soon but I would love to find another implementation (PowerPC anyone?) and basically create thin layer of perl code which can express different architectures.

Having written all that, now it's time to show you how to compile the damn thing!

svn co svn://svn.rot13.org/VRac/
cd VRac
perl Makefile.PL
make
make orao

And you will have working Orao emulator in a second. You can also try make galaksija or make galeb.


This post was originally written on 2007-09-05 when I wanted to announce this project after playing with it over the summer. Since I never finished emulation of Galaksija or Galeb, I left it in draft mode for too long. Now that first Croatian Perl Workshop is coming along I will talk about this project, so it's just fair to announce it, so you can have preview about what my lecture will be.

March 15, 2008

Lecture: SQL from beginner to relational wizard

Today at Razmjena vještina I gave a marathon four-hour lecture which was, I hope, at least somewhat useful. Unfortunately, we didn't manage to go into as much detail as I would have liked, but if nothing else, I once again used pgrestraier (which somehow indexes too slowly, I will have to look into why) and another neat little project that I wrote last year for students in Zadar, pg-getfeed, which is really a small perl stored procedure that lets you run SQL queries against RSS feeds.

March 7, 2008

irc-logger - memory augmentation for #irc

Initially created in 2006 this handy tool is best described with original commit message:

IRC bot which replace human memory

Here is a quick run-down through available features:

  • web archive with search
  • irc commands: last, grep/search, stat, poll/count
  • tags// in normal irc messages (tagcloud, filter by tag, export as RSS feed)
  • announce /me messages to Twitter (yes, lame, but that was a year ago)
  • tags are available as html links for embedding (in wikis)
  • RSS feed from messages with tags (also nice for embedding)
  • irssi log import (useful for recovery in case of failure of machine or service :-)
  • announce new messages from RSS feeds (nice for wiki changes, blog entries or commits)

It has grown quite a bit from the initial vision of recalling last messages on the web (and it does jump through some hoops to produce a nice web archive). Adding tags allowed easy recall of interesting topics, but in a way it now provides a central hub for different content connected to irc.

It's written in perl using POE and it's probably not the best example of POE usage. It is also somewhat PostgreSQL specific but works well for our small community at the #razmjenavjestina irc channel. Since I have seen some interest in it, this blog post might serve as an announcement of its existence.

I will probably add some documentation to its wiki page and add real multi-channel support (most of the code is in there, but the web archive needs filtering by channel). If you are interested in having it /invite-d to your channel, drop me a note.

March 4, 2008

Sync part of subversion repository

I had a particular problem at work: we have upstream subversion repository which we access over ssh tunnel (using svn protocol) which contains two branches in which we are interested and various other stuff we don't care about (and don't want to mirror).

On other hand, we also wanted to have local copy of all changes (preserving history) and local commit messages and SVN::Web interface.

In original idea, I also wanted to keep revision numbers as-is (so I can just checkout our local version and be done), but this wasn't possible. One solution that we examined is to use Pushmi and make local copy, but we didn't want all the other changes.

Other idea was to use svndumpfilter to sync only two branches we are interested in (it will create dummy commits for revision which are outside our branches), but since branches are result of copy from parts of the tree we don't want to sync, it didn't work either.

Did I mention that our svn repository can access upstream only through carefully crafted ssh tunnels? Mess, right?

So, in the end, solution was hybrid:

  • make a local copy of two upstream branches using svk (losing the original order of commits, even if we are committing into the same svk mirror copy on our side)
  • install post-commit hook in upstream repository which will call (over https) svk sync at our side (I would probably use SMTP to trigger that, but our machine with svn repository doesn't accept outside e-mail)
  • install local post-commit hook to send e-mail notifications

The rest of this post is instructions on how to do this. Since I learned a thing or two doing this, I hope it might also be useful for others.

First create svn-pull.sh shell script which will run under user which has ssh keys to login to upstream firewall (1.2.3.4 in this example) and setup tunnels to upstream svn server (10.1.1.1):

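#!/bin/sh
# open a tunnel to the upstream svn server in the background; ssh stays
# alive for the 2 second sleep and then for as long as the tunnel is in use
ssh -L 13690:10.1.1.1:3690 1.2.3.4 sleep 2 &
pid=$!
# pull new revisions for all mirrored paths through the tunnel
SVKROOT=/home/user/.svk svk sync -a
# tear the tunnel down again
kill $pid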

Now setup mirrors of branches we care about:

svk mirror svn://127.0.0.1:13690/project/carnet-foo /project/foo
svk mirror svn://127.0.0.1:13690/project/carnet-bar /project/bar

This is all nice, but we need to trigger it from www-user which is done with following in /etc/sudoers:

www-data ALL=(user) NOPASSWD:/home/user/svn-pull.sh

and add simple cgi script which will trigger sync operation:

#!/bin/sh
printf 'Content-type: text/plain\r\n\r\n'
sudo -u user /home/user/svn-pull.sh

I used ScriptAlias in apache to make it visible at https://svn-ours.example.com/upstream-svn-update. No need to obfuscate the URL, since it's behind SSL for added points. An IP address limit might also be a good idea:

  <Location /upstream-svn-update>                   
        Order allow,deny
        Allow from 1.2.3.4
  </Location>

Now install post-commit hook in upstream repository. We care only for files which have /carnet in path since branches which we are interested have that prefix:

svn log -v -r "$REV" "file://$REPOS" | grep ' /carnet' >/dev/null \
    && wget -q -O /dev/null https://svn-ours.example.com/upstream-svn-update
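
The path filter in this hook can be tried out without a repository. The following is only a sketch: touches_carnet is a hypothetical helper name, and the sample text is a made-up imitation of the changed-paths section that svn log -v prints.

```shell
#!/bin/sh
# touches_carnet: hypothetical helper; succeeds when the `svn log -v` text on
# stdin lists at least one changed path starting with /carnet.
touches_carnet() {
    grep ' /carnet' >/dev/null
}

# Made-up sample of the "Changed paths" section printed by `svn log -v`.
sample='Changed paths:
   M /carnet-foo/trunk/README
   A /other/file.txt'

if printf '%s\n' "$sample" | touches_carnet; then
    echo "would trigger sync"
fi
```

A commit which touches only paths outside /carnet makes grep exit non-zero, so the part after && (the wget call in the real hook) is skipped.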

You will notice that there is no locking or any other tweak, since all the tools have those capabilities anyway, so we are really just using RPC via cgi over https.

Nice and easy, once you know how to do it! It seems like a few bits of configuration all over the place, but I hope that it employs KISS (keep it simple and stupid) at its best.

Update: OK, now we have local repository (with different revisions), but svn switch --relocate doesn't work because those repositories are not same (makes sense, eh?)

The following steps are a quick explanation of how to copy .svn directories from the new repository:

cd /srv/carnet-foo
# update repository to last upstream version
svn update
# delete old .svn directories
find . -name ".svn" -exec rm -Rf {} \;
# checkout new repository
cd /srv
svn co svn://svn-ours.example.com/carnet-foo carnet-foo.new
# copy new .svn files to old repository
cd carnet-foo.new
find . -wholename "*/.svn/*" | cpio -pvd ../carnet-foo/
# cleanup
cd /srv
rm -Rf carnet-foo.new
# following shouldn't return any differences
cd carnet-foo
svn diff

March 3, 2008

Negroponte in Zagreb

A very interesting lecture by Nicholas Negroponte in Zagreb passed somewhat unnoticed. There is no need to waste words on the organization of the conference, but I would like to comment on some of the doubts about the OLPC idea.

For a start, forget about the laptop. Yes, I know, it's hard. It's either pretty or ugly. Or slow. Forget about the laptop. OK?

The idea is to change the way schooling works. Negroponte obviously wants to do that especially in "third world" countries (which, let's remember, Croatia once was too, back in the days of Yugoslavia) because teaching there is still mostly the Victorian, 19th-century kind which drives knowledge in with a cane.


All quotes are my recollections of the content of the lecture, not any kind of translation of the materials (which were recorded, so we can hope that we will be able to watch them some day).

...by the fourth grade, pupils lose that spark in their eyes which they have when they first come to school...

This is a creativity-driven way of teaching (or of encouraging exploration) which can then be seen in other activities that have nothing to do with computers. We heard an example that children who learn to program (a program never works the first time!) acquire a new skill, debugging, which they then also use in spelling tests. They want to know why they got those two words wrong, and it is not enough for them that all the others are correct.

The following story tries to describe the situation in a different way:

There was a people that had no writing. At one point the leaders of the people decided that writing should be introduced, so they put one pencil in every school.

As that did not bear fruit, they decided to put in every school one classroom that had a pencil on every desk.

Of course, we all know that the better way is to give every pupil a pencil.

So, all the debates about whether the OLPC or the Asus EEE PC is better simply miss the whole point. It's not even that the OLPC is half the price ($200 * 5 = 1000 kn) of the Asus, but simply that a cheap laptop is the only way for every pupil to get their own pencil.

The OLPC laptop is really just a very interesting implementation of that idea.

Imagine a school lesson on getting to know the signs, streets and space around you:

Children, draw the streets around the school and the signs you saw on them (let's remember, the OLPC has support for group drawing). After they have drawn the streets and written down the names, it can be mentioned that the standard orientation of maps is towards north and why we draw them that way, and then, since it's a nice sunny day, the children can set off on a tour around the school and, using several OLPCs at once (with a handy program we saw which can measure the distance between two devices), enter exact distances on their maps.

The map can then correct the proportions by itself (let's remember, all the OLPCs are connected so the map updates itself for everyone) and in the end they have some idea of how space can be shown as a sketch (e.g. tram lines) and, differently, as a map with real proportions.

Along the way, of course, we can also look at horizontal and vertical traffic signalization (zebra crossings, traffic lights), and, for example, record a short film.

I may not be sure which subject's curriculum such an interesting assignment would belong to, but I think that all it lacks is a "smart map" which could be a simple application on the OLPC. Do I hear the voices of interested Python programmers? :-)