Group by data in shell pipes

My mind is just too accustomed to RDBMS engines to accept that I can't have GROUP BY in my shell pipes. So I wrote one groupby.pl.


Aside from fact that it somewhat looks like perl golfing (which I'm somewhat proud of), let's see how does it look:


dpavlin@llin:~/private/perl$ ps axv | ./groupby.pl 'sum:($6+$7+$8),10,count:10,min:($6+$7+$8),max:($6+$7+$8)' | sort -k1 -nr | head -10 | align
440947 /usr/lib/iceweasel/firefox-bin 1 440947 440947
390913 /usr/sbin/apache2 11 22207 39875
180943 /usr/bin/X 1 180943 180943
135279 /usr/bin/pinot-dbus-daemon 1 135279 135279
122254 mocp 2 25131 97123
84887 pinot 1 84887 84887
78279 postgres: 5 10723 21971
70030 /usr/bin/perl 6 6959 15615
50213 /bin/bash 7 6351 7343
49266 /usr/lib/postgresql/8.2/bin/postgres 2 24631 24635

This will display total usage for process, it's name, number of such processes and range of memory usage. We can then use old friend sum.pl to produce console graph, but I already wrote about it.


So, let's move to another example, this time for OpenVZ. Let's see how much memory is each virtual machine using (and get number of processes for free):



$ vzps -E axv --no-headers | ./groupby.pl 'sum:($7+$8+$9*1024),1,count:1'
2209504127 0 265
611768242 212024 38
162484775 212037 19
170797534 212052 38
104853258 212226 26
712007227 212253 21

But wouldn't it be nice to display hostnames instead of VEID numbers? We can, using --join and --on options (which are really backticks on steroids):

$ vzps -E axv --no-headers | ./groupby.pl 'sum:($7+$8+$9*1024),1,count:1' --join 'sudo vzlist -H -o veid,hostname' --on 2
2146263206 0 259
675835528 saturn.ffzg.hr 40
162484775 arh.rot13.org 19
170797534 koha-dev.rot13.org 38
104853258 koha.ffzg.hr 26
712011323 zemlja.ffzg.hr 21

Which brings us to final result:

$ vzps -E axv --no-headers | ./groupby.pl 'sum:($7+$8+$9*1024),1,count:1' --join 'sudo vzlist -H -o veid,hostname' --on 2 | sort -rn | align | ./sum.pl -h
0 260 2105M OOOOOOOOOOOOOOOOOOO 2105M
zemlja.ffzg.hr 21 679M OOOOOO------------------- 2784M
saturn.ffzg.hr 35 512M OOOO-------------------------- 3296M
koha-dev.rot13.org 38 162M O------------------------------ 3459M
arh.rot13.org 19 154M O-------------------------------- 3614M
koha.ffzg.hr 26 99M ---------------------------------- 3714M

So, here you have it: SQL like query language for your shell pipes.