Results matching “PostgreSQL”

My mind is just too accustomed to RDBMS engines to accept that I can't have GROUP BY in my shell pipes. So I wrote one: groupby.pl.


Aside from the fact that it looks somewhat like Perl golfing (which I'm somewhat proud of), let's see how it works:


dpavlin@llin:~/private/perl$ ps axv | ./groupby.pl 'sum:($6+$7+$8),10,count:10,min:($6+$7+$8),max:($6+$7+$8)' | sort -k1 -nr | head -10 | align
440947 /usr/lib/iceweasel/firefox-bin 1 440947 440947
390913 /usr/sbin/apache2 11 22207 39875
180943 /usr/bin/X 1 180943 180943
135279 /usr/bin/pinot-dbus-daemon 1 135279 135279
122254 mocp 2 25131 97123
84887 pinot 1 84887 84887
78279 postgres: 5 10723 21971
70030 /usr/bin/perl 6 6959 15615
50213 /bin/bash 7 6351 7343
49266 /usr/lib/postgresql/8.2/bin/postgres 2 24631 24635

This displays the total memory usage per process, its name, the number of such processes, and the range of memory usage. We could then use our old friend sum.pl to produce a console graph, but I have already written about that.
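The script itself is not reproduced here, but the core idea — aggregating whitespace-separated pipe input by a key column, SQL-GROUP-BY style — can be sketched in a few lines of Python. This is a rough sketch, not the actual groupby.pl: it hard-codes one aggregated expression (a sum of columns) and the fixed sum/key/count/min/max output shape just mirrors the example above.

```python
from collections import defaultdict

def group_by(lines, key_col, val_cols):
    """GROUP BY for shell pipes: the key is the 1-based column key_col,
    the aggregated value is the sum of the 1-based columns in val_cols."""
    groups = defaultdict(list)
    for line in lines:
        f = line.split()
        try:
            val = sum(float(f[c - 1]) for c in val_cols)
            key = f[key_col - 1]
        except (IndexError, ValueError):
            continue  # skip lines that don't parse (e.g. headers)
        groups[key].append(val)
    # one output row per group: sum, key, count, min, max
    return [(sum(v), k, len(v), min(v), max(v)) for k, v in groups.items()]
```

Feeding `ps axv` output into this with `key_col=10` and `val_cols=[6, 7, 8]`, then printing the tuples sorted in reverse, would approximate the output above.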


So, let's move on to another example, this time for OpenVZ. Let's see how much memory each virtual machine is using (and get the number of processes for free):



$ vzps -E axv --no-headers | ./groupby.pl 'sum:($7+$8+$9*1024),1,count:1'
2209504127 0 265
611768242 212024 38
162484775 212037 19
170797534 212052 38
104853258 212226 26
712007227 212253 21

But wouldn't it be nice to display hostnames instead of VEID numbers? We can, using the --join and --on options (which are really backticks on steroids):

$ vzps -E axv --no-headers | ./groupby.pl 'sum:($7+$8+$9*1024),1,count:1' --join 'sudo vzlist -H -o veid,hostname' --on 2
2146263206 0 259
675835528 saturn.ffzg.hr 40
162484775 arh.rot13.org 19
170797534 koha-dev.rot13.org 38
104853258 koha.ffzg.hr 26
712011323 zemlja.ffzg.hr 21
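The join logic itself is simple enough to sketch: run the side command, build a key-to-replacement map from the first two columns of its output, and substitute the --on column in every row. A hedged approximation in Python — the function name and details are mine, not groupby.pl's:

```python
import subprocess

def join_on(rows, lookup_cmd, on_col):
    """Replace the 1-based column on_col in each row with the value the
    shell command lookup_cmd maps it to ("backticks on steroids").
    Keys without a match are left untouched."""
    out = subprocess.run(lookup_cmd, shell=True, capture_output=True,
                         text=True).stdout
    mapping = {}
    for line in out.splitlines():
        parts = line.split(None, 1)  # first column is the key, rest is the value
        if len(parts) == 2:
            mapping[parts[0]] = parts[1].strip()
    joined = []
    for row in rows:
        f = row.split()
        f[on_col - 1] = mapping.get(f[on_col - 1], f[on_col - 1])
        joined.append(" ".join(f))
    return joined
```

Note that VEID 0 (the host itself) has no hostname in `vzlist` output, which is why it stays a bare `0` in the joined output above.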

Which brings us to the final result:

$ vzps -E axv --no-headers | ./groupby.pl 'sum:($7+$8+$9*1024),1,count:1' --join 'sudo vzlist -H -o veid,hostname' --on 2 | sort -rn | align | ./sum.pl -h
0 260 2105M OOOOOOOOOOOOOOOOOOO 2105M
zemlja.ffzg.hr 21 679M OOOOOO------------------- 2784M
saturn.ffzg.hr 35 512M OOOO-------------------------- 3296M
koha-dev.rot13.org 38 162M O------------------------------ 3459M
arh.rot13.org 19 154M O-------------------------------- 3614M
koha.ffzg.hr 26 99M ---------------------------------- 3714M

So, there you have it: an SQL-like query language for your shell pipes.

Why does the console refuse to die?
- It's because of pipes!
This post is the result of my long addiction to console applications. Somehow, when I want a quick view of things on my system, I always turn to pipes and do something with them. In the process, I have developed a few useful scripts for use within shell pipes, and I would like to introduce my readership to them.

PostgreSQL database size

When I want to see the size of all databases on my system, or the size of the tables in one database, I turn to pg_size. It's a short and sweet script which does a little shell magic (take a look inside it) and displays the size of all databases on the system (without any options), or the size of all tables in a database along with their row counts (when used as pg_size database_name), like this:
dpavlin@llin:~$ pg_size dellstore2 | grep -v sql_
4890624 customers 20000
3153920 orderlines 60350
2678784 cust_hist 60350
991232 products 10000
966656 orders 12000
450560 inventory 10000
8192 categories 16
0 reorder 0
This is all well and good, but it doesn't really give us the right overview, so let's move along to...

Nice console graphs

First, a caveat: this tool assumes that it will get a number, a space, and an optional description. The output above seems to fit that format, so let's try it:
dpavlin@llin:~$ COLUMNS=80 pg_size dellstore2 | grep -v sql_ | sum.pl -h
customers 20000    4776k OOOOOOOOOOOOOOOO                                 4776k
orderlines 60350   3080k OOOOOOOOOO-----------------                      7856k
cust_hist 60350    2616k OOOOOOOOO---------------------------               10M
products 10000      968k OOO-------------------------------------           11M
orders 12000        944k OOO----------------------------------------        12M
inventory 10000     440k O-------------------------------------------       12M
categories 16      8192b ---------------------------------------------      12M
reorder 0              0 ---------------------------------------------      12M
This gives us nice output: the description (followed by the number of rows from the output above), the size, a bar graph, and a running total in human-readable form (if you don't like that, remove the -h flag and you will get raw numbers).
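For the curious, the bar logic can be approximated in a few lines: the 'O' part of each bar is this row's share of the grand total, and the dashes extend the bar out to the running total's share. A sketch (not the real sum.pl, which also handles the -h human-readable formatting and reading from a pipe):

```python
def console_graph(items, width=45):
    """Render (value, label) pairs sum.pl-style: 'O's for this row's
    share of the grand total, '-'s out to the running total's share,
    followed by the running total itself."""
    total = sum(v for v, _ in items) or 1
    lines, running = [], 0
    for value, label in items:
        running += value
        own = round(width * value / total)    # this row's share of the bar
        cum = round(width * running / total)  # cumulative share of the bar
        bar = "O" * own + "-" * max(cum - own, 0)
        lines.append("%-20s %10d %-*s %10d" % (label, value, width, bar, running))
    return lines
```

Because the bar is cumulative, the last row's dashes always reach (nearly) the full width, which is what makes the output read as a running breakdown of the total.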

Let's take another example (if you are still reading this and not just interested in PostgreSQL database sizes). Let's see how much traffic pppd transferred over a very slow GPRS link during an 8-day vacation:

dpavlin@llin:~$ grep 'pppd.*: Sent' /var/log/messages | awk '{ print $7 + $10 " " $1 " " $2 }' | sum.pl -h
May 5         0                                                               0
May 5       39k                                                             39k
May 5     7512k OO                                                        7551k
May 6     6352b --                                                        7558k
May 6       20k --                                                        7579k
May 6     1183k ---                                                       8762k
May 8     6869k OO---                                                       15M
May 8       70k -----                                                       15M
May 9     3596k O------                                                     18M
May 9     1998k -------                                                     20M
May 10      32M OOOOOOOOOOOO--------                                        53M
May 10      13k --------------------                                        53M
May 11      44M OOOOOOOOOOOOOOOOO--------------------                       98M
May 12      12M OOOO--------------------------------------                 111M
May 13    7120k OO-------------------------------------------              118M
May 13      20M OOOOOOO----------------------------------------------      139M
Much more interesting! A long time ago, I had a bunch of quick one-liners which used sum.pl to produce output from various other system counters, but they somehow got lost.

As I get only a few comments on my blog, leave one if you find this useful. I have a few other examples, like this one which shows the top 5 memory eaters on my system:

dpavlin@llin:~$ ps v | awk '{ print $8 " " $9 " " $10 }' | sort -rn | ~/private/perl/sum.pl | head -5
# RSS %MEM COMMAND
10.6 /usr/lib/iceweasel/firefox-bin 165092 OOOOOOOOOOOOOO                165092
4.3 perl                             67240 OOOOOO---------------         232332
0.5 awesome                           8504 ---------------------         240836
0.4 irssi                             6632 ----------------------        247468
0.3 vi                                5888 ----------------------        253356
But if this is of no interest to my readership, tell me so, and I will stop spamming your already-full RSS reader with console output! :-)

Today at Razmjena vještina I gave a marathon four-hour lecture which I hope was at least somewhat useful. Unfortunately, we didn't get into as much detail as I would have liked, but if nothing else I once again used pgestraier (which somehow indexes too slowly; I will have to look into why) and another handy little project I wrote last year for students in Zadar, pg-getfeed, which is really a small Perl stored procedure that lets you run SQL queries against RSS feeds.

Initially created in 2006, this handy tool is best described by its original commit message:

IRC bot which replace human memory

Here is a quick run-down of the available features:

  • web archive with search
  • irc commands: last, grep/search, stat, poll/count
  • tags// in normal irc messages (tagcloud, filter by tag, export as RSS feed)
  • announce /me messages to Twitter (yes, lame, but that was a year ago)
  • tags are available as html links for embedding (in wikis)
  • RSS feed from messages with tags (also nice for embedding)
  • irssi log import (useful for recovery in case of failure of machine or service :-)
  • announce new messages from RSS feeds (nice for wiki changes, blog entries or commits)

It has grown quite a bit from the initial vision of recalling last messages on the web (and it does go through some hoops to produce a nice web archive). Adding tags allowed easy recall of interesting topics, and in a way it now provides a central hub for different content connected to IRC.

It's written in Perl using POE, and it's probably not the best example of POE usage. It is also somewhat PostgreSQL-specific, but it works well for our small community on the #razmjenavjestina IRC channel. Since I have seen some interest in it, this blog post might serve as an announcement of its existence.

I will probably add some documentation to its wiki page and add real multi-channel support (most of the code is there, but the web archive needs filtering by channel). If you would like to /invite it to your channel, drop me a note.

status line for dwm

I will be working on battery power for most of next week, so I spent some time tweaking my setup. I have been running 2.6.21 because of my efforts to make fglrx play nice with CONFIG_PARAVIRT, so I already had the tickless kernel needed for powertop. To my horror, most of the interrupts on my laptop were created by PostgreSQL (which I will stop when on battery) and ACPI! And the ACPI ones were there because I'm using dwm with a primitive shell grep/sed pipe to produce the status line.

So, polling ACPI every 3 seconds (which is a reasonable refresh time for me, even for the clock) is too much. And then I started dreaming: network traffic would be nice. And disk! Battery status when (dis)charging goes without saying. So, in the end, I also added temperature, and got something like this (when on power):

2007-05-27 22:15:40 | 0.30 0.13 0.06 |   5M D 1k   |  32b > 54b  | 59 C
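The collection side is just greps over /proc and acpi output, but the assembly of the line can be sketched. This is my reading of the fields in the sample — timestamp, load averages, disk read/write, network rx/tx, temperature — and the helper names below are mine, not the actual script's:

```python
def human(n):
    """Abbreviate a byte count the way the status line does: 5M, 1k, 32b."""
    for suffix, factor in (("M", 1 << 20), ("k", 1 << 10)):
        if n >= factor:
            return "%d%s" % (n // factor, suffix)
    return "%db" % n

def status_line(now, load, disk_rw, net_rw, temp_c):
    """Assemble a dwm-style status line from already-collected numbers."""
    return "%s | %s | %s D %s | %s > %s | %d C" % (
        now,
        " ".join("%.2f" % x for x in load),
        human(disk_rw[0]), human(disk_rw[1]),
        human(net_rw[0]), human(net_rw[1]),
        temp_c,
    )
```

In a real dwm setup, the resulting string would typically be fed to `xsetroot -name` so dwm can pick it up as the status text.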

The Perl code is nice and short, but completely broken when estimating charging time (how does the acpi command-line tool calculate that?).

When started as a stand-alone utility, it can serve as a poor man's replacement for dstat :-)

I just finished a set of shell scripts which test failover and replication using Slony-I and PostgreSQL.

It's somewhat Debian-specific (it uses pg_ctlcluster to control PostgreSQL instances), but other than that it should work on any platform. This is essentially a codification of the HA testing requirements I had to satisfy, and I hope this automated demonstration will be valuable to others testing their own specific requirements. Without this throwaway model, I wouldn't be able to test production database failover, because there are just too many steps which have to be done in a specific order.

So, my fascination with testing expanded from development to deployment.


I'm somewhat proud to announce that the current version of pgestraier now includes consistency triggers which keep the Hyper Estraier index up-to-date with the data in your database.

That, coupled with the ability to create full-text indexes easily just by running a helper script on the database, makes pgestraier a powerful solution if you need fast full-text indexing with the ability to off-load searching to another machine (thanks to Hyper Estraier's P2P architecture), or if you need perfect N-gram search results.

This project might also help people who are porting applications that use MySQL full-text search to PostgreSQL (actually, it is going to be used for exactly that).

The real trunk of development is in Subversion; the CVS repository at pgFoundry is just a mirror. Enjoy it while I prepare to leave for the seaside.

Search::Estraier will be a Perl API for Hyper Estraier. It's written using the excellent Inline::C.

The grand plan is to use pgestraier from PostgreSQL to query the Estraier index and Search::Estraier to create the index. This would make it possible to combine structured data in an RDBMS with semi-structured data in a full-text index. Additional normalized tables can be created using materialized views in PostgreSQL, and if all goes well, this will become part of WebPAC version 2, which will be a universal hybrid (structured/full-text) storage.

Update: I was lucky enough to find already-working Perl bindings for Hyper Estraier on MATSUNO Tokuhiro's blog. Thanks a lot! So, work on Search::Estraier is suspended.

My pet project pgswish is on pgFoundry. It allows integration of search results from a swish-e index with structured data from PostgreSQL. It's a PostgreSQL function which returns the swish-e search result as multi-row data.

However, I also posted a message to the swish-e mailing list, and Bill Moseley had some great comments. If you know how to write PostgreSQL C functions, I would be grateful for another pair of eyes on this project; my C is rusty.