Results tagged “OpenVZ”

I have been playing with Linux containers for a while, and I finally decided to take the plunge and migrate one of my servers from OpenVZ to lxc. It worked quite well for testing, until I noticed the lack of support for automatic startup or shutdown. The lxc-users mailing list was helpful in providing useful hints, and I found five OpenSUSE scripts, but decided they were too complicated for the task at hand.

I wanted a single script, for easy deployment on any Debian box, with the following features:

  • reboot and halt from inside the container should work as expected
  • cleanly shut down or reboot a container from the host (as opposed to lxc-stop, which is equivalent to turning the power off)
  • an init script to start and stop containers on host boot and shutdown

The result is lxc-watchdog.sh. It can start containers (and remember to start them again on the next host reboot), reboot or halt containers from the host (using signals to trigger the container's init) and automatically configure your container when you start it for the first time. Here is a quick overview:

root@prod:/srv# ./lxc-watchdog.sh status
koha-240 RUNNING boot /virtual/koha-240
koha-241 STOPPED boot /virtual/koha-241
koha-242 STOPPED      /virtual/koha-242

root@prod:/srv# ./lxc-watchdog.sh start
# start koha-240
'koha-240' is RUNNING
# start koha-241
2010-03-16T23:44:16 koha-241 start
'koha-241' is RUNNING
# skip start koha-242

root@prod:/srv# ./lxc-watchdog.sh status
koha-240 RUNNING boot /virtual/koha-240
koha-241 RUNNING boot /virtual/koha-241
koha-242 STOPPED      /virtual/koha-242

root@prod:/srv# ls -al /var/lib/lxc/*/on_boot
-rw-r--r-- 1 root root 9 2010-03-16 21:40 /var/lib/lxc/koha-240/on_boot
-rw-r--r-- 1 root root 9 2010-03-16 21:40 /var/lib/lxc/koha-241/on_boot
-rw-r--r-- 1 root root 0 2010-03-16 22:58 /var/lib/lxc/koha-242/on_boot
As you can see, I use the file /var/lib/lxc/name/on_boot to record which machines to bring up. When a container is started for the first time, it will have boot enabled (just in case this is a production application which you will reboot in 6 months and then wonder why it doesn't come back up). You can change the boot status using:
root@prod:/srv# ./lxc-watchdog.sh boot koha-242
# boot koha-242

root@prod:/srv# ./lxc-watchdog.sh status
koha-240 RUNNING boot /virtual/koha-240
koha-241 RUNNING boot /virtual/koha-241
koha-242 STOPPED boot /virtual/koha-242

root@prod:/srv# ./lxc-watchdog.sh disable koha-242
# disable koha-242
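
Internally, boot and disable just toggle that on_boot file (judging by the file sizes above, an enabled container's file holds its name and a disabled one is empty); a minimal sketch of the idea, not the script's exact code:
# sketch: toggle the on_boot flag for a container (not lxc-watchdog's exact code)
cmd=$1 ; name=$2                                  # e.g. ./lxc-watchdog.sh boot koha-242
case "$cmd" in
    boot)    echo "$name" > "/var/lib/lxc/$name/on_boot" ;;
    disable) : > "/var/lib/lxc/$name/on_boot" ;;  # truncate to an empty file
esac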
Installation as an init script, /etc/init.d/lxc-watchdog, is easy:
root@prod:/srv# ln -s /srv/lxc-watchdog.sh /etc/init.d/lxc-watchdog

root@prod:/srv# update-rc.d lxc-watchdog defaults
update-rc.d: using dependency based boot sequencing
And finally, it can also be used to manually start, halt or reboot containers:
root@prod:/srv# /etc/init.d/lxc-watchdog start koha-242
# start koha-242
2010-03-16T23:47:46 koha-242 start
'koha-242' is RUNNING

root@prod:/srv# /etc/init.d/lxc-watchdog status
koha-240 RUNNING boot /virtual/koha-240
koha-241 RUNNING boot /virtual/koha-241
koha-242 RUNNING      /virtual/koha-242

root@prod:/srv# /etc/init.d/lxc-watchdog restart koha-242
# restart koha-242
2010-03-16T23:48:46 koha-242 kill -SIGINT 24838

root@prod:/srv# /etc/init.d/lxc-watchdog status
koha-240 RUNNING boot /virtual/koha-240
koha-241 RUNNING boot /virtual/koha-241
koha-242 RUNNING      /virtual/koha-242

root@prod:/srv# /etc/init.d/lxc-watchdog stop koha-242
# stop koha-242
2010-03-16T23:49:55 koha-242 stop
2010-03-16T23:49:55 koha-242 kill -SIGPWR 26086
2010-03-16T23:50:11 koha-242 stoped
In fact, you can also use halt or reboot instead of stop and restart, just to keep one mapping less in your brain when working with it.
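
Under the hood, stop and restart just send a signal to the container's init process, as the kill lines above show; a minimal sketch of the idea (finding init as the child of lxc-start is my assumption, not necessarily what the script does):
# sketch: clean shutdown/reboot by signalling the container's init
name=koha-242
lxc_pid=$(pgrep -f "lxc-start -n $name" | head -1)
init_pid=$(pgrep -P "$lxc_pid" | head -1)
kill -SIGPWR "$init_pid"    # sysvinit treats SIGPWR as a power failure -> halt
# kill -SIGINT "$init_pid"  # SIGINT acts like ctrl-alt-del -> reboot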

Log files are created for each container in /tmp/name.log. They include lxc-start output with boot messages and any output that startup scripts might create, which is useful for debugging container installations. Output from the watchdog monitoring /var/run/utmp inside the container is also included, and it reports the number of tasks (processes) in the container. Here is an example of stopping a container:

root@prod:/srv# tail -5 /tmp/koha-242.log
2010-03-16T23:49:56 koha-242 66 tasks
2010-03-16T23:50:04 koha-242 22 tasks
2010-03-16T23:50:11 koha-242 runlevel 2 0
2010-03-16T23:50:11 koha-242 halt
2010-03-16T23:50:12 koha-242 watchdog exited
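
The runlevel, halt and task lines come from the watchdog itself, which just polls the container's utmp; roughly like this sketch (the utmp path is an assumption based on the container directories above, and the real script also counts tasks and then stops or restarts the container):
# sketch: watch the container's utmp for a runlevel change (path is an assumption)
name=koha-242
utmp=/virtual/$name/var/run/utmp
while sleep 5 ; do
    case "$(runlevel "$utmp" | awk '{ print $2 }')" in    # runlevel(8) accepts an alternate utmp file
        0) echo "$(date +%FT%T) $name halt"   ; break ;;  # container asked to power off
        6) echo "$(date +%FT%T) $name reboot" ; break ;;  # container asked to reboot
    esac
done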
Hopefully this will make your switch to Linux Containers and recent kernels easier...

I know that the title is a mouthful. But I occasionally use this blog as a place to dump interesting configuration settings: it helps me remember the configuration, and it might be useful to lone surfers who stumble upon this page.

[graph.png: network graph]

This weekend our nice network admin upgraded the VLAN setup to enable 1 Gb/s networking across the whole private segment. That is clearly visible as green arrows in the picture, especially if you compare it with the last one. We still have a few slower links (100 Mb/s) which pass through NAT to the Internet (red arrows), but we now have the private segment on every server in the library and in the system room in the main building, and each server has an IP address in the new 1 Gb/s segment (you can see that in the blue arrows, which are loopback interfaces on the same machine).

All is not well, however. I had to reconfigure my OpenVZ containers from point-to-point IP configuration using venet devices to an ethernet bridge setup, because multiple hops on the OpenVZ machine played havoc with connections from our private network to the containers. It was very weird: I saw TCP retransmission requests, yet the first packet somehow managed to pass through. ICMP traffic worked (thanks to small packet sizes), but I honestly can't say why our new VLAN setup and/or OpenVZ had a problem with it. Up to that point, I just had private IP addresses assigned to the OpenVZ container using vzctl set --ipadd 10.60.0.12, and ping -R did show multiple public and private addresses, but it all somehow worked.

First, I had to create a network bridge which will be the new VLAN 60:

root@koha-hw:~# cat /etc/network/interfaces

auto br60
iface br60 inet static
        bridge_ports eth1 veth212226
        bridge_fd 0
        address 10.60.0.10
        netmask 255.255.254.0
        gateway 10.60.0.1
Then, I removed the old private IP addresses:
root@koha-hw:~# vzctl set 212226 --ipdel 10.60.0.12 --save
and added a new veth device (my version of vzctl requires MAC addresses, so beware!):
root@koha-hw:~# vzctl set 212226 --netif_add veth1,00:12:34:56:78:9A,veth212226,00:12:34:56:78:9B,br60 --save
This creates a veth1 device inside the container, to which I can assign an IP address:
koha:~# ifconfig veth1 10.60.0.12 netmask 255.255.254.0 up
I decided to put that in /etc/rc.local because /etc/network/interfaces is managed by OpenVZ.
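
For reference, the container's /etc/rc.local ends up holding just that one line; a minimal sketch (the default route via 10.60.0.1 is my assumption):
#!/bin/sh -e
# bring up the bridged veth1 inside the container (sketch; gateway is an assumption)
ifconfig veth1 10.60.0.12 netmask 255.255.254.0 up
route add default gw 10.60.0.1 || true
exit 0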

Next, I needed to reconfigure my ssh VPN from point-to-point tunneling to a tap device, giving me a real ethernet tunnel:

root@brr:~# tail -10 /etc/network/interfaces 

# vpn lib -> home
auto tap0
iface tap0 inet static
        pre-up ssh -M -S /var/run/tap0 -v -f -i /root/.ssh/koha-hw-tun0 -w 0:0 -o Tunnel=ethernet -o ServerAliveInterval=5 -o ServerAliveCountMax=2 koha-hw.ffzg.hr true
        address 10.60.0.81
        netmask 255.255.0.0
        up iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -o tap0 -j MASQUERADE
        post-down ssh -S /var/run/tap0 -O exit koha-hw.ffzg.hr
On the remote side, force the tunnel number and add the new device to the bridge:
root@koha-hw:~# head -1 /root/.ssh/authorized_keys 
tunnel="0",command="ifconfig tap0 up ; brctl addif br60 tap0" ssh-rsa AAAAB3Nza...

Finally, a little cron job to start (and restart) the VPN:

root@brr:~# cat /etc/cron.d/tap0 
SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
*/1     * * * * root    fping -q 10.60.0.10 || ( ifdown tap0 ; ifup tap0 )

It seems that I wasn't the first one to have the idea of sharing a MySQL installation between OpenVZ containers. However, a simple hardlink didn't work for me:

root@koha-hw:~# ln /vz/root/212052/var/run/mysqld/mysqld.sock \
     /vz/root/212056/var/run/mysqld/
ln: creating hard link `/vz/root/212056/var/run/mysqld/mysqld.sock' to
    `/vz/root/212052/var/run/mysqld/mysqld.sock': Invalid cross-device link
So I tried something a little bit different:
root@koha-hw:~# mount --bind /vz/root/212052/var/run/mysqld/ \
     /vz/root/212056/var/run/mysqld/
Which worked for me on an old etch Debian kernel:
root@koha-hw:~# uname -a
Linux koha-hw 2.6.18-openvz-18-53.5d1-amd64 #1 SMP Sat Mar 22 23:58:13 MSK 2008 x86_64 GNU/Linux
The permissions are wrong, but the socket is world-writable anyway... There is also a related bug for kernels 2.6.24 and 2.6.26.
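
To make the bind mount survive container restarts, an OpenVZ per-VE mount hook is probably the cleanest place for it; a sketch, assuming the standard /etc/vz/conf/<VEID>.mount mechanism and that the MySQL container 212052 is already running:
#!/bin/bash
# /etc/vz/conf/212056.mount -- sketch of a per-VE mount hook (make it executable)
source /etc/vz/vz.conf
source ${VE_CONFFILE}
mount -n --bind /vz/root/212052/var/run/mysqld ${VE_ROOT}/var/run/mysqld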

I'm working on a Linux version of Sun storage machines, using commodity hardware, OpenVZ and fuse-zfs. I do have a working system in my Sysadmin Cookbook, so I might as well write a little bit of documentation about it.

My basic requirements are:

This makes it a self-running system which won't fall over itself, so let's see how it looks:

root@opl:~# zpool status
  pool: opl
 state: ONLINE
 scrub: resilver completed after 1h59m with 0 errors on Wed Jun  3 15:29:50 2009
config:

        NAME        STATE     READ WRITE CKSUM
        opl         ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            sda2    ONLINE       0     0     0
            sdb2    ONLINE       0     0     0

errors: No known data errors
root@opl:~# zfs list
NAME                           USED  AVAIL  REFER  MOUNTPOINT
opl                            183G  35.7G    21K  /opl
opl/backup                     180G  35.7G    22K  /opl/backup
opl/backup/212052             76.1G  35.7G  8.12G  /opl/backup/212052
opl/backup/212052@2009-05-01  5.69G      -  7.50G  -
opl/backup/212052@2009-05-10  5.69G      -  7.67G  -
opl/backup/212052@2009-05-15  5.57G      -  7.49G  -
opl/backup/212052@2009-05-22  3.54G      -  7.74G  -
opl/backup/212052@2009-05-25  3.99G      -  8.38G  -
opl/backup/212052@2009-05-26  3.99G      -  8.38G  -
...
opl/backup/212052@2009-06-05  3.72G      -  8.09G  -
opl/backup/212052@2009-06-06      0      -  8.12G  -
opl/backup/212056             1.42G  35.7G   674M  /opl/backup/212056
opl/backup/212056@2009-05-30  37.1M      -   688M  -
opl/backup/212056@2009-05-31  47.3M      -   747M  -
opl/backup/212056@2009-06-01  40.9M      -   762M  -
opl/backup/212056@2009-06-02  62.4M      -   787M  -
...
opl/backup/212056@2009-06-05  12.1M      -  1.02G  -
opl/backup/212056@2009-06-06      0      -   674M  -
opl/backup/212226              103G  35.7G  26.8G  /opl/backup/212226
opl/backup/212226@2009-05-05  4.29G      -  26.7G  -
opl/backup/212226@2009-05-10  4.04G      -  26.6G  -
opl/backup/212226@2009-05-15  4.19G      -  26.6G  -
opl/backup/212226@2009-05-22  4.12G      -  26.7G  -
opl/backup/212226@2009-05-23  4.12G      -  26.7G  -
opl/backup/212226@2009-05-24  4.09G      -  26.6G  -
opl/backup/212226@2009-05-25  4.14G      -  26.7G  -
opl/backup/212226@2009-05-26  4.13G      -  26.7G  -
...
opl/backup/212226@2009-06-05  4.20G      -  26.8G  -
opl/backup/212226@2009-06-06      0      -  26.8G  -
opl/clone                      719M  35.7G    25K  /opl/clone
opl/clone/212056-60018         666M  35.7G  1.39G  /opl/clone/212056-60018
opl/clone/212226-60017        53.0M  35.7G  26.7G  /opl/clone/212226-60017
opl/vz                        1.59G  35.7G  43.5K  /opl/vz
opl/vz/private                1.59G  35.7G    22K  /opl/vz/private
opl/vz/private/60014           869M  35.7G   869M  /opl/vz/private/60014
opl/vz/private/60015           488M  35.7G   488M  /opl/vz/private/60015
opl/vz/private/60016           275M  35.7G   275M  /opl/vz/private/60016
There are several conventions here which are useful:
  • the pool is named the same as the machine (borrowing from the Debian way of naming LVM volume groups), which makes it easy to export/import pools on different machines (I did run it with a mirror over nbd for a while)
  • snapshot names are the dates of the snapshots, for an easy overview
  • clones (writable snapshots) are named using a combination of the backup name and the new container ID (see the commands sketched after this list)
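
These conventions map directly onto the zfs commands used to maintain the layout; roughly like this (dataset names are taken from the listing above, the date and clone target are just examples):
# daily snapshot, named after the date (sketch)
zfs snapshot opl/backup/212226@$(date +%Y-%m-%d)
# writable clone of a backup, named backup ID + new container ID (sketch)
zfs clone opl/backup/212226@2009-06-06 opl/clone/212226-60017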

There are several things which I wouldn't be able to get without zfs:

  • clones can grow as much as they need
  • data is compressed, which increases effective disk IO as a result
  • the zfs and zpool commands are a really nice and intuitive way to issue commands to the filesystem
  • zpool history is a great idea: all filesystem operations are written to an internal log (illustrated after this list)
  • the ability to resilver (read/write all data on the platters), together with checksums, makes it robust against disk errors
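
Both the compression and the operation log are one-liners to use; for illustration (whether compression is enabled on the whole pool or just the backup tree here is my assumption):
zfs set compression=on opl/backup    # assumption: compression enabled on the backup tree
zfs get compressratio opl/backup     # see how much space compression is saving
zpool history opl | tail -5          # last few operations recorded inside the pool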

My mind is just too accustomed to RDBMS engines to accept that I can't have GROUP BY in my shell pipes. So I wrote one: groupby.pl.


Aside from the fact that it somewhat looks like perl golfing (which I'm somewhat proud of), let's see how it works:


dpavlin@llin:~/private/perl$ ps axv | ./groupby.pl 'sum:($6+$7+$8),10,count:10,min:($6+$7+$8),max:($6+$7+$8)' | sort -k1 -nr | head -10 | align
440947 /usr/lib/iceweasel/firefox-bin 1 440947 440947
390913 /usr/sbin/apache2 11 22207 39875
180943 /usr/bin/X 1 180943 180943
135279 /usr/bin/pinot-dbus-daemon 1 135279 135279
122254 mocp 2 25131 97123
84887 pinot 1 84887 84887
78279 postgres: 5 10723 21971
70030 /usr/bin/perl 6 6959 15615
50213 /bin/bash 7 6351 7343
49266 /usr/lib/postgresql/8.2/bin/postgres 2 24631 24635

This displays the total memory usage per process, its name, the number of such processes and the range (min and max) of memory usage. We can then use our old friend sum.pl to produce a console graph, but I have already written about that.
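
For comparison, roughly the same sum and count aggregation written out as a plain perl one-liner (just to show what the query above computes; this is not how groupby.pl is implemented):
ps axv | perl -lane '
    next if $. == 1;                   # skip the ps header line
    $k = $F[9];                        # field 10: command name
    $sum{$k} += $F[5] + $F[6] + $F[7]; # fields 6+7+8, as in the example above
    $cnt{$k}++;
    END { printf "%d %s %d\n", $sum{$_}, $_, $cnt{$_} for keys %sum }
' | sort -nr | head -10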


So, let's move on to another example, this time for OpenVZ. Let's see how much memory each virtual machine is using (and get the number of processes for free):



$ vzps -E axv --no-headers | ./groupby.pl 'sum:($7+$8+$9*1024),1,count:1'
2209504127 0 265
611768242 212024 38
162484775 212037 19
170797534 212052 38
104853258 212226 26
712007227 212253 21

But wouldn't it be nice to display hostnames instead of VEID numbers? We can, using the --join and --on options (which are really backticks on steroids):

$ vzps -E axv --no-headers | ./groupby.pl 'sum:($7+$8+$9*1024),1,count:1' --join 'sudo vzlist -H -o veid,hostname' --on 2
2146263206 0 259
675835528 saturn.ffzg.hr 40
162484775 arh.rot13.org 19
170797534 koha-dev.rot13.org 38
104853258 koha.ffzg.hr 26
712011323 zemlja.ffzg.hr 21

Which brings us to the final result:

$ vzps -E axv --no-headers | ./groupby.pl 'sum:($7+$8+$9*1024),1,count:1' --join 'sudo vzlist -H -o veid,hostname' --on 2 | sort -rn | align | ./sum.pl -h
0 260 2105M OOOOOOOOOOOOOOOOOOO 2105M
zemlja.ffzg.hr 21 679M OOOOOO------------------- 2784M
saturn.ffzg.hr 35 512M OOOO-------------------------- 3296M
koha-dev.rot13.org 38 162M O------------------------------ 3459M
arh.rot13.org 19 154M O-------------------------------- 3614M
koha.ffzg.hr 26 99M ---------------------------------- 3714M

So, here you have it: an SQL-like query language for your shell pipes.

vz-tools

For quite some time, I have been a happy user of OpenVZ virtualization. And for almost the same amount of time, I have been writing vz-tools, a set of a few utilities that make working with OpenVZ a joy:

  • vz-optimize.pl was written first, and it tries to intelligently increase beancounter limits for a VE until the virtual machine has enough resources left (see the sketch after this list).
    The best usage would be to run some kind of stress test on your VE, and then re-run vz-optimize.pl afterwards to allocate enough resources for the VE.
  • vz-create.pl is the second useful tool; it will create a new VE from an upstream Debian mirror and tweak it a bit (the way I like it)
  • vz-clone.pl is the latest addition to the set of tools; it allows creating a temporary clone of a virtual machine for testing (of upgrades?) or QA
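
The beancounter data these tools juggle all comes from /proc/user_beancounters; a quick sketch of how to spot counters that are failing inside a VE, which is essentially what vz-optimize.pl reacts to (the tool's actual logic differs):
# sketch: list beancounters with a non-zero failcnt (last column) inside VE 212226
vzctl exec 212226 "awk 'NR > 2 && \$NF > 0 { print \$(NF-5), \$NF }' /proc/user_beancounters"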

While those tools work well for me, they don't have much error checking, and most of the parameters in them are actually hard-coded. Since I am giving a presentation about virtualization on Linux at HrOUG this year, I will probably improve these tools a bit more.

However, I'm not quite sure whether these tools would be of general use to somebody else. If they are, please drop me a note, and I would love to improve them a bit more...