August 2009 Archives

As you might know by now, I was really struck by the simplicity of CouchDB at last year's OSCON. Since then we got couchapp, which is a great idea: it keeps CouchDB views on the file-system for easy maintenance.

So good, in fact, that I re-wrote couchapp in perl as design-couch.pl. I needed to deviate a bit from the original design (one _design document per application) because the PXElator data which I store in CouchDB is... data...

I was introduced to relational databases back at university, and since then I have been using PostgreSQL for all my database needs. But this time I have dumps from commands, SOAP interfaces, syslog messages and arbitrary audit events generated all over the code. I didn't want to think about structure up-front, and View Cookbook for SQL Jockeys convinced me I don't have to, but I decided to make a few simple rules to get me started:

  • Create URLs using human-readable timestamps (yyyy-mm-dd.HH:MM:SS.package.ident), which allows easy parsing in JavaScript (if needed) and ensures that all entries are sorted by time-stamp
  • Augment each document with a single new key, package (the perl keyword on top of each module). It has sub-keys: time (decimal time-stamp), name (name of the package), caller (the sub which called CouchDB::audit) and line (the line from which it was called)
  • Use a single _design document for output from one package (which is a directory on the file-system), just because it is easily browsable in Futon.
CouchDB_select_view.png
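
The first rule can be sketched in JavaScript as follows. This is only an illustration, not the code PXElator uses: the helper names auditId and pad2, the use of UTC, and the sample package/ident values are all assumptions made for the example.

```javascript
// Hypothetical helper: zero-pad a number to two digits.
function pad2(n) {
  return String(n).padStart(2, '0');
}

// Hypothetical helper: build an id of the form
// yyyy-mm-dd.HH:MM:SS.package.ident (UTC is an assumption).
function auditId(date, pkg, ident) {
  var day = [date.getUTCFullYear(), pad2(date.getUTCMonth() + 1), pad2(date.getUTCDate())].join('-');
  var time = [pad2(date.getUTCHours()), pad2(date.getUTCMinutes()), pad2(date.getUTCSeconds())].join(':');
  return [day, time, pkg, ident].join('.');
}

// Made-up sample events, deliberately listed out of order.
var ids = [
  auditId(new Date(Date.UTC(2009, 7, 14, 20, 42, 40)), 'amt', 'power_state'),
  auditId(new Date(Date.UTC(2009, 7, 2, 9, 5, 7)), 'ping', 'rtt')
];

// Zero-padded fields make plain string sorting chronological.
ids.sort();
console.log(ids);
```

Because every field is zero-padded and ordered from most to least significant, sorting the ids as plain strings also sorts them by time.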

So, are those rules enough to forget about relational algebra and relax on the couch? Let's take a look at _design/amt/ip,SystemPowerState-count. Here I applied an almost SQL-ish naming convention: column names separated by commas, then a dash, then the output column(s).

Map

function(doc) {
  // skip documents without the package key added by CouchDB::audit
  if ( doc.package
  && doc.package.name == 'amt'
  && doc.package.caller == 'power_state') {
    emit([doc.ip,doc.SystemPowerState],1);
  }
}

Reduce

function (k,v) {
  return sum(v)
}

When run, this map/reduce query produces a result like this:

Key                        Value
["172.16.10.200", null]    4
["172.16.10.16", "5"]      21
["172.16.10.16", "0"]      9
["172.16.10.16", null]     8
["10.60.0.196", "0"]       6
["10.60.0.195", "256"]     8
["10.60.0.194", "256"]     11
["10.60.0.193", "256"]     3

So far, so good. But what if I wanted to average all ping round trip times for each ip?

If you were using SQL, the answer would be:

select ip,avg(rtt) from ping group by ip

However, evil rereduce rears its head here:

Map

function(doc) {
  // skip documents without the package key
  if ( doc.package && doc.package.name == 'ping' )
   emit(doc.ip, doc.rtt);
}

Reduce

function (k,values,rereduce) {
  if (rereduce) {
    var total_sum = 0;
    var total_length = 0;
    for (var i = 0; i < values.length; i++) {
      total_sum += values[i][0];
      total_length += values[i][1];
    }
    return [total_sum, total_length];
  } else {
    return [sum(values), values.length];
  }
}

Since reduce is called incrementally, on partial results, we can't average averages. We need to collect the total sum and the number of elements, and perform the final computation on the client:

Key                  Value
"193.198.212.4"      [779.0038585662847, 9]
"193.198.212.228"    [902.6305675506585, 10]
"192.168.1.61"       [906.698703765869, 11]
"192.168.1.34"       [995.9852695465088, 11]
"192.168.1.3"        [316.55669212341303, 6]
"192.168.1.20"       [506.162643432617, 8]
"192.168.1.2"        [473.91605377197277, 11]
"192.168.1.13"       [649.2500305175784, 11]
"172.16.10.10"       [49.9579906463623, 1]
"172.16.10.1"        [250.78511238098127, 15]
"127.0.0.1"          [62.57653236389161, 16]
"10.60.0.94"         [81.6218852996826, 2]
"10.60.0.93"         [186.49005889892578, 6]
"10.60.0.92"         [386.7535591125485, 5]
"10.60.0.91"         [1070.863485336304, 9]
"10.60.0.90"         [428.4689426422117, 10]
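
The final client-side step could look roughly like the sketch below. It assumes the view rows arrive as objects with key and value fields (the shape of a grouped CouchDB view response); the function name averages is made up for this example, and the three sample rows are copied from the result above. The first few lines also show, with made-up numbers, why averaging partial averages gives the wrong answer.

```javascript
// Why averaging averages fails: two partial reductions of the values 10, 20, 30.
var partA = [10, 1];   // [sum, count] of [10]
var partB = [50, 2];   // [sum, count] of [20, 30]
var avgOfAvgs = (partA[0] / partA[1] + partB[0] / partB[1]) / 2;
var trueAvg = (partA[0] + partB[0]) / (partA[1] + partB[1]);
console.log(avgOfAvgs, trueAvg);

// Rows shaped like a grouped view result: key is the ip, value is [sum, count].
var rows = [
  { key: '172.16.10.10', value: [49.9579906463623, 1] },
  { key: '10.60.0.94', value: [81.6218852996826, 2] },
  { key: '192.168.1.3', value: [316.55669212341303, 6] }
];

// Hypothetical client helper: divide the total sum by the element count.
function averages(rows) {
  return rows.map(function (row) {
    var sum = row.value[0];
    var count = row.value[1];
    return { ip: row.key, avg: count > 0 ? sum / count : null };
  });
}

console.log(averages(rows));
```

Carrying the pair [sum, count] through every rereduce step keeps the result exact no matter how the values were split into partial reductions.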

If you manage to wrap your head around this, you are ready to dive into CouchDB.

This weekend we were in Split at the Ništa se neće dogoditi event, and I gave a presentation about the first three weeks of PXElator development, which can be used as a gentle introduction to this project. So, here we go...

Introduction

PXElator is just a piece of the puzzle which aims to replace system administration with nice declarative programs in perl. It's an experiment in replacing my work with reusable perl snippets.

It tries to solve the following problems:

  • support deployment of new physical or virtual machines (ip, hostname, common configuration)

  • maintain documentation about changes on systems, good enough to be used for disaster recovery (or deployment of similar system)

  • configure systems in small chunks (virtual machines or containers) for better management and resource tracking using normal system administration tools (but track those changes)

  • provide overview and monitoring of network segment and services on it with alerting and trending

Deployment of new machines

What is a machine, really? For PXElator, it's a MAC and IP address and some optional parameters (like hostname). It's stored on the file-system, under conf/server.ip/machine.ip/hostname, and can be tracked using source control if needed.

This is also shared state between all daemons implementing network protocols:

  • DHCP (with PXE support)

  • TFTP (to deliver initial kernel and initrd using pxelinux)

  • HTTP (to provide alternative way to fetch files and user interface)

  • DNS (we already have data)

  • syslog

  • AMT for remote management

Having all those protocols written in the same language enables incredible flexibility in automatic configuration. I can issue commands from an installation which has only ping, because I can have special DNS names which issue commands.

But, to get real power, we need to aggregate that data. I'm currently using CouchDB from http://couchdb.apache.org/ to store all audit data from all services in a single database.

I wanted a simple way to write ad-hoc queries without worrying about data structure too much. In the end, I opted for the audit role of data, and used 1 second granularity as the key when storing data. The result is that 133 syslog messages from the kernel right after boot create a single document with 133 revisions instead of flooding your database.
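
The effect of the 1 second key can be illustrated with a tiny in-memory stand-in for the database. This is a sketch under stated assumptions, not the CouchDB API: makeStore and secondKey are made-up names, the key format is simplified, and only the revision counter is modelled.

```javascript
// In-memory stand-in for the database (NOT the CouchDB API): storing
// under an existing id bumps a revision counter instead of adding a document.
function makeStore() {
  var docs = {};
  return {
    put: function (id, body) {
      var rev = docs[id] ? docs[id].rev + 1 : 1;
      docs[id] = { rev: rev, body: body };
      return rev;
    },
    count: function () { return Object.keys(docs).length; },
    get: function (id) { return docs[id]; }
  };
}

// Hypothetical key: timestamp truncated to whole seconds, plus package name.
function secondKey(ms, pkg) {
  return Math.floor(ms / 1000) + '.' + pkg;
}

var store = makeStore();
var boot = Date.UTC(2009, 7, 14, 20, 42, 40);

// 133 made-up kernel messages, 7 ms apart, so all fall within one second.
for (var i = 0; i < 133; i++) {
  store.put(secondKey(boot + i * 7, 'syslog'), { message: 'kernel line ' + i });
}

console.log(store.count(), store.get(secondKey(boot, 'syslog')).rev);
```

All 133 messages map to the same key, so the stand-in ends up with one document at revision 133 rather than 133 separate documents.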

It would be logical to plug RRDtool http://oss.oetiker.ch/rrdtool/ in somewhere here to provide nice graphs, but that is still on the TODO list.

End user scenarios:

  • Take a new machine, plug it into network, boot it from network and configure for kiosk style deployment with Webconverger available at http://webconverger.com/. Kiosk should automatically turn on every morning at 7:30 and turn off at 20:30.

  • Boot virtual machine (with new ip and hostname) from backup snapshot for easy recovery or testing

  • Boot machine from network into fully configurable (writable) system for quick recovery or dedicated machine. This is implemented using NFS server with aufs read-write overlay on top of debootstrap base machine.

Disaster recovery documentation for me, two years later

I have been trying to write useful documentation snippets for years. My best effort so far is Sysadmin Cookbook at http://sysadmin-cookbook.rot13.org/, a set of semi-structured shell scripts which can be executed directly on machines.

This part isn't yet integrated into PXElator, but most of the recipes will become some kind of rule which you can enforce on a managed machine.

End user scenario:

  • Install that something also on this other machine

Configure system like you normally would but track changes

This is basically a requirement to track configuration changes. Currently, this feature falls out of the writable snapshot over a read-only base system. The overlay data is all the custom configuration that I did!

Tracking changes on existing machines will be implemented using scp to copy files to the server into a hostname/path/to/local/file directory structure. This structure will be tracked using source control (probably git, as opposed to subversion which the PXElator source uses), and a cron job will pull those files at some interval (daily, hourly) to create an rsync+git equivalent of BackupPC http://backuppc.sourceforge.net for this setup.

It's interesting to take a look at how this differs from Puppet and is similar to cfengine3:

  • All data is kept in normal configuration files on system -- you don't need to learn new administration tools or somehow maintain two sources of configuration (in configuration management and on the system)

  • It introspects the live system and only tries to apply corrections if needed, which is similar to the cfengine3 approach.

End user scenario:

  • Turn useful how-to into workable configuration without much effort

Provide overview and monitoring

This falls out of the HTTP interface and the collection of data into CouchDB. For now, PXElator tries to manage the development environment for you, opening xterms (with screen inside for logging and easy scrollback) in different colors, and enabling you to start Wireshark on active network interfaces for debugging.

Let's assume that you want to create a virtual network which spans sites (or continents :-). While we are at it, let's assume that you want layer 2 connectivity (because you want to run just a single DHCP server, for example).

At first, it seemed logical to use Virtual Distributed Ethernet, for which kvm has support. However, this involves running multiple processes to support nodes on the network, and it's really virtual -- you can't use familiar Linux tools (like brctl or arp) to configure it. And it's connected over ssh anyway, so why add unnecessary complexity to the setup?

Since we will use ssh to transfer traffic anyway (it's the easiest hole to drill through firewalls, and you probably already have it for administration), why do we need another layer of software in between, with new commands to learn, if we already know how to do it using plain old Linux brctl?

So, let's take another look at ssh, especially the option Tunnel=ethernet, which provides Ethernet bridging between two tap devices. As I wrote before, ssh has point-to-point links using the tun device, which is a great solution if you want to connect two networks at the IP level using routing. Tap devices, however, provide access to the Ethernet layer from user-space (so ssh, kvm, VDE and various other user-land programs can send and receive Ethernet frames). Information on how to set up ssh to use tap devices is nowhere to be found on the internet, which motivated me to write this blog post.

Let's assume that we have two machines in following configuration:

  • t61p - laptop at home behind a DSL link and NAT, which wants to run a kvm virtual machine in virtual network 172.16.10.0/24
  • t42 - desktop machine at work which has a network bridge called wire on the 172.16.10.0/24 network, which provides network booting services

So, we need Ethernet tunneling to the remote client.

# install tunctl
dpavlin@t61p:/virtual/kvm$ sudo apt-get install uml-utilities

dpavlin@t61p:/virtual/kvm$ sudo tunctl -u dpavlin -t kvm0
Set 'kvm0' persistent and owned by uid 1000

dpavlin@t61p:/virtual/kvm$ kvm -net nic,macaddr=52:54:00:00:0a:3d -net tap,ifname=kvm0,script=no -boot n

This doesn't really boot our kvm from the network, because we haven't connected it all together yet. Now we need to enable tunnels on t42 and set up the remote tap device:

dpavlin@t42:~$ ( grep -v PermitTunnel /etc/ssh/sshd_config ; echo PermitTunnel yes ) > /tmp/conf
dpavlin@t42:~$ diff -urw /etc/ssh/sshd_config /tmp/conf
--- /etc/ssh/sshd_config        2009-04-20 12:50:27.000000000 +0200
+++ /tmp/conf   2009-08-14 20:42:40.000000000 +0200
@@ -75,3 +75,4 @@
 Subsystem sftp /usr/lib/openssh/sftp-server
 
 UsePAM yes
+PermitTunnel yes

# install and restart ssh
dpavlin@t42:~$ sudo mv /tmp/conf /etc/ssh/sshd_config
dpavlin@t42:~$ sudo /etc/init.d/ssh restart
Restarting OpenBSD Secure Shell server: sshd.

Now we can connect the two machines using an ssh Ethernet tunnel:

dpavlin@t61p:/virtual/kvm$ sudo ssh -w 1:1 -o Tunnel=ethernet root@10.60.0.94

t42:~# ifconfig tap1
tap1      Link encap:Ethernet  HWaddr fa:35:cb:9e:87:60  
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

t42:~# ip link set tap1 up
t42:~# brctl addif wire tap1
t42:~# brctl show wire
bridge name     bridge id               STP enabled     interfaces
pan0            8000.000000000000       no
wire            8000.006097472681       no              eth2
                                                        eth3
                                                        tap0
                                                        tap1
                                                        tap94

t42:~# dmesg | grep tap1
[284844.064953] wire: port 5(tap1) entering learning state

t42:~# tshark -i wire

This created tap1 devices on both machines, added the one on t42 to the bridge, and left us with a dump from tshark on the wire bridge.

Now we need to set up a virtual bridge on t61p to connect the ssh tunnel and the kvm tap device.

dpavlin@t61p:/virtual/kvm$ sudo ifconfig tap1
tap1      Link encap:Ethernet  HWaddr 52:c5:f8:64:30:d4  
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

dpavlin@t61p:/virtual/kvm$ sudo brctl addbr virtual
dpavlin@t61p:/virtual/kvm$ sudo brctl addif virtual kvm0
dpavlin@t61p:/virtual/kvm$ sudo brctl addif virtual tap1

dpavlin@t61p:/virtual/kvm$ sudo brctl show
bridge name     bridge id               STP enabled     interfaces
pan0            8000.000000000000       no
virtual         8000.4e1537af6cdc       no              kvm0
                                                        tap1

dpavlin@t61p:/virtual/kvm$ sudo ip link set kvm0 up
dpavlin@t61p:/virtual/kvm$ sudo ip link set tap1 up
dpavlin@t61p:/virtual/kvm$ sudo ip link set virtual up

dpavlin@t61p:/virtual/kvm$ dmesg | grep virtual
[31141.669760] virtual: port 1(kvm0) entering learning state
[31152.288025] virtual: no IPv6 routers present
[31156.668088] virtual: port 1(kvm0) entering forwarding state
[31211.699928] virtual: port 2(tap1) entering learning state
[31226.696070] virtual: port 2(tap1) entering forwarding state
dpavlin@t61p:/virtual/kvm$ kvm -net nic,macaddr=52:54:00:00:0a:3d -net tap,ifname=kvm0,script=no -boot n

This will boot our kvm from the remote server over an Ethernet bridge, using nothing more than brctl and ssh!

If you want an even more lightweight solution to the same problem, you might look into EtherPuppet.

On a related note, if your kvm Windows XP machines stopped working after an upgrade to Debian kernel 2.6.30-1-686, just upgrade to 2.6.30-1-686-bigmem (even if you don't have more memory) and everything will be o.k.

As you might know by now, I was recently debugging memory related problems on one of my systems, and concluded that the normal output from Linux commands is more or less inaccurate. If you want to know why, take a look at Matt Mackall's presentation at ELC2009: Visualizing Process Memory or watch the following video:

Convinced? So, hop over to the smem page, compile the user-land part and start really tracking your memory usage. Let's compare:

dpavlin@t61p:/rest/cvs/smem$ free
             total       used       free     shared    buffers     cached
Mem:       4081400    3882476     198924          0     142904    2731480
-/+ buffers/cache:    1008092    3073308
Swap:      8209172       7492    8201680

dpavlin@t61p:/rest/cvs/smem$ ./smem -w -t
Area                           Used      Cache   Noncache 
firmware/hardware                 0          0          0 
kernel image                      0          0          0 
kernel dynamic memory       2927016    2845456      81560 
userspace memory             954900     119368     835532 
free memory                  199484     199484          0 
----------------------------------------------------------
                            4081400    3164308     917092 

Just a few quick notes if you didn't watch the whole video carefully:
  • needs kernel 2.6.27 or newer
  • it can work on archived data (from cron in my example usage)
  • userspace cache is backed by file on disk
  • it's a python script which requires matplotlib to create graphs so it's for local reporting
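
To make the comparison concrete, the arithmetic behind the two outputs can be spelled out. The numbers (in kB) are copied from the free and smem runs above; the variable names are made up for this sketch, and no interpretation of the remaining gap is implied.

```javascript
// Values in kB, copied from the free output above.
var free = { total: 4081400, used: 3882476, free: 198924, buffers: 142904, cached: 2731480 };

// Noncache column from the smem -w -t output above.
var smem = { kernelNoncache: 81560, userspaceNoncache: 835532 };

// What free reports on its "-/+ buffers/cache" line as used.
var usedMinusCache = free.used - free.buffers - free.cached;

// What smem reports as the total of its Noncache column.
var smemNoncache = smem.kernelNoncache + smem.userspaceNoncache;

console.log(usedMinusCache);                // 1008092
console.log(smemNoncache);                  // 917092
console.log(usedMinusCache - smemNoncache); // 91000
```

So on this machine the two tools disagree by about 91000 kB on how much memory is really in use, even though both start from the same 4081400 kB total.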

About this Archive

This page is an archive of entries from August 2009 listed from newest to oldest.
