May 2009 Archives

So, you think that your network is slow. But how would you test that? You can feel that the speed between different hosts is different, but you need some data to find the problem. Here is my take on this...

First, select a subset of machines to test network speed on and install netpipe-tcp. Then run NPtcp on the target machines and NPtcp -h hostname -u 1048576 -o /tmp/hostname.np on the machine from which you are testing bandwidth. Several iterations later, you will have a bunch of *.np files which are ready for analysis.

You can do it by hand, but this handy Perl script will convert the *.np files into a Graphviz dot file, which renders like this: netpipe-grahviz.png
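The conversion idea can be sketched in a few lines. This is not the Perl script mentioned above, just a minimal Python illustration. It assumes that each line of a *.np file holds whitespace-separated columns with the throughput in Mbit/s in the second column, and that the file name (without .np) is the target host, as in the NPtcp command above. The function names peak_mbps, to_dot and convert_directory are made up for this example.

```python
from pathlib import Path


def peak_mbps(np_text):
    """Return the highest throughput (assumed: second column, Mbit/s)."""
    best = 0.0
    for line in np_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or truncated lines
        try:
            best = max(best, float(fields[1]))
        except ValueError:
            continue  # skip lines that are not numeric
    return best


def to_dot(source, results):
    """Build a dot graph with one labeled edge per tested target host."""
    lines = ["digraph netpipe {"]
    for target in sorted(results):
        label = f"{results[target]:.0f} Mbit/s"
        lines.append(f'  "{source}" -> "{target}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


def convert_directory(directory, source):
    """Read every *.np file in directory; the file stem is the target host."""
    results = {}
    for path in Path(directory).glob("*.np"):
        results[path.stem] = peak_mbps(path.read_text())
    return to_dot(source, results)
```

Something like print(convert_directory("/tmp", "myhost")) redirected into a .dot file could then be fed to Graphviz; "myhost" stands in for the machine the tests were run from.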

Graphviz will do its auto-layout magic, and just by looking at the picture you will immediately notice that there is a 100 Mbit/s link somewhere in between the machines... Pictures can really replace thousands of words...

For quite some time I wanted to try PXE booting. After all, I did write a bootp and tftp server for ADSL modems, so how complicated can it be?

I decided to use dnsmasq as the server, and added the following configuration options to dnsmasq:

enable-tftp
tftp-root=/srv/sysadmin-cookbook/recepies/pxe/tftpboot/
dhcp-boot=pxelinux.0
Then, I created tftpboot from upstream Debian netboot:
wget -nc ftp.hr.debian.org/debian/dists/lenny/main/installer-i386/current/images/netboot/netboot.tar.gz \
&& mkdir tftpboot && cd tftpboot && tar xvfz ../netboot.tar.gz
It all seemed nice and well, so I decided to try it using an Eee PC 701. And it didn't work. I didn't have any network link, tshark -i eth0 didn't report any network traffic, and everything suggested that the BIOS didn't power on the network card.

I even tried the latest BIOS upgrade, but it didn't help. I was quite sure that the configuration was correct (it's so simple after all), so I tried to boot a ThinkPad. Which worked...

So, I had a PXE environment which worked, just not with the Eee PC. Fortunately, there is an alternative to buggy PXE boots: gPXE. It comes with a bootable USB version which, to my amazement, worked perfectly on the Eee PC. If you want to know all the gory details about gPXE, watch this video. It's well worth your time...

As you might have guessed by now, I have been playing with file systems for a backup appliance. So, against my better judgment, I decided to try btrfs to see how ready it is to replace my zfs-fuse configuration with a real in-kernel file system (zfs-fuse is not slow, because disks are much slower than any piece of software).

So far, I have found the following annoyances in btrfs:

  1. snapshots can't be removed (I'm doing incremental-forever backups, so this is not a show-stopper)
    You can remove all files in the snapshot directory, but not the directory itself. I would guess that removing files would just increase disk usage, because it's a copy-on-write filesystem, but I didn't test that.
  2. there is no indication which directory is a snapshot (if you didn't write down in a log which one is a snapshot, you are out of luck)
  3. it seeks quite a lot (there is 40-70% wait time in vmstat while running rsync, which I guess is seeking, because there are no block input/output operations at the same time)
  4. it will oops your (Debian 2.6.29-2-686) kernel:
    Message from syslogd@klin at May 16 00:42:31 ...
     kernel:[ 4057.994566]  [<c0119e0f>] kmap_atomic_prot+0xbd/0xdd
    Message from syslogd@klin at May 16 00:42:31 ...
     kernel:[ 4057.994576]  [<c0119d30>] kunmap_atomic+0x58/0x7a
    Message from syslogd@klin at May 16 00:42:31 ...
     kernel:[ 4057.994586]  [<f83a61a2>] btrfs_cow_block+0x134/0x13d [btrfs]
    Message from syslogd@klin at May 16 00:42:31 ...
     kernel:[ 4057.994608]  [<f83a8b4b>] btrfs_search_slot+0x1f0/0x622 [btrfs]
    Messag./pull-snapshot-backup.sh: line 8:  4316 Segmentation fault      rsync -ravHC --numeric-ids --delete $from:/mnt/vz-backup/private/$1/ /$pool/$1/
    
    dmesg-btrfs-bug.txt
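
The wait time from item 3 is the same number that vmstat shows in its wa column, and on Linux it can be derived from two samples of the first line of /proc/stat. Below is a minimal Python sketch of that calculation. The helper names cpu_fields and iowait_percent are made up for this example, and it assumes the usual field order on that line (user, nice, system, idle, iowait, irq, softirq, steal).

```python
def cpu_fields(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer counters."""
    parts = stat_line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Keep user..steal only; guest time is already included in user time.
    return [int(value) for value in parts[1:9]]


def iowait_percent(before, after):
    """Share of CPU time spent waiting for I/O between two samples."""
    deltas = [b - a for a, b in zip(cpu_fields(before), cpu_fields(after))]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter
```

To use it, read the first line of /proc/stat, wait a few seconds while rsync is running, read it again and pass both lines to iowait_percent.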

After that I concluded that the warning about the alpha state of btrfs is there for a reason. I didn't fully appreciate Theodore Ts'o's warning about the development status of btrfs until I got a kernel oops.

Let me first explain the background to the story: you want a system to implement distributed printing. It has local accounts (it can fetch users from LDAP) and routes printed documents to printers which have card readers, so that users can pick up printouts after they identify themselves with a card.

Sounds complicated? O.K., let's consider that we have a system and we are trying to deploy it. At this point it doesn't matter whether you have already paid for it or whether it's open or closed source. Really.

2008-05-08_virtworkshop.jpg

You are trying to configure it. It's Java (because it's an enterprise system) and it seems that most things are configured using .ini files. After four weeks of trying to make it work, you have the following facts:

  • configuration options are not honored in all parts of the system: some options exist but aren't used consistently (in this case, although there is an objectclass option for LDAP entries, and it's changed to HrEduPerson, the system sometimes uses Person)
  • some configuration options have special limits within the application logic: in our case, if we set the flag to disable negative credits on cards, the system doesn't allow users to use it without 10 credits. This doesn't make sense, because there is an administration interface for this option, and it shows 0.00

First, let me emphasize that this problem might be the same for both types of software. Every piece of software is reliable only in the environment in which it was tested, and I know that very well from my experience with Open Source. However...

If the system is a closed product without source

You can exchange several e-mails with the help-desk, which is really first-level customer support working more or less from a cookbook. I have seen such help-desks at both of my previous jobs, so I don't really expect deep technical expertise about the application. However, that resulted in a painful trial-and-error process, because the configuration options are somewhat cryptic and sparsely documented.

If the system has source available

If I could look into the source of the application, I could fix the configuration option names. Or improve the documentation. Probably even fix the problems that I found and submit a patch to improve the upstream project (or pick another one because this one just isn't worth it).

So the real moral of this story is: closed source projects limit your flexibility. They will drain your time and bring you a half-working solution without the ability to fix it yourself. I honestly cannot understand why someone would choose that.

Closed hardware - open source driver

I also have another example, a company within the same industry (printing) with closed hardware which at least got the driver part right: Dualys has source code for its CUPS driver. I still haven't found time to try it out, but I was afraid that making a custom card printer work would be more work than syncing a closed source commercial application with LDAP, right?

Freedom as right of the user

Isn't it funny that Richard Stallman's Crusade for Free Software started with a printer?

It seems to surprise me a little every year, but DORS/CLUC was once again somewhat different this year from the last one. If I have to put it in one sentence: there is a certain number of people in Croatia who really understand the topics discussed at this conference. The only problem is that we all actually agree, so the discussions are perhaps not critical enough :-)

This year we had a series of interesting lightning talks, of which I gave Everything you wanted to know about RFID but were afraid to ask... in 5 min.

The next day I tried to get the audience interested in liberating some piece of hardware. If you found my talk interesting, you will probably also like the talk from this year's FOSDEM about how Gnash (a free Flash player) was written.

My point of view

First, let me explain my position. I worked for quite a few years in a big corporation, and followed EMC storage systems (one from the end of the last century and the improvement that CLARiiON brought to our production SAP deployment). I even visited the EMC factory in Cork, Ireland, and it was a very eye-opening experience. They claim that 95% of customers who visited the factory did buy EMC storage, and I believe them (we did upgrade to CLARiiON, btw).

In my Linux-based deployments on HP, Compaq and IBM hardware I did various crazy RAID configurations (RAID5 across disks on one controller and then a stripe across the other controller, for example). Those were the easy parts: you got a RAID controller with DRAM cache (~256 MB) and some kind of battery backup, which greatly improved write performance.

Later on in CARNet we had HP EVA storage, which proved quite flaky. I heard from a friend in one enterprise deployment that they use them only for testing. And you know, it's just a shelf of disks with redundant controllers and a fiber interface...

In the meantime, on the Linux software RAID front, I used the md implementation of RAID1 and RAID5 back in the days when Linux distributions couldn't handle that.

Solid state drives

However, solid state drives changed a lot of that. I still haven't had the pleasure of using Intel SSDs, which are supposed to be good, but USB sticks are also flash storage, just with quirky characteristics.

This particular one is ID 0951:1603 Kingston Technology Data Traveler 1GB/2GB Pen Drive as reported by Linux, but in fact it is an 8 GB model which seems to have 128 MB of memory that is writable at about 6 MB/s, after which write speed drops to 45 KB/s.

On the other hand, there is the ZFS on FUSE project, which enables some really interesting applications of Sun's (and now Oracle's) file system. I do have to mention Sun at this point. Ever since I heard about Oracle's acquisition of Sun, I have wondered what will happen with ZFS. I might even suspect that ZFS is the main reason why Oracle bought Sun. Let me explain...
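
A drop like that shows up if you time each chunk separately and force it to the device before starting the next one. The following is a minimal Python sketch of such a measurement, not the method used to get the numbers above. The function name write_speeds and its parameters are made up for this example, and the result still depends on the file system and on how the device honors fsync.

```python
import os
import time


def write_speeds(path, chunk_mb=1, chunks=8):
    """Write fixed-size chunks, fsync after each, return MB/s per chunk."""
    block = os.urandom(chunk_mb * 1024 * 1024)  # random data, not zeros
    speeds = []
    with open(path, "wb") as handle:
        for _ in range(chunks):
            start = time.perf_counter()
            handle.write(block)
            handle.flush()
            os.fsync(handle.fileno())  # push the chunk out of the page cache
            elapsed = time.perf_counter() - start
            speeds.append(chunk_mb / elapsed if elapsed > 0 else float("inf"))
    return speeds
```

Pointing path at a file on the mounted stick and writing a few hundred chunks should make the point where the speed collapses visible in the returned list.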

Sunshine Oracle

If you look at the database market (where Oracle is), the only interesting way to improve relational databases is to make them extremely fast. And that revolution is already here. Don MacAskill from SmugMug makes a compelling case about the performance of SSD storage. If you don't believe words, watch this video from 24:50 to see the solution to the MySQL storage performance problem: hardware! Sun's hardware. Do you think that Oracle didn't notice that?

Enterprise storage cheaply

Did you watch the video? I really don't agree that it's hardware. Come on! It's Opteron boxes with custom-built SSD disks optimized for write speed. SSDs with super-capacitors instead of the batteries in old RAID controllers.

But, to make it really fun, I will try to re-create at least some of those abilities using commodity hardware in my university environment. I have Dell's OptiPlex boxes which come loaded with a lot of goodies to put together a commodity storage cluster:

  • Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz
  • 3 GB RAM
  • 2 SATA disks with ~80 MB/s of read/write performance
  • multi-card reader and 8 USB slots
  • fake software RAID on the Intel chipset (supported by dmraid, but even its documentation suggests not using it)

ZFS

Why ZFS? Isn't btrfs the way to go? For this particular application, I don't think so. Let me list the features of ZFS which excite me:

  • ability to store log to separate (mirror) device (SSD, USB sticks if that helps)
  • scrub: read all bytes on disk and rewrite them (beats smartctl -t long because it also re-allocates bad blocks; I've seen 80 MB/s scrub)
  • balancing of IO over devices (I will use this over nbd to split mirror between machines for fail-over)
  • arbitrary number of copies (nice for bigger clusters of storage machines)
  • nice snapshots which display their size and can be cloned to writable ones
  • snapshot send/receive to make off-site backup copies
  • L2ARC - balance read and write cache over SSD devices with different characteristics (USB sticks have fast read and slow write, so they might be good fit)

You might think of it as git with POSIX file-system semantics.

But it's in user space, you say, it must be slow! It isn't. Really. Linux user space is much faster than the disks, and having a separate process is nice for monitoring purposes. File-system overhead gets counted as user time, not system time, so system time is a clear indicator of driver (hardware) activity and not file-system overhead.

I have most parts of this setup ready, and I'm using it to back up OpenVZ containers. So, I'm running an OpenVZ kernel and I can even make virtual machines from backup snapshots to recover to some point in time. After I finish this setup, expect a detailed guide (it will probably be part of my upcoming virtualization workshop as an alternative to LVM).

About this Archive

This page is an archive of entries from May 2009 listed from newest to oldest.
