This is a story about our mail server, which is coming close to its disk space capacity:
root@mudrac:/home/prof/dpavlin# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        20G  7.7G   11G  42% /
/dev/vdb        4.0T  3.9T   74G  99% /home
/dev/vdc        591G  502G   89G  85% /home/stud
You might say that it's easy to resize the disk and provide more storage, but unfortunately it's not that easy. We are using ganeti as our virtualization platform, and the current version of ganeti has a limit of 4T for a single drbd disk.
This could be solved by increasing the third (vdc) disk and moving some users to it, but this is not ideal. Another possibility is to use dovecot's zlib plugin to compress mails. However, since our Maildir doesn't have the required S=12345 part of the filename describing the size of the mail, this solution also wasn't applicable to us.
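For illustration (this is a made-up message name, not one from our server), a Dovecot Maildir file that carries the size in its name looks roughly like:

1623456789.M20300P4242.mudrac,S=12345,W=12499:2,S

Here S= is the file size in bytes and W= the virtual (RFC 822) size. Once mails are compressed on disk, the file size no longer matches the message size, which is why the zlib plugin needs the S= value to already be present in the filename.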
Installing lvm would allow us to use more than one disk to provide additional storage, but since ganeti already uses lvm to provide virtual disks to instances, this also isn't ideal.
OpenZFS comes to the rescue
Another solution is to use OpenZFS to present multiple disks as a single filesystem, and at the same time provide disk compression. Let's create a pool:
zpool create -o ashift=9 mudrac /dev/vdb
zfs create mudrac/mudrac
zfs set compression=zstd-6 mudrac
zfs set atime=off mudrac

We are using an ashift of 9 instead of 12 since it uses 512-byte blocks on storage (which our SSD storage supports), and that saves quite a bit of space:
root@t1:~# df | grep mudrac
Filesystem     1K-blocks       Used  Available Use% Mounted on
mudrac/mudrac 3104245632 3062591616   41654016  99% /mudrac/mudrac   # ashift=12
m2/mudrac     3104303872 2917941376  186362496  94% /m2/mudrac       # ashift=9

This is a saving of 137 GB just by choosing a smaller ashift.
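As a quick sanity check (not part of the original setup, just a sketch), the sector sizes the virtual disk reports can be inspected before settling on ashift=9:

# logical and physical sector size as reported by the block device
blockdev --getss --getpbsz /dev/vdb
lsblk -o NAME,LOG-SEC,PHY-SEC /dev/vdb

If both report 512, ashift=9 is safe to use; a device which internally uses 4k sectors is better served by ashift=12.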
Most of our e-mail consists of messages that are kept on the server but rarely accessed. Because of that I opted for zstd-6 (instead of the default zstd-3) to compress them as much as possible. But, to be sure it's the right choice, I also tested zstd-12 and zstd-19; the results are below:
LEVEL   | USED (bytes)  | COMP | H:S
--------|---------------|------|--------
zstd-6  | 2987971933184 | 60%  | 11:2400
zstd-12 | 2980591115776 | 59%  | 15:600
zstd-19 | 2972514841600 | 59%  | 52:600
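These numbers were collected by comparing dataset usage; on a live pool the achieved ratio can also be read directly from zfs, for example (just a way to double-check, not how the table above was produced):

zfs get compression,compressratio,used,logicalused mudrac/mudrac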
bullseye kernel for zfs and systemd-nspawn
To have zfs, we need a recent kernel. Instead of upgrading the whole server to bullseye at this moment, I decided to boot bullseye with zfs and start the unmodified installation using systemd-nspawn. This is easy using the following command line:
systemd-nspawn --directory /mudrac/mudrac/ --boot --machine mudrac --network-interface=eth1010 --hostname mudrac

but it's not ideal for automatic start of the machine, so a better solution is to use machinectl and a systemd service for this. Converting this command line into an .nspawn file is non-trivial, but after reading man systemd.nspawn the configuration needed is:
root@t1:~# cat /etc/systemd/nspawn/mudrac.nspawn
[Exec]
Boot=on
#WorkingDirectory=/mudrac/mudrac
# ln -s /mudrac/mudrac /var/lib/machines/
# don't chown files
PrivateUsers=false

[Network]
Interface=eth1010

Please note that we are not using WorkingDirectory (which would copy files from /var/lib/machines/name) but instead just created a symlink to the zfs filesystem in /var/lib/machines/.
To enable and start the container on boot, we can use:
systemctl enable systemd-nspawn@mudrac
systemctl start systemd-nspawn@mudrac
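Once it is running, the container behaves like any other machine registered with systemd, so the usual machinectl commands apply (nothing here is specific to this setup):

machinectl list
machinectl status mudrac
machinectl login mudrac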
Keep network device linked to MAC address
Predictable network device names, which bullseye uses, should provide stable network device names. This seems like a clean solution, but in testing I found out that adding additional disks changes the names of network devices. Previously Debian used udev to map a network interface name to a device MAC address using /etc/udev/rules.d/70-persistent-net.rules. Since this is no longer the case, the solution is to define a similar mapping using a systemd .link file like this:
root@t1:~# cat /etc/systemd/network/11-eth1010.link
[Match]
MACAddress=aa:00:00:39:90:0f

[Link]
Name=eth1010
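To verify that the .link file is picked up and the device really got the expected name, something like this should work (a verification step added here for completeness, not part of the original recipe):

ip -br link show eth1010
udevadm test-builtin net_setup_link /sys/class/net/eth1010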
Increasing disk space
When we do run out of disk space again, we can add a new disk to the zfs pool using:
root@t2:~# zpool set autoexpand=on mudrac
root@t2:~# zpool add mudrac /dev/vdc

Thanks to autoexpand=on above, this will automatically make the new space available. However, if we grow an existing disk up to 4T, the new space isn't visible immediately since zfs keeps a partition table on the disk, so we need to expand the device to use all available space using:
root@t2:~# zpool online -e mudrac vdb
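Whether the pool actually picked up the new space can then be checked with (again just a sanity check, not part of the original steps):

zpool list -v mudrac
zfs list mudrac/mudrac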
zfs snapshots for backup
Now that we have zfs under our mail server, it's logical to also use zfs snapshots to provide nice, low-overhead incremental backups. It's as easy as:
zfs snap mudrac/mudrac@$( date +%Y-%m-%d )

in cron.daily and then shipping snapshots to the backup machine. I did look into existing zfs snapshot solutions, but they all seemed a little bit too complicated for my use case, so I wrote zfs-snap-to-dr.pl, which copies snapshots to the backup site.
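The script itself isn't shown here, but conceptually shipping a snapshot boils down to zfs send piped into zfs receive over ssh, roughly like this (the backup host name, destination dataset and dates are made up for illustration):

# initial full copy of the oldest snapshot
zfs send mudrac/mudrac@2021-06-01 | ssh backup zfs receive -u backup/mudrac

# later, incremental copy between two consecutive snapshots
zfs send -i mudrac/mudrac@2021-06-01 mudrac/mudrac@2021-06-02 | ssh backup zfs receive -u backup/mudrac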
To keep just the last two snapshots on the mail server, a simple shell snippet is enough:
zfs list -r -t snapshot -o name -H mudrac/mudrac > /dev/shm/zfs.all
tail -2 /dev/shm/zfs.all > /dev/shm/zfs.tail-2
grep -v -f /dev/shm/zfs.tail-2 /dev/shm/zfs.all | xargs -i zfs destroy {}

Using the shell to create and expire snapshots, and a simpler script to just transfer snapshots, seems to me like a better and more flexible solution than implementing it all in a single perl script. In a sense, it's the unix way of small tools which do one thing well. The only feature which zfs-snap-to-dr.pl has aside from snapshot transfer is the ability to keep just a configurable number of snapshots on the destination, which keeps disk usage in check (and re-uses already collected data about snapshots).

This was an interesting journey. In the future, we will migrate the mail server to bullseye and remove systemd-nspawn (it feels like we are twisting its arm by using it like this). But it does work, and it's a simple solution which will come in handy in the future.