I don't have a TV remote. I did get one with the TV, but as soon as I installed the TV I realized that it's quite annoying to hunt for the remote to turn the TV on when I'm sitting with my wireless keyboard (the computer is the only device connected to the TV). So, I added a keyboard shortcut using xbindkeys, added an IR LED to a Raspberry Pi, configured lirc and was happy about it. And then buster with kernel 4.19 came along and everything changed.

Send IR to TV

Upgrading to the 4.19 kernel should be easy; the only thing you have to do (if your IR-sending diode is on pin 18) is to enable the new overlay:

# pwm works only on 18
dtoverlay=pwm-ir-tx,gpio_pin=18
This did not work reliably for me on a Raspberry Pi 1. My TV detected roughly every third key press, which makes a command-line TV remote solution useless because you can't use the TV menus to set up the picture any more.

So, I had to do something. Getting up and pressing a button on the TV is not something that I can live with after having this automation working for a year (and the TV remote was missing by now). But I had all the required components.

A few weeks ago, I removed the IR send/receive board from an RMmini 3 and documented its pinout:

RMmini3-ir-pinout.jpg

I was also in the middle of flashing Sonoff-Tasmota to a bunch of Tackin plugs, so it seemed like a logical step to flash Tasmota to a NodeMCU board, connect the RMmini 3 IR board to it and give it a try. And I'm glad I did.

I used to have an HTTP server (a simple Perl script) running on the Raspberry Pi which used irsend to send IR codes. From the xbindkeys perspective, my configuration used curl, and all I had to do to get IR working again was to change my script to use mosquitto_pub instead of irsend:

mosquitto_pub -h rpi2 -q 2 -t cmnd/ir/IRSend -m '{"protocol": "NEC","bits": 32, "data": 0x20DF10EF}'
At this point I realized that I could put this command into .xbindkeysrc and contact the esp8266 directly. This didn't work: you can't have double quotes in commands which xbindkeys executes, so I had to put it into a shell script and call that.
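
Something along these lines in ~/.xbindkeysrc then binds a key to the wrapper script (a sketch; the key and script path here are just examples):

# turn TV on via the esp8266 (key and path are examples)
"/home/dpavlin/tv-on.sh"
  XF86HomePage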

And to my amazement, there was a noticeable difference in the TV's response time. In retrospect this seems obvious, because my TV NUC is much faster than the Raspberry Pi, but it was probably the most unexpected benefit of this upgrade.

I said that you have to connect the IR receiver and sender to NodeMCU pins; when doing so, take care not to hit pins that have a special purpose on power-up. For example, if you connect something that pulls to ground on power-up (an IR LED, for example) to gpio0, the esp8266 will stay in bootloader mode. gpio2 and gpio16 are the LED pins on the NodeMCU board, so don't use them for IR (and define them as Led1i and Led2i in the configuration).

Having the LEDs configured in Tasmota allows me to extend my shell script and blink the LED after the IR code has been sent:

dpavlin@nuc:~$ cat tv-on.sh 
#!/bin/sh
mosquitto_pub -h rpi2 -q 2 -t cmnd/ir/IRSend -m '{"protocol": "NEC","bits": 32, "data": 0x20DF10EF}'
mosquitto_pub -h rpi2 -q 2 -t cmnd/ir/LedPower -m 1
mosquitto_pub -h rpi2 -q 2 -t cmnd/ir/LedPower -m 0

Send IR to HVAC

By pure luck, just a few days later, a friend wanted to control his ACs from a computer. Again Tasmota came to the rescue. Since HVAC support in Tasmota increases the firmware size over 512 KB (which breaks OTA upgrades on 1 MB modules), it's not compiled in by default. However, you can edit sonoff/my_user_config.h and uncomment it:

    #define USE_IR_HVAC                          // Support for HVAC systems using IR (+3k5 code)
    #define USE_IR_HVAC_TOSHIBA                  // Support IRhvac Toshiba protocol
    #define USE_IR_HVAC_MITSUBISHI               // Support IRhvac Mitsubishi protocol
    #define USE_IR_HVAC_LG                       // Support IRhvac LG protocol
    #define USE_IR_HVAC_FUJITSU                  // Support IRhvac Fujitsu protocol
    #define USE_IR_HVAC_MIDEA                    // Support IRhvac Midea/Komeco protocol
However, if you want to keep OTA updates working, you will also have to turn off some other configuration options (I don't use Domoticz or Home Assistant) to keep the firmware size below 512 KB.
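
In my case that meant commenting out options along these lines in the same my_user_config.h (a sketch; exact option names depend on the Sonoff-Tasmota version):

    //#define USE_DOMOTICZ                         // disable Domoticz support to save flash
    //#define USE_HOME_ASSISTANT                   // disable Home Assistant discovery to save flash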

To create the IR sender, I decided to add an IR LED, a transistor and a resistor to an existing ESP-01 module with a DHT11 board (which has a 3.3V regulator on it), according to the following DaveCAD(tm) drawing:

ir-send-schematics.jpg

If you are wondering why I'm connecting the IR LED to the RX pin (gpio3), it's because gpio0 is special, gpio2 is already used for the dht11, and TX (which is gpio1) is also special. Since we don't need serial, using the single pin left, RX, saves the day. And this is a picture of the first prototype (on which I tried all the pins until I settled on RX):

esp-01-dht11-ir.jpg

With all this in place and a quick re-flash, we were then able to issue commands like this to control the AC:

mosquitto_pub -h rpi2 -t 'cmnd/ir/irhvac' -m '{ "Vendor": "Mitsubishi", "Power": 1, "Mode":"Cold", "Temp": 25}'

mosquitto_pub -h rpi2 -t 'cmnd/ir/irhvac' -m '{ "Vendor": "Mitsubishi", "Power": 0}'
So, with all this, I hope that you don't have any excuse not to control your IR devices from the command line.

Update: just to make sure that you don't think this is my best soldering ever, here is also a picture of 4 more modules which will be distributed to my friends.

IMG_20190802_134752 (1).jpg

Our top-of-rack switch decided to die randomly from time to time. This was somewhat inconvenient, since it also killed most of our infrastructure, including the primary and secondary DNS, so I needed a solution quickly. Since a different rack was still on the network, I should be able to hack something together and finally connect my Arduino knowledge and the sysadmin realm, right? Think of it as a power-cycle watchdog based on network state.

The first thing was to figure out what was happening with the switch. It seemed like it was still working (the LEDs did blink), but the only thing that helped was a power cycle. So as a first step, I connected a serial console (using an RS-232 extension cable) to the on-board serial port (since it doesn't seem to work with cheap CH340-based USB serial dongles) and I didn't expect this:

0x37491a0 (bcmCNTR.0): memPartAlloc: block too big 6184 bytes (0x10 aligned) in partition 0x30f6d88
0x3cd6bd0 (bcmCNTR.1): memPartAlloc: block too big 6184 bytes (0x10 aligned) in partition 0x30f6d88
0x6024c00 (simPts_task): memPartAlloc: block too big 1576 bytes (0x10 aligned) in partition 0x30f6d88
0x6024c00 (simPts_task): memPartAlloc: block too big 1576 bytes (0x10 aligned) in partition 0x30f6d88
When I google messages like this I get two types of answers:
  1. beginner questions about VxWorks which sum up to: you have a memory leak
  2. errors from switches with Broadcom chipsets from various vendors
There is basically no solution. We are running the latest firmware, and the internet doesn't have any idea what to do.
The serial console did emit a lot of messages, but didn't respond to input at all. I would at least expect the watchdog timer in the switch to reset it once it manages to fragment its own memory so much that it stops forwarding packets. Oh well... what else can I do?

IEC power cable with relay

I wanted something that I could plug in between the existing switch and its IEC power cable, with USB on the other end that can be plugged into any USB port for control.

IMG_20181219_172912.jpg

Since this is a 220V project (and my first important one), I tried to make it as safe as possible.

  • I started with a power cable, which I cut in half, and put ferrules on all the wires to be sure that the connectors would grip them well.
  • Then I replaced the power plug with an IEC connector so it can be inserted into any power cable. In this case we soldered the wire ends, since the ferrules were too big to fit into the connector housing. We wrapped each wire around its screw in the connector in the right direction, so tightening the screw will not displace the wire.
  • Then I connected a cheap 10A 250VAC relay, which should be enough for a fully loaded 48-port gigabit network switch that draws around 80W.
  • To make sure that the rest of the system can't just power cycle the connected device at any time, I connected the live wire through the normally closed pins of the relay. This means that this cable works as-is (without powering the board at all), and when the board is powered, since it has a pull-up resistor from the relay input to VCC, the relay stays in the same state, passing power to the device.
  • Finally, I checked all three power cable wires with a multimeter and got around 0.2 ohms, which means the whole thing works for now.
At this point we should note that this relay board has only three pins (IN, GND and VCC) and no optical isolation from the 220V side. Since isolation would require an additional power supply for the 220V side, it was an acceptable risk.

Putting it into a box

I really wanted to somehow fix the wires in place and protect the bottom of the relay board (which has 220V on it) from shorting against something, so I used an old box from a dairy product and created a housing for the electronics.

IMG_20181220_095707.jpg

If you look carefully, you will notice that I had to cut the case all the way through to pass the power cable (which has a zip-tie on the inside to prevent it from being pulled out). The case will be fixed using hot glue and a lid, so this won't be a problem.
The warning and label on the lid are also a nice touch, and shouldn't be skipped when creating a thing which you won't be the only user of.

Arduino software

You will also notice that the relay is connected to A7 here, which didn't work out (on boards like the Nano, A6 and A7 are analog-input-only pins). Let me explain the idea:
The Arduino's default pin state (INPUT) is used as the state in which the pin stays most of the time. This leaves the pin floating, so we can inspect the pull-up on the relay board and report if we see it. When we want to activate the relay, we flip the pin to OUTPUT and pull it down.
Code is available at https://github.com/dpavlin/Arduino-projects/blob/nuc/power_cycle/power_cycle.ino and it can't be much simpler:

/*
 * power cycle switch
 *
 * the switch is powered through the 5V relay's normally closed pins,
 * so a failure of the arduino doesn't kill power to the switch.
 * to activate the relay on this board, the signal pin has to be
 * pulled to ground; coil draw is 76 mA when active.
 * the board has a pull-up from the input pin to its vcc.
 */

#define RELAY_PIN 2

void setup() {
  Serial.begin(115200);

  pinMode(LED_BUILTIN, OUTPUT);

  pinMode(RELAY_PIN, INPUT); // don't modify pin state
  Serial.print("Relay pin on reset: ");
  Serial.println(digitalRead(RELAY_PIN));
}

void loop() {
  if ( Serial.available() ) {
    char c = Serial.read();
    if ( c == '0' ) {
      Serial.print("L");
      pinMode(RELAY_PIN, OUTPUT);
      digitalWrite(RELAY_PIN, LOW); // activate relay

      digitalWrite(LED_BUILTIN, HIGH); // led on
    } else if ( c == '1' ) {
      Serial.print("H");
      pinMode(RELAY_PIN, INPUT);

      digitalWrite(LED_BUILTIN, LOW); // led off
    } else {
      Serial.print(c);
    }
  }
}
Simple is good: I toyed with the idea of automatically releasing the relay from the Arduino code, but when I started to implement timeout configuration on the Arduino side, I remembered that this will be plugged into a random server USB port, without avrdude or any handy way to update the firmware, so I decided to leave just the simplest possible commands:
  • 1 - ON (outputs H) - power on, default
  • 0 - OFF (outputs L) - power off, relay active
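
For a quick manual test from a shell, the commands can be sent like this (a sketch; it uses the same keep-the-port-open trick as the cron script below, because opening the serial port resets the Arduino):

exec 3<>/dev/ttyUSB0    # hold the port open so the Arduino resets only once
stty -F /dev/ttyUSB0 speed 115200 raw
sleep 3                 # wait for reset and the startup message
echo 0 >&3              # relay active, power off
sleep 3
echo 1 >&3              # relay released, power back on
exec 3<&-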

Hot glue galore

Then I applied a liberal amount of hot glue to fix the power cables and the board in place. It worked out pretty well. You will also notice that the relay pin has moved to D2.

IMG_20181220_111345.jpg

Installation

IMG_20181220_132632.jpg

And here it is, installed between the existing switch power cable and the switch, connected to the only USB port still available in a rack which is still on the network.

cron and serial port

The idea is simple: we'll use cron to ping the primary and secondary DNS IP addresses, and if either of them fails, we'll send 0 to turn power off, wait 3 seconds, and send 1 to turn power back on.
The implementation, however, is full of quirks, mostly because we don't want to depend on any additional utilities being installed, and we need to wait for the Arduino to reset after connecting to the serial port (and give it time to display the value of the relay pin) before we start turning power off.

#!/bin/sh -e

ping -q -c 5 193.198.212.8 > /dev/shm/ping && ping -q -c 5 193.198.213.8 >> /dev/shm/ping || (

test -e /dev/shm/reset && exit 0 # reset just once
cp /dev/shm/ping /dev/shm/reset  # store failed ping

date +%Y-%m-%dT%H:%M:%S
cat /dev/shm/ping

dev=/dev/ttyUSB0

trap "exit" INT TERM
trap "kill 0" EXIT

stty -F $dev speed 115200 raw
cat < $dev &
(
        echo
        sleep 3 # wait for reset and startup message
        echo 0  # off
        sleep 3
        echo 1  # on
        sleep 1
) | cat > $dev

kill $!

) # ping subshell
It's started from the crontab of a user which has dialout group membership, so it can open /dev/ttyUSB0:
dpavlin@ceph04:~$ ls -al /dev/ttyUSB0 
crw-rw---- 1 root dialout 188, 0 Dec 28 01:44 /dev/ttyUSB0
dpavlin@ceph04:~$ id
uid=1001(dpavlin) gid=1001(dpavlin) groups=1001(dpavlin),20(dialout),27(sudo)
dpavlin@ceph04:~$ crontab -l | tail -1
*/1 *  *   *   *     /home/dpavlin/sw-lib-srv-power-cycle.sh
This will execute the script every minute, which allows us to detect an error within a minute. However, the switch takes 50s to boot, so blindly power cycling on every failed ping would result in constant switch power cycles. Since the script resets the switch just once (using the /dev/shm/reset marker file), this is not a problem.

With this in place, your network switch will no longer force you to walk over to it just to power cycle it. :-)
And it's an interesting combination of sysadmin skills and electronics which might be helpful to someone.

remote access

If we want to access our servers while the switch doesn't work, it's always useful to create a few shell scripts on remote nodes which capture the IP addresses and commands you will need to execute to recover your network.

dpavlin@data:~/ffzg$ cat ceph04-switch-relay.sh 
#!/bin/sh -xe

ssh 193.198.212.46 microcom -p /dev/ttyUSB0 -s 115200

dpavlin@data:~/ffzg$ cat r1u32-sw-rack3.sh 
#!/bin/sh

ssh 193.198.212.43 microcom -p /dev/ttyS1 -s 9600

This week I learned a valuable lesson: if your RAID controller uses a MIPS CPU from 2008, you can't really expect it to be faster than a more modern Intel CPU at doing RAID 10 on disks.

It all started with the failure of the SSD in our bcache setup, which sits on top of a MegaRAID RAID10 array. Since this required me to take one of the ganeti nodes down, it was also a good opportunity to add one more disk (we were running 6 disks and one SSD) and switch to software md RAID10, so we can use all 7 disks in RAID10. In the process I did some benchmarking and was shocked by the results.

First, let's see the original MegaRAID configuration:

# lsblk --scsi -m
NAME HCTL       TYPE VENDOR   MODEL             REV TRAN NAME   SIZE OWNER GROUP MODE
sda  0:0:6:0    disk ATA      WDC WD1002FBYS-1 0C12      sda  931.5G root  disk  brw-rw----
sdb  0:0:7:0    disk ATA      INTEL SSDSC2BW24 DC32      sdb  223.6G root  disk  brw-rw----
sdc  0:2:0:0    disk DELL     PERC H310        2.12      sdc    2.7T root  disk  brw-rw----
# hdparm -tT /dev/sdc

/dev/sdc:
 Timing cached reads:   13920 MB in  1.99 seconds = 6981.76 MB/sec
 Timing buffered disk reads: 1356 MB in  3.00 seconds = 451.81 MB/sec
and let's compare that with the same disks exposed as JBOD on the controller and assembled into an md array:
# hdparm -Tt /dev/md0
/dev/md0:
 Timing cached reads:   13826 MB in  1.99 seconds = 6935.19 MB/sec
 Timing buffered disk reads: 1888 MB in  3.01 seconds = 628.05 MB/sec
So, the TL;DR is that you are throwing away disk performance if you are using hardware RAID. You didn't expect that? I didn't. In retrospect it's logical that a newish Intel CPU can process data much faster than the slowish MIPS on the RAID controller, but on the other hand the only difference is the RAID overhead, because the same controller is still handling the disks in the software RAID setup.
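
For reference, assembling the md array out of the JBOD disks boils down to something like this (a sketch, not my exact commands; those are in the document linked below):

# md RAID10 works fine with an odd number of member disks
mdadm --create /dev/md0 --level=10 --raid-devices=7 /dev/sd[a-g]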

I also wrote a document with a lot of console output and the commands to type if you want to do the same: https://github.com/ffzg/gnt-info/blob/master/doc/megaraid-to-md.txt

Today I was explaining the xclip utility and found this useful snippet, which I wrote back in 2011, that allows you to edit a browser textarea in a terminal using vi with syntax highlighting.

#!/bin/sh

# workflow:
# 1. focus textarea in browser you want to edit
# 2. press ctrl+a then ctrl+c
# 3. switch to terminal and start this script with optional extension for highlighting: xclip-vi html
# 4. edit file in vi, and save it
# 5. switch back to browser, and press ctrl+v in already selected textarea

ext=$1
test -z "$ext" && ext=txt

xclip -out > /tmp/$$.$ext && vi /tmp/$$.$ext && xclip -in -selection clipboard < /tmp/$$.$ext

xclip-vi.sh github gist

These days, my work seems more like archeology than implementing the newest and coolest web technologies, but someone has to keep an eye on old web servers which are useful to people.

In this case, it was an upgrade across two major Debian releases, which of course resulted in some amount of breakage, as you would expect. Most of the breakage was easy to fix, but we had one site which had only pyc files, and python2.6 with the correct modules isn't supported on the current distribution.

The first idea was to use a python decompiler to generate source python files, but this only got me to different python errors which I didn't know how to fix, so this was a dead end.
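
For illustration, that step looks something like this (a sketch assuming the uncompyle6 decompiler; paths are examples):

# decompile pyc files into python sources
uncompyle6 -o src/ *.pyc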

I did have a backup of the machine from before the upgrade on a zfs pool, so the next logical idea was to run a minimal amount of old binaries to keep the site up. And I decided to do it in a chroot, so I can easily share the mysql socket from the current installation into the pre-upgrade squeeze. Let's see what was involved in making this happen.

1. make writable copy of filesystem

Our backups are on a zfs pool, so we cloned all three disks into the same layout as they are mounted on the filesystem:

2018-06-08.10:07:53 zfs clone lib15/backup/pauk.ffzg.hr/0@2018-05-01 lib15/export/pauk.ffzg.hr
2018-06-08.10:08:19 zfs clone lib15/backup/pauk.ffzg.hr/1@2018-05-01 lib15/export/pauk.ffzg.hr/home
2018-06-08.10:08:38 zfs clone lib15/backup/pauk.ffzg.hr/2@2018-05-01 lib15/export/pauk.ffzg.hr/srv

2. nfs export it to upgraded server

Take note of the crossmnt parameter; it allows us to mount all descendant filesystems automatically.

root@lib15:~# grep pauk /etc/exports 
/lib15/export/pauk.ffzg.hr 10.21.0.2(rw,fsid=2,no_subtree_check,no_root_squash,crossmnt)
root@lib15:~# exportfs -va
exporting 10.21.0.2:/lib15/export/pauk.ffzg.hr

3. mount nfs export

root@pauk:~# grep squeeze /etc/fstab
10.21.0.215:/lib15/export/pauk.ffzg.hr  /mnt/squeeze    nfs     defaults,rw     0 0

4. start chroot with old version of service

To make this work, I first edited the apache configuration in the chroot to start it on a different port, and configured the front-end server to redirect the site to that port.
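
In the chroot, that boils down to a change along these lines in /etc/apache2/ports.conf (the port matches the proxy configuration in step 5 below):

Listen 18080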

root@pauk:/home/dpavlin# cat chroot-squeeze.sh 

to=/mnt/squeeze

mount --bind /dev $to/dev
mount --bind /sys $to/sys
mount --bind /proc $to/proc

mount --bind /var/run/mysqld/ /mnt/squeeze/var/run/mysqld/

/usr/sbin/chroot $to /etc/init.d/apache2 start

5. redirect apache traffic to server in chroot

You will have to insert something like this into the current apache configuration for the vhost:

<Proxy http://127.0.0.1:18080/*>
    Order allow,deny
    Allow from all
</Proxy>
<Location /german2/>
    RequestHeader set Host "www.ffzg.unizg.hr"
    ProxyPreserveHost On
    ProxyPass        http://127.0.0.1:18080/german/
    ProxyPassReverse http://127.0.0.1:18080/german/
</Location>

You might scream at me that in the days of containers, systemd and other better solutions, chroot is obsolete. In my opinion, it's just the right tool for the job from time to time, if you have a backup :-)

This year at DORS/CLUC 2018 I decided to talk about device tree in the Linux kernel, so here are the blurb, presentation and video of that lecture.

You have one of those fruity *Pi arm boards and a cheap sensor from China? Some buttons and LEDs? Do I really need to learn a whole new scripting language and a few web technologies to read my temperature, blink a LED or toggle a relay?
No, because your Linux kernel already has drivers for them, and all you need is the device tree and cat.
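
To give a concrete taste, here is what reading a DS18B20 temperature sensor on a Raspberry Pi looks like (a sketch assuming the w1-gpio overlay on the default pin):

# enable the 1-wire overlay in /boot/config.txt and reboot:
# dtoverlay=w1-gpio
# the kernel driver then exposes the temperature as a file:
cat /sys/bus/w1/devices/28-*/w1_slave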

For the last few years, one of the first tools we install on each new server is etckeeper. It has saved us a couple of times, and provides nice documentation about changes on the system.

However, git can take a lot of space if you have huge files which change frequently (at least daily, since etckeeper has a daily cron job to commit changes done that day). In our case, we have bind, which stores jnl files in /etc/bind, resulting in about 500 KB of changes each day for the 11 zones we have defined.

You might say that it doesn't seem so bad, but in four months we managed to increase the size of the repository from 300 MB to 11 GB. Yes, this is not a mistake, it's 11000 MB, an increase of 36 times! The solution for this is to use git gc, which will in turn call git-pack to compress files. But this is where the problems start: git needs a lot of RAM to do gc, and since this machine has only 1 GB of RAM, that's not enough to run git gc without running out of memory.

The last few times, I transferred the git repository to another machine, ran git gc there and then transferred it back (resulting in a nice decrease from 11 GB back to 300 MB); however, this is not an ideal solution. So, let's remove the bind jnl files from etckeeper...

Let's start with our 11 GB git repo and copy it to another machine with the 64 GB of RAM needed for this operation.

root@dns01:/etc# du -ks .git
11304708        .git

root@dns01:~# rsync -ravP /etc/.git build.ffzg.hr:/srv/dns01/etc/
Now we will re-create the local files, because we need to find out which jnl files exist so we can remove them from the repo.
root@build:/srv/dns01/etc# git reset --hard

ls bind/*.jnl | xargs -i git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch {}'

echo 'bind/*.jnl' >> .gitignore
git commit -m 'ignore bind jnl files' .gitignore
Now we can finally shrink our 11 GB repo!
root@build:/srv/dns01/etc# du -kcs .git
11427196        .git

root@build:/srv/dns01/etc# git gc
Counting objects: 38117, done.
Delta compression using up to 18 threads.
Compressing objects: 100% (27385/27385), done.
Writing objects: 100% (38117/38117), done.
Total 38117 (delta 27643), reused 12846 (delta 10285)
Removing duplicate objects: 100% (256/256), done.

root@build:/srv/dns01/etc# du -ks .git
414224  .git

# and now we can copy it back...

root@dns01:/etc# rsync -ravP build.ffzg.hr:/srv/dns01/etc/.git .
Just as a side note, if you want to run git gc --aggressive on the same repo, it won't finish with 60 GB of RAM and 100 GB of swap, which means that it needs more than 150 GB of RAM.

So, if you are storing modestly sized files which change a lot, keep in mind that you might need more RAM to run git gc (and get disk usage under control) than you have.