etckeeper, bind jnl files and git-pack memory problems

For last few years, one of first tools which we install on each new server is etckeeper. It saved us couple of times, and provides nice documentation about changes on the system.

However, git can take a lot of space if you have huge files which change frequently (at least daily since etckeeper has daily cron job to commit changes done that day). In our case, we have bind which stores jnl files in /etc/bind which results in about 500 Kb change each day for 11 zones we have defined.

You might say that it doesn't seem to be so bad, but in four months, we managed to increase size of repository from 300 Mb to 11 Gb. Yes, this is not mistake, it's 11000 Mb which is increase of 36 times! Solution for this is to use git gc which will in turn call git-pack to compress files. But this is where problems start -- git needs a lot of RAM to do gc. Since this machine has only 1 Gb of RAM, this is not enough to run git gc without running out of memory.

Last few times, I transferred git repository to other machine, run git gc there and than transfered it back (resulting in nice decrease from 11 Gb back to 300 Mb), however, this is not ideal solution. So, let's remove bind jnl files from etckeeper...

Let's start with our 11 Gb git repo, copy it to another machine which has 64 Gb or RAM needed for this operation.

root@dns01:/etc# du -ks .git
11304708        .git

root@dns01:~# rsync -ravP /etc/.git build.ffzg.hr:/srv/dns01/etc/
Now, we will re-create local files because we need to find out which jnl files are used so we can remove them from repo.
root@build:/srv/dns01/etc# git reset --hard

ls bind/*.jnl | xargs -i git filter-branch -f --index-filter 'git rm --cache --ignore-unmatch {}'

echo 'bind/*.jnl' >> .gitignore
git commit -m 'ignore ind jnl files' .gitignore
Now, finally we can shrink our 11 Gb repo!
root@build:/srv/dns01/etc# du -kcs .git
11427196        .git

root@build:/srv/dns01/etc# git gc
Counting objects: 38117, done.
Delta compression using up to 18 threads.
Compressing objects: 100% (27385/27385), done.
Writing objects: 100% (38117/38117), done.
Total 38117 (delta 27643), reused 12846 (delta 10285)
Removing duplicate objects: 100% (256/256), done.

root@build:/srv/dns01/etc# du -ks .git
414224  .git

# and now we can copy it back...

root@dns01:/etc# rsync -ravP build.ffzg.hr:/srv/dns01/etc/.git .
Just as side note, if you want to run git gc --aggressive on same repo, it won't finish with 60 Gb or RAM and 100 Gb of swap, which means that it needs more than 150 Gb of RAM.

So, if you are storing modestly sized files which change a lot, have in mind that you might need more RAM to run git gc (and get disk usage under control) than you might have.