Sync part of a Subversion repository

I had a particular problem at work: we have an upstream Subversion repository, which we access over an ssh tunnel (using the svn protocol), that contains two branches we are interested in and various other stuff we don't care about (and don't want to mirror).

On the other hand, we also wanted a local copy of all changes (preserving history), local commit messages and an SVN::Web interface.

My original idea was also to keep revision numbers as-is (so we could just check out our local version and be done), but this wasn't possible. One solution we examined was to use Pushmi to make a local copy, but we didn't want all the other changes.

Another idea was to use svndumpfilter to sync only the two branches we are interested in (it creates empty dummy commits for revisions outside our branches), but since our branches were created by copying parts of the tree we don't want to sync, that didn't work either: svndumpfilter can't handle copies whose source lies outside the included paths.

Did I mention that our svn repository can reach upstream only through carefully crafted ssh tunnels? A mess, right?

So, in the end, the solution was a hybrid:

  • make a local copy of the two upstream branches using svk (losing the original order of commits, even though both branches are mirrored into the same repository on our side)
  • install a post-commit hook in the upstream repository which triggers (over https) svk sync on our side (I would probably have used SMTP for that, but our machine with the svn repository doesn't accept outside e-mail)
  • install a local post-commit hook to send e-mail notifications

The rest of this post explains how to do this. Since I learned a thing or two along the way, I hope it will be useful to others as well.

First, create an svn-pull.sh shell script which runs as a user whose ssh keys can log in to the upstream firewall (1.2.3.4 in this example) and sets up a tunnel to the upstream svn server (10.1.1.1):

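#!/bin/sh
# Open the tunnel in the background; -N runs no remote command, so the tunnel
# stays up until we kill it (svk may open several connections through it).
ssh -N -o ExitOnForwardFailure=yes -L 13690:10.1.1.1:3690 1.2.3.4 &
pid=$!
# give ssh a moment to authenticate and bind the local port
sleep 5
SVKROOT=/home/user/.svk svk sync -a
# tear the tunnel down again
kill $pid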

Now set up mirrors of the branches we care about:

svk mirror svn://127.0.0.1:13690/project/carnet-foo /project/foo
svk mirror svn://127.0.0.1:13690/project/carnet-bar /project/bar

This is all nice, but we need to trigger it as the web server user (www-data), which is allowed by the following line in /etc/sudoers (edit it with visudo):

www-data ALL=(user) NOPASSWD:/home/user/svn-pull.sh

Then add a simple CGI script which triggers the sync:

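#!/bin/sh
# CGI header: one header line followed by an empty line (printf is portable, echo -e is not)
printf 'Content-type: text/plain\r\n\r\n'
# run the sync as "user" (permitted by the sudoers entry above); show errors too
sudo -u user /home/user/svn-pull.sh 2>&1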

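I used ScriptAlias in Apache to make it visible at https://svn-ours.example.com/upstream-svn-update. There's no need to obfuscate the URL, since it's behind SSL for added points. Limiting access by IP address might also be a good idea (this is Apache 2.2 syntax; on Apache 2.4 the equivalent is Require ip 1.2.3.4):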

  <Location /upstream-svn-update>                   
        Order allow,deny
        Allow from 1.2.3.4
  </Location>

Now install a post-commit hook in the upstream repository. We only care about commits touching paths that start with /carnet, since the branches we are interested in have that prefix:

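# Subversion passes the repository path and the new revision as $1 and $2
REPOS="$1" REV="$2"
svn log -v -r "$REV" "file://$REPOS" 2>/dev/null | grep -q ' /carnet' \
    && wget -q -O /dev/null https://svn-ours.example.com/upstream-svn-update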
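The grep ' /carnet' test also fires when a commit message (or a copy source such as "(from /carnet-foo:41)") merely mentions such a path. A slightly stricter variant, sketched below as a hypothetical touches_carnet helper, only accepts the changed-path lines that svn log -v prints (three spaces, an action letter, a space, then the path). It reads the log on stdin, so it can be tried without a repository.

```shell
#!/bin/sh
# Hypothetical helper for the upstream post-commit hook.
# Reads "svn log -v" output on stdin and succeeds only if one of the
# changed-path lines (e.g. "   M /carnet-foo/README") lies under /carnet*.
touches_carnet() {
    grep -q '^   [ADMR] /carnet'
}

# Usage inside the hook ($1 = repository path, $2 = revision):
# svn log -v -r "$2" "file://$1" | touches_carnet \
#     && wget -q -O /dev/null https://svn-ours.example.com/upstream-svn-update
```

Anchoring the pattern at the start of the line is what keeps free-form log text from triggering a sync.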

You will notice that there is no locking or other tweaking, since the tools already handle that themselves; in fact, we are really just doing RPC via CGI over https.

Nice and easy, once you know how to do it! It may look like a few bits of configuration scattered all over the place, but I hope it's KISS (keep it simple, stupid) at its best.

Update: OK, now we have a local repository (with different revision numbers), but svn switch --relocate doesn't work because the two repositories are not the same: they have different repository UUIDs (makes sense, eh?).
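svn switch --relocate refuses to move a working copy to a repository whose UUID differs from the one recorded in the working copy. A quick way to confirm that this is the problem is to compare the "Repository UUID:" line that svn info prints for the old working copy and for the new URL. The repo_uuid helper below is a hypothetical name; it only extracts that line from svn info output on stdin.

```shell
#!/bin/sh
# Hypothetical helper: print the repository UUID from "svn info" output on stdin.
repo_uuid() {
    sed -n 's/^Repository UUID: //p'
}

# Usage (paths and URLs as used in this post):
# old=$(svn info /srv/carnet-foo | repo_uuid)
# new=$(svn info svn://svn-ours.example.com/carnet-foo | repo_uuid)
# [ "$old" = "$new" ] || echo "different repositories: relocate will fail"
```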

The following steps are a quick explanation of how to copy the .svn directories from a fresh checkout of the new repository into an existing working copy:

cd /srv/carnet-foo
# update the working copy to the latest upstream revision
svn update
# delete the old .svn directories (-prune: don't descend into them while deleting)
find . -type d -name .svn -prune -exec rm -Rf {} +
# check out the same tree from the new (local) repository
cd /srv
svn co svn://svn-ours.example.com/carnet-foo carnet-foo.new
# copy the new .svn files into the old working copy
cd carnet-foo.new
find . -path "*/.svn/*" | cpio -pvd ../carnet-foo/
# clean up
cd /srv
rm -Rf carnet-foo.new
# the following shouldn't show any differences
cd carnet-foo
svn diff
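After the cpio step, every versioned directory of the old working copy should contain a .svn directory again. As a rough sanity check (a sketch, not part of the original procedure), the hypothetical missing_svn_dirs function below lists directories under a working copy that lack one; anything it prints is either unversioned locally or was missed by the copy.

```shell
#!/bin/sh
# Hypothetical check: list directories under $1 (skipping .svn itself and
# everything inside it) that don't contain a .svn directory.
missing_svn_dirs() {
    find "$1" -name .svn -prune -o -type d -print |
    while IFS= read -r dir; do
        [ -d "$dir/.svn" ] || printf '%s\n' "$dir"
    done
}

# Usage: missing_svn_dirs /srv/carnet-foo
```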