Results matching “subversion”

If you are a system administrator, this will sound familiar: you have to quickly fix something, and you know that you should document it somewhere (or keep a backup), but it's so much work. You could install one of the existing source control management tools on each box, but they usually come with huge dependencies, and having all files in a central location would be so useful for correlating configuration changes. To add insult to injury, existing SCMs don't do a good job of tracking just a few files spread across the file system.

So, what would the perfect tool for keeping remote files in a central git repository look like?

  • no dependencies on non-standard tools on the clients, allowing easy deployment
  • track individual files and ignore the rest
  • central repository, one directory per hostname

I have tried to solve this problem several times, writing wrappers around subversion to handle sparse checkouts and installing subversion and ssh authentication all over the place. But all of this should be simpler... Like this:

  1. Add a new client to track:
    dpavlin@klin:~/klin/bak-git$ ./bak-git-server.pl ~/backup/ 10.60.0.92 --install brr
    install on brr
    # lot of output stripped
    
    This will perform several steps:
    • create a git repository in ~/backup/ if it doesn't exist already
    • install root ssh authentication to brr using ssh-copy-id
    • install the bak shell helper, which uses netcat to connect back to 10.60.0.92
    • install rsync on the client and use it as root over ssh to sync files
  2. Now we can log in to brr and start tracking our files:
    dpavlin@brr:~$ bak add /etc/cron.d/tun0 
    dpavlin@brr:~$ bak add /etc/network/interfaces
    dpavlin@brr:~$ bak commit
    dpavlin@brr:~$ bak log
    commit df09dc5e19ef1d47311d701b4c63f0859b0b81c1
    Author: Dobrica Pavlinusic 
    Date:   Thu Feb 18 19:04:21 2010 +0100
    
        brr [commit] /home/dpavlin/
    
     create mode 100644 brr/etc/cron.d/tun0
     create mode 100644 brr/etc/network/interfaces
    
  3. Change some configuration and review the changes
    dpavlin@brr:~$ bak diff
    diff --git a/brr/etc/network/interfaces b/brr/etc/network/interfaces
    index 806c08e..c52c646 100644
    --- a/brr/etc/network/interfaces
    +++ b/brr/etc/network/interfaces
    @@ -2,8 +2,6 @@
     # and how to activate them. For more information, see interfaces(5).
     
     # The loopback network interface
    -auto lo
    -iface lo inet loopback
     
     # The primary network interface
     #allow-hotplug eth0
    
  4. Oops! Where did the loopback disappear?
    dpavlin@brr:~$ bak revert /etc/network/interfaces 
    dpavlin@brr:~$ bak diff
    
  5. If we are happy with the changes, we can also commit them:
    dpavlin@brr:~$ bak commit /etc/network/interfaces optional note
    
As you have guessed by now, it's very similar to git usage (except revert, which is from subversion), but with easy deployment on clients. It implements a reduced subset of git commands:
  • bak add /path
  • bak commit [/path [message]]
  • bak diff
  • bak status
  • bak log
  • bak - push all local changes to server (without commit!)
If you need anything more complex, you can use git directly on the ~/backup repository (even to commit changes from multiple hosts in one go).
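
For example (a sketch only; the second hostname, alpha, is made up here for illustration), working directly in the central repository on the server could look like:

    dpavlin@klin:~/backup$ git log --stat brr/etc/network/interfaces
    dpavlin@klin:~/backup$ git add brr/etc alpha/etc
    dpavlin@klin:~/backup$ git commit -m 'network reconfiguration on brr and alpha'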

The whole solution resembles the FTP protocol, with the data channel using ssh and rsync. File transfers should be encrypted (since we are managing configuration files with sensitive information), and if you want to be really secure, just run the server on 127.0.0.1 and tunnel the port using RemoteForward 9001 localhost:9001 in .ssh/config.
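
One possible arrangement of that tunnel (my interpretation of the sentence above, with an illustrative host name; port 9001 just follows the example): in the .ssh/config used for the ssh sessions towards a client, add something like

Host brr
	RemoteForward 9001 localhost:9001

so the bak helper on the client can talk to 127.0.0.1:9001 and have the connection carried back over ssh to the server listening on its own 127.0.0.1:9001.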

The main design goal of Sack is to have an interactive environment to query perl hashes which are bigger than the memory of a single machine.

Implementation uses TCP sockets (over ssh if needed) between perl processes. This allows horizontal scalability both on multi-core machines as well as across the network to additional machines.

Reading data into the hash is done using any perl module which returns a perl hash and supports offset and limit to select just a subset of the data (this is required to create disjoint shards). Parsing of the source file is done on the master node (called lorry), which then splits it into shards and sends the data to the sack nodes.
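
As a sketch of what such an input module might look like (this is not Sack code; the package name, file format and sub name are all made up for illustration):

package InputTSV;
use strict;
use warnings;

# return a hashref of id => record for rows [offset, offset + limit),
# so that each shard gets a disjoint slice of the source file
sub data {
	my ( $path, $offset, $limit ) = @_;
	my %hash;
	open my $fh, '<', $path or die "$path: $!";
	my $nr = 0;
	while ( my $line = <$fh> ) {
		next if $nr++ < $offset;
		last if $nr > $offset + $limit;
		chomp $line;
		my ( $id, @fields ) = split /\t/, $line;
		$hash{$id} = \@fields;
	}
	return \%hash;
}

1;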

Views are small perl snippets which are called for each record on each shard with $rec. Views create data in the $out hash, which is automatically merged on the master node.

You can influence the default shard merge by adding + (plus sign) to the name of your key, to indicate that the key => value pairs below it should have their values summed when combining shards on the master node.

If a view operation generates a huge number of long field names, you might run out of memory on the master node when merging results. The solution is to add # to the name of the key, which will turn key names into integers which use less memory.
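
Purely as an illustration (the key names, and the exact placement of the + and # markers, are my guesses rather than something taken from Sack's documentation; $rec is assumed to be a hash reference for the current record), a view might look like:

# run on each shard for every record in $rec; results collected in $out
# are merged on the master (lorry) node
$out->{'+hits'}->{ $rec->{host} }++;        # '+' asks the master to sum values when merging shards
$out->{'#fields'}->{ $_ }++ for keys %$rec; # '#' maps long key names to integers to save memory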

So, how does it look? Below is a small video showing 121887 records spread over 18 cores on 9 machines, running first a few short views and then the largest one on this dataset.

If your browser doesn't support the <video> tag, watch the Sack video on YouTube or use the ttyrec player written in JavaScript.

Source code for Sack is available in my subversion repository, and this is currently the second iteration, which brings a much simpler network protocol (based only on perl objects serialized directly to the socket using Storable) and better support for starting and controlling the cluster (which used to be a shell script).

Update: Sack now has a proper home page at Ohloh and even a playlist on YouTube (which doesn't really like my Theora encoded videos and doesn't have an rss feed natively).

The following video shows improvements in version 0.11 on a 22-node cloud, hopefully better than the video above.

PXElator introduction

This weekend we were in Split at the Ništa se neće dogoditi event, and I gave a presentation about the first three weeks of PXElator development which can serve as a gentle introduction to this project. So, here we go...

Introduction

PXElator is just a piece of the puzzle which aims to replace system administration with nice declarative programs in perl. It's an experiment in replacing my work with reusable perl snippets.

It tries to solve the following problems:

  • support deployment of new physical or virtual machines (ip, hostname, common configuration)

  • maintain documentation about changes on systems, good enough to be used for disaster recovery (or deployment of a similar system)

  • configure systems in small chunks (virtual machines or containers) for better management and resource tracking, using normal system administration tools (but track those changes)

  • provide overview and monitoring of a network segment and the services on it, with alerting and trending

Deployment of new machines

What really is a machine? For PXElator, it's a MAC and IP address and some optional parameters (like the hostname). It's stored on the file system, under conf/server.ip/machine.ip/hostname, and can be tracked using source control if needed.

This is also the shared state between all the daemons implementing network protocols:

  • DHCP (with PXE support)

  • TFTP (to deliver initial kernel and initrd using pxelinux)

  • HTTP (to provide alternative way to fetch files and user interface)

  • DNS (we already have data)

  • syslog

  • AMT for remote management

Having all those protocols written in the same language enables incredible flexibility in automatic configuration. I can issue commands from an installation which has only ping, because I can have special DNS names which trigger commands.

But, to get real power, we need to aggregate that data. I'm currently using CouchDB from http://couchdb.apache.org/ to store all audit data from all services in a single database.

I wanted a simple way to write ad-hoc queries without worrying about the data structure too much. In the end, I opted for the audit role of the data and used 1 second granularity as the key when storing it. The result is that 133 syslog messages from the kernel right after boot will create a single document with 133 revisions instead of flooding your database.

It would be logical to plug RRDtool http://oss.oetiker.ch/rrdtool/ in somewhere here to provide nice graphs, but that is still on the TODO list.

End user scenarios:

  • Take a new machine, plug it into the network, boot it from the network and configure it for kiosk-style deployment with Webconverger, available at http://webconverger.com/. The kiosk should automatically turn on every morning at 7:30 and turn off at 20:30.

  • Boot a virtual machine (with a new ip and hostname) from a backup snapshot for easy recovery or testing

  • Boot a machine from the network into a fully configurable (writable) system for quick recovery or a dedicated machine. This is implemented using an NFS server with an aufs read-write overlay on top of a debootstrapped base machine.

Disaster recovery documentation for me, two years later

I have been trying to write useful documentation snippets for years. My best effort so far is the Sysadmin Cookbook at https://sysadmin-cookbook.rot13.org/, a set of semi-structured shell scripts which can be executed directly on machines.

This part isn't yet integrated into PXElator, but most of the recipes will become some kind of rule which you can enforce on a managed machine.

End user scenario:

  • Install that something also on this other machine

Configure system like you normally would but track changes

This is basically a requirement to track configuration changes. Currently, this feature falls out of the writable snapshot over the read-only base system. The overlay data is all the custom configuration that I did!

Tracking changes on existing machines will be implemented using scp to copy files to the server into a hostname/path/to/local/file directory structure. This structure will be tracked using source control (probably git, as opposed to subversion, which the PXElator source uses), and a cron job will pull those files at some interval (daily, hourly) to create an rsync+git equivalent of BackupPC http://backuppc.sourceforge.net for this setup, roughly as sketched below.
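
A rough sketch of that idea (none of this is PXElator code; the hostname, list of tracked files and repository path are made up):

#!/bin/sh
# pull tracked files from one host into a per-hostname directory and commit;
# meant to be run from cron daily or hourly
host=brr
repo=/srv/backup
cd "$repo" || exit 1
while read file ; do
	mkdir -p "$repo/$host`dirname $file`"
	scp -p root@$host:"$file" "$repo/$host$file"
done < "$repo/$host.files"
git add "$host" && git commit -m "$host `date +%F`" "$host"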

It's interesting to take a look at how it's different from Puppet and similar to cfengine3:

  • All data is kept in normal configuration files on the system -- you don't need to learn new administration tools or somehow maintain two sources of configuration (one in configuration management and one on the system)

  • It introspects the live system and just tries to apply corrections if needed, which is similar to the cfengine3 approach.

End user scenario:

  • Turn useful how-to into workable configuration without much effort

Provide overview and monitoring

This falls out of the HTTP interface and from collecting data into CouchDB. For now, PXElator tries to manage the development environment for you, opening xterms (with screen inside for logging and easy scrollback) in different colors, and enables you to start Wireshark on active network interfaces for debugging.

I'm using Subversion for most of my work, as all of you well know by now. All this hype about git persuaded me to give it another try. I don't really have anything to gain from doing this, since I'm using svk when I need a distributed VCS, but somehow I thought that git might be the right solution to keep all my system configuration, so I can debootstrap a system, check out the configuration and be ready to go.

I could use etckeeper to do some of this stuff, but I really didn't want the integration with apt. I just wanted a single (network-connected and backed-up) place. I had already tried this with git on a single machine with a local repository, and it worked pretty well.

This time I tried to use git branches to track different machines. I really want a single repository, so I can merge common changes all around. However, today I got this:

root@syslog:/# git push
Counting objects: 16, done.
Compressing objects: 100% (9/9), done.
Writing objects: 100% (10/10), 1.16 KiB, done.
Total 10 (delta 0), reused 0 (delta 0)
To ssh://backup/srv/backup/
   66a2f9b..8f195f5  syslog -> syslog
 ! [rejected]        koha-dev -> koha-dev (non-fast forward)
 ! [rejected]        master -> master (non-fast forward)
error: failed to push some refs to 'ssh://backup/srv/backup/'
I have no idea why two branches which have nothing to do with the current one would disable the distributed part of git. If you can't read git output, the message above means that git refused to push some refs to the central repository.

This is a huge show stopper for me. Half a day of googling didn't find an answer to this particular git question. This makes my whole setup a big useless overhead.

This is all well and fun, but since this is the second time that git has eaten my data, I'm falling back to my good old friend Subversion. At least, when it breaks, I have error messages which are somewhat useful and the Subversion book which explains most operations (so I don't have to google for every little bit, like pushing a single branch from one repository to another with git).

Don't get me wrong: git is c00l, we all know that, but it's just immature if you don't want to be a git developer. If you think that I'm just an old-timer who can't join all this new-age DVCS mumble-mumble, read why Google picked Mercurial instead of git as its DVCS. Different story, but helpful to see that git isn't the only solution to every problem.

Now I just need to convert my existing git branches back into subversion. It seems that git-svn dcommit is the answer, but how to really push four different git branches back into subversion I still don't know. I will probably just re-add all tracked files to a clean Subversion repository and start all over again.

Wouldn't it be nice to have a CGI script which would convert a bunch of SQL queries into an XLS file on the fly? And while we are at it, let's have multiple reports, each in its own directory.

In a sense, it's a simple REST API to SQL files on disk which produces Excel files. I first wrote something like this back in 2002, but until now, I didn't have a subversion repository for it or announce it to the world.

Each file in the current directory which ends in .sql will be converted to an Excel sheet. If you want a specific order, you can prefix the filenames with numbers, which will be stripped when creating the sheet names.

Comments in the sql files (lines beginning with --) will be placed in the first line, in bold.

To specify the database on which an SQL query is executed, the \c database syntax is supported.

You can also run the script from the command line, and it will produce an sql_reports.xls file.

If run within a directory, it will use the files in that directory to produce the output file.

When called as a CGI, a directory name can be appended to the name of the script to produce a report for any sub-directory within the directory where sql2xls.cgi is installed.
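
Putting the conventions above together, a report file (the file name, database and query here are just an illustration, not part of the distribution) might be called 01_users_per_country.sql and look like:

  -- Number of users per country
  \c webshop
  SELECT country, COUNT(*) AS users
  FROM users
  GROUP BY country
  ORDER BY users DESC

The leading 01_ only controls sheet order and is stripped from the sheet name, and the comment line ends up as the bold first row of the sheet.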

INSTALLATION

The only required file is this script, sql2xls.cgi.

If your server is configured to execute .cgi files, you can drop this script anywhere, but you can also add something like

   ScriptAlias /xls-reports /srv/SQL2XLS/sql2xls.cgi

in Apache's virtual host configuration to get nice URLs.

To configure the default database, user, password and other settings, create a config.pl file in the same directory as sql2xls.cgi, with something like this:

  $dsn      = 'DBI:mysql:dbname=';
  $database = 'database';
  $user     = 'user';
  $passwd   = 'password';
  $path     = 'sql_reports.xls';

  $db_encoding     = 'utf-8';
  $xls_date_format = 'dd.mm.yyyy';
  $debug = 1;

SECURITY

There is none. Use apache auth modules if you need it.
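
If you do want to restrict access, a minimal sketch using Apache basic authentication (the location matches the ScriptAlias example above; the htpasswd path is an assumption) would be something like:

  <Location /xls-reports>
        AuthType Basic
        AuthName "SQL reports"
        AuthUserFile /srv/SQL2XLS/.htpasswd
        Require valid-user
  </Location>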

So, you want to check out source code on some server and at the same time have the ability to commit or diff against the local copy on your laptop? It seems like an easy task, but it does involve some unexpected steps (for me at least), so here is a quick how-to...

Create an ssh tunnel from the target host back to your laptop (called llin in the output below). Edit .ssh/config and add something like:

Host server-dev.rot13.org server-dev
	Hostname server-dev.rot13.org
	RemoteForward 8022 127.0.0.1:22
You will notice that I added a short name so I can type just ssh server-dev, because I'm lazy.

When you log in to server-dev, you might think that something like svn checkout is all that is left. However, that doesn't really work:

dpavlin@server-dev:~$ svn ls svn+ssh://localhost:8022/home/dpavlin/private/svn/SQL2XLS
ssh: localhost:8022: Name or service not known
svn: Connection closed unexpectedly
It seems that subversion doesn't like a port number within the hostname! So, let's create a .ssh/config there as well:
Host llin-svn
	Hostname localhost
	Port 8022
Let's try it out:
dpavlin@server-dev:~$ svn ls svn+ssh://llin-svn/home/dpavlin/private/svn/SQL2XLS
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 00:11:22:33:44:55:66:77:88:99:aa:bb:cc:dd:ee:ff.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Password or swipe finger: 
Much better, but it asks us for a password every time. We don't really like that, so we'll create ssh keys to get around this:
dpavlin@server-dev:~$ ssh-keygen -f .ssh/llin-svn
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in .ssh/llin-svn.
Your public key has been saved in .ssh/llin-svn.pub.
The key fingerprint is:
aa:bb:cc:dd:ee:ff:00:11:22:33:44:55:66:77:88:99 dpavlin@server-dev.rot13.org
Now we will insert the generated .ssh/llin-svn.pub into .ssh/authorized_keys on the laptop, but allow only svnserve to be started:
command="svnserve -t" ssh-rsa AAA...rest of key...AAA== dpavlin@server-dev.rot13.org
If you want normal ssh login back to your laptop, you might leave out command="svnserve -t", but this makes me feel better. On the other hand, the tunnel will be open only while we are logged into server-dev, but I usually prefer more security if possible. If you don't want to allow commits back to the laptop, you might add the -R flag to make the repository read-only.

But wait, there is more! We need to tell ssh on server-dev that we are using the newly generated key, so our final .ssh/config looks like this:

Host llin-svn
	Hostname localhost
	Port 8022
	IdentityFile ~/.ssh/llin-svn

We can now test it, to make sure that subversion doesn't ask for a password, by simply checking out the source code:

dpavlin@server-dev:~$ svn co svn+ssh://llin-svn/home/dpavlin/private/svn/SQL2XLS
Checked out revision 0.

Happy hacking!

Publish your data with Exhibit

As you might remember, back in 2007 I wrote about Exhibit, which has in the meantime released version 2.0 and moved to Google Code.

This time, I made a few more converters which enable you to:


This is probably the best test of JavaScript speed in your browser. Exhibit seems to work best with around 500 items in older browsers, but Firefox 3.1b2 works with 3000 objects, even on an EeePC :-)

This might seem like a strange title, but let's look at it for a moment. If a subversion commit message carries a magic marker (RT #1234 in my case), I want to create an RT comment (micro-blog post) with a link back to the SVN::Web repository (and thus to the commit which created the comment in the first place). Nice and circular :-)

First, install a subversion post-commit hook in /srv/svn/repository/hooks/post-commit:

#!/bin/sh

REPOS="$1"
REV="$2"

/srv/svn/svn-rt-comment.pl $REPOS $REV

After that, create the following script:

#!/usr/bin/perl

use strict;
use warnings;

use RT::Client::REST;
use RT::Client::REST::Ticket;

# Request Tracker
my ( $server, $username, $password ) = ( 'https://bugs.example.com/rt', 'rtuser', 'rtpasswd' );
# pattern to recognize RT references in the commit log or diff
my $rt_re = qr/rt\s*#\s*(\d+)/i;
my $svnweb = 'https://svn.example.com/svnweb/index.cgi/repository/revision/?rev=';

die "usage: $0 repo rev\n" unless @ARGV;

my ( $repo, $rev ) = @ARGV;

sub svnlook {
	my $command = shift;
	`svnlook $command --revision $rev $repo`
}

my $log = svnlook 'log';
my $diff = svnlook 'diff';

if ( $log =~ $rt_re || $diff =~ $rt_re ) {

	my $id = $1 or die "no id";

	my $rt = RT::Client::REST->new( server  => $server );
	$rt->login( username=> $username, password=> $password );
	my $ticket = RT::Client::REST::Ticket->new( rt  => $rt, id  => $id );

	my $message =
		svnlook('author') . "\t" . svnlook('date') . "\n" .
		$svnweb . $rev . "\n\n" .
		svnlook('changed --copy-info') . "\n" .
		$log
	;
	
	$ticket->comment( message => $message );
}

And apply the following patch to your Request Tracker 3.6 to create links in the messages:

--- /usr/share/request-tracker3.6/html/Ticket/Elements/ShowMessageStanza	2006-06-20 00:44:04.000000000 +0200
+++ /usr/local/share/request-tracker3.6/html/Ticket/Elements/ShowMessageStanza	2008-09-26 13:23:12.000000000 +0200
@@ -57,8 +57,12 @@
         my $content = $stanza->{raw};
         RT::Interface::Web::EscapeUTF8(\$content);
         $m->comp('/Elements/Callback', content => \$content, %ARGS);
-        $content =~ s{$}{<br />}mg
-            if defined $content;
+        if ( defined($content) ) {
+            # convert urls to links
+            $content =~ s{(https?://\S+)}{<a href="$1">$1</a>};
+            $content =~ s{$}{<br />}mg;
+            warn "## $content\n";
+        }
 <%$content |n%>

And you have your micro-blogging environment connected by linking :-)

Update: If you have problems using SVN::Web with Subversion 1.5, take a look at this ticket, which includes svn-web-svn1.5.diff which fixes it.

We had a double-disk failure on a RAID5 array last week (yes, it does happen). One of the things that got affected was our source code repository.


So, we reinstalled etch and installed subversion, did svnadmin verify on the repository, and all seemed well until we tried to commit using SVN DAV. Apache's error log was full of errors like this:


Could not create activity /svn/repo/!svn/act/9526cf2d-4893-4b35-a843-4c0f2d9f8bc6. [500, #0]
could not open dbm files. [500, #120022]
Can't open activity db: APR does not understand this error code [500, #120022]

But the permissions were correct! In utter desperation, I even tried chmod 777 on the repository (kids, don't try this at home!), but it didn't help.


What did help was:


mkdir repo.new
svnadmin create repo.new
svnadmin dump repo | svnadmin load repo.new
mv repo repo.old
mv repo.new repo

Checking the newly created repository with diff showed that it had indeed changed:

Binary files repo.old/dav/activities and repo/dav/activities differ
diff -rw repo.old/db/current repo/db/current
1c1
< 59 r9 1
---
> 62 ra 1
Only in repo/db/revprops: 60
Only in repo/db/revprops: 61
Only in repo/db/revprops: 62
diff -rw repo.old/db/revs/10 repo/db/revs/10
4c4
< id: aq.0.r10/17
---
> id: e1.0.r10/17
6c6
< pred: aq.0.r8/1443323
---
> pred: e1.0.r8/1443490
16c16
< dir b0.0.r8/1443476
---
> dir av.0.r8/1443643

and so on, for pages and pages.


I don't feel very good about the changes, but the repository was created with a newer version from backports, and this time we used the version from etch (thus downgrading from 1.4.6 to 1.4.2).


The whole point of this post is to help some random Google user with a downgrade of subversion. So, here you have it!

I had a particular problem at work: we have an upstream subversion repository, accessed over an ssh tunnel (using the svn protocol), which contains two branches we are interested in and various other stuff we don't care about (and don't want to mirror).

On the other hand, we also wanted to have a local copy of all changes (preserving history), local commit messages and an SVN::Web interface.

In the original idea, I also wanted to keep the revision numbers as-is (so I could just check out our local version and be done), but this wasn't possible. One solution that we examined was to use Pushmi and make a local copy, but we didn't want all the other changes.

Another idea was to use svndumpfilter to sync only the two branches we are interested in (it creates dummy commits for revisions outside our branches), but since the branches are the result of copies from parts of the tree we don't want to sync, it didn't work either.
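
For reference, this is roughly the kind of invocation that approach involves (paths are illustrative; this is the idea we abandoned, not part of the final setup):

svnadmin dump /srv/svn/project | svndumpfilter include carnet-foo carnet-bar > filtered.dump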

Did I mention that our svn repository can reach the upstream one only through carefully crafted ssh tunnels? A mess, right?

So, in the end, the solution was a hybrid:

  • make a local copy of the two upstream branches using svk (losing the original order of commits, even though we are committing into the same svk mirror copy on our side)
  • install a post-commit hook in the upstream repository which will trigger (over https) an svk sync on our side (I would probably use SMTP to trigger that, but our machine with the svn repository doesn't accept outside e-mail)
  • install a local post-commit hook to send e-mail notifications

The rest of this post is instructions on how to do this. Since I learned a thing or two doing it, I hope it might also be useful to others.

First, create an svn-pull.sh shell script which will run under a user who has ssh keys to log in to the upstream firewall (1.2.3.4 in this example) and set up a tunnel to the upstream svn server (10.1.1.1):

#!/bin/sh
ssh -L 13690:10.1.1.1:3690 1.2.3.4 sleep 2 &
pid=$!
SVKROOT=/home/user/.svk svk sync -a
kill $pid

Now set up mirrors of the branches we care about:

svk mirror svn://127.0.0.1:13690/project/carnet-foo /project/foo
svk mirror svn://127.0.0.1:13690/project/carnet-bar /project/bar

This is all nice, but we need to trigger it as the www-data user, which is done with the following in /etc/sudoers:

www-data ALL=(user) NOPASSWD:/home/user/svn-pull.sh

and add a simple cgi script which will trigger the sync operation:

#!/bin/sh
echo -e "Content-type: text/plain\n\r\n\r"
sudo -u user /home/user/svn-pull.sh

I used a ScriptAlias in apache to make it visible at https://svn-ours.example.com/upstream-svn-update. There is no need to obfuscate the URL, since it's behind SSL for extra points. An IP address limit might also be a good idea:

  <Location /upstream-svn-update>                   
        Order allow,deny
        Allow from 1.2.3.4
  </Location>

Now install a post-commit hook in the upstream repository (in the hook, $REPOS and $REV are the repository path and revision which subversion passes as arguments). We care only about files which have /carnet in their path, since the branches we are interested in have that prefix:

svn log -v -r $REV file://$REPOS | grep ' /carnet' 2>/dev/null \
    && wget -q -O /dev/null https://svn-ours.example.com/upstream-svn-update

You will notice that there is no locking or any other tweaks, since all the tools involved have those capabilities anyway; we are really just doing RPC via cgi over https.

Nice and easy, once you know how to do it! It may seem like a few bits of configuration scattered all over the place, but I hope it employs KISS - keep it simple and stupid - at its best.

Update: OK, now we have a local repository (with different revision numbers), but svn switch --relocate doesn't work because the repositories are not the same (makes sense, eh?).

The following steps are a quick explanation of how to copy the .svn directories from the new repository:

cd /srv/carnet-foo
# update repository to last upstream version
svn update
# delete old .svn directories
find . -name ".svn" -exec rm -Rf {} \;
# checkout new repository
cd /srv
svn co svn://svn-ours.example.com/carnet-foo carnet-foo.new
# copy new .svn files to old repository
cd carnet-foo.new
find . -wholename "*/.svn/*" | cpio -pvd ../carnet-foo/
# cleanup
cd /srv
rm -Rf carnet-foo.new
# following shouldn't return any differences
cd carnet-foo
svn diff