Edit MARC records in your favourite editor

MARC is a binary structured file format used in libraries, and occasionally you need a way to quickly edit records in it.

There are quite a few tools available for editing MARC records, but none of them is really easy to use. A few days ago, I overheard a conversation in which somebody said that you can't just edit records in your editor, because you have to fix the leader field afterwards. That got me thinking...

The MARC format is really quite interesting. It was designed to be easily used on tapes, so it has a bunch of pointers to exact positions within the record, and those change whenever the size of a field value changes.
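To make those pointers concrete, here is a minimal Python sketch that decodes them from a single record. It assumes the usual ISO 2709 layout (a 24-byte leader, then 12-byte directory entries made of a 3-character tag, a 4-digit field length and a 5-digit start position). The function name `read_directory` and the tiny hand-built record are made up for illustration; they are not taken from any MARC library.

```python
FIELD_END = b"\x1e"   # terminates the directory and each field
RECORD_END = b"\x1d"  # terminates the whole record


def read_directory(record: bytes):
    """Decode the leader lengths and the directory of one ISO 2709 record."""
    record_length = int(record[0:5])     # leader bytes 0-4
    base_address = int(record[12:17])    # leader bytes 12-16
    # The directory sits between the leader and its own terminator byte.
    directory = record[24:base_address - 1]
    entries = []
    for i in range(0, len(directory), 12):
        tag = directory[i:i + 3].decode("ascii")
        length = int(directory[i + 3:i + 7])   # includes the field terminator
        start = int(directory[i + 7:i + 12])   # relative to base_address
        entries.append((tag, length, start))
    return record_length, base_address, entries


# Hand-built sample (hypothetical data): control field 001 and
# field 200 with indicators "1 " and a single subfield "a".
sample = (
    b"00061nam0 2200049   45  "        # leader, 24 bytes
    + b"001000400000" + b"200000700004" + FIELD_END
    + b"123" + FIELD_END
    + b"1 \x1faHi" + FIELD_END
    + RECORD_END
)

record_length, base_address, entries = read_directory(sample)
for tag, length, start in entries:
    begin = base_address + start
    # Drop the trailing field terminator when showing the value.
    print(tag, sample[begin:begin + length - 1])
```

Every number in the leader and directory depends on the byte length of the field values, which is why changing a single value by hand invalidates all of them.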

Editing binary MARC directly in an editor isn't really possible, because you would have to manually fix the leader (which holds the size of the whole record, allowing easy skipping on tape) and all the pointers to the start of each field. I did some of that while writing MARC::Fast, but I wouldn't recommend it to anyone.
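The bookkeeping that has to be redone after every edit looks roughly like the following Python sketch. It assumes the standard ISO 2709 layout (24-byte leader, 12-byte directory entries) and is not how MARC::Fast or MARC::Record do it internally. `build_record` is a hypothetical helper, and it makes no attempt to handle fields longer than 9999 bytes or records longer than 99999 bytes.

```python
FIELD_END = b"\x1e"   # terminates the directory and each field
RECORD_END = b"\x1d"  # terminates the whole record


def build_record(leader: str, fields):
    """Serialize fields and recompute every length and offset.

    leader -- 24-character template; its length and base address are replaced
    fields -- list of (tag, data) pairs, data as bytes without a terminator
    """
    directory = b""
    body = b""
    for tag, data in fields:
        field = data + FIELD_END
        # Directory entry: tag, field length, offset from the base address.
        directory += f"{tag}{len(field):04d}{len(body):05d}".encode("ascii")
        body += field
    directory += FIELD_END
    base_address = 24 + len(directory)
    record_length = base_address + len(body) + len(RECORD_END)
    new_leader = (
        f"{record_length:05d}{leader[5:12]}{base_address:05d}{leader[17:24]}"
    )
    return new_leader.encode("ascii") + directory + body + RECORD_END


template = "00000nam0 2200000   45  "  # hypothetical leader template
original = build_record(template, [("001", b"123"), ("200", b"1 \x1faHi")])
edited = build_record(template, [("001", b"123"), ("200", b"1 \x1faHello")])

print(original[:24])
print(edited[:24])
```

Changing one subfield from "Hi" to "Hello" alters the record length in the leader and the length stored in that field's directory entry; had the edited field not been the last one, the start positions of all following fields would have shifted as well.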

MARC::Record already has a utility called marcdump which will convert MARC into a human-readable format:

dpavlin@t61p:/srv/webpac2$ marcdump t/data/marc.iso > dump.txt
t/data/marc.iso
dpavlin@t61p:/srv/webpac2$ head dump.txt 
LDR 01002nam0 2200289   45  
001     800328901
100    _a19800328d1999       y0hrvy0191    ba
101 1  _ahrv
102    _aHR
105    _aa   m   000yy
200 1  _aMjerne nesigurnosti rezultata mjerenja odstupanja od oblika i polo`aja
       _emagistarski rad
       _fZdravko Bar{i}
       _gvoditelj Vedran Mudronja

I can easily edit this in my favorite editor (which is vim, if you really wanted to know) to replace CROASCII characters with their UTF-8 equivalents, or something like that.

So, I wrote dump2marc, which parses marcdump output and converts it back to the MARC file format like this:

dpavlin@t61p:/srv/webpac2$ ./bin/dump2marc.pl dump.txt > dump.marc

It will also dump the parsed records to STDERR, which you can redirect to a file using 2>dump.txt.parsed.
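For illustration only, here is a rough Python sketch of the parsing half of such a converter. It is not the actual dump2marc code. The column layout is inferred from the sample dump shown earlier (tag in columns 1-3, indicators in columns 5-6, subfields introduced by `_` plus a one-character code, continuation lines indented by seven spaces), and `parse_dump` is a hypothetical name. Lines that match none of these shapes are silently skipped.

```python
import re

LEADER = re.compile(r"^LDR (.*)$")
DATA = re.compile(r"^(\w{3}) (.)(.) _(.)(.*)$")   # tag, ind1, ind2, code, value
CONTINUATION = re.compile(r"^ {7}_(.)(.*)$")      # further subfield of last field
CONTROL = re.compile(r"^(\w{3}) {5}(.*)$")        # control field: tag and value


def parse_dump(text: str):
    """Parse marcdump-style text into a list of records.

    Each record is a dict with a 24-character "leader" and a list of
    "fields": (tag, None, value) for control fields, or
    (tag, indicators, [(code, value), ...]) for data fields.
    """
    records = []
    for line in text.splitlines():
        if m := LEADER.match(line):
            # Editors may strip trailing spaces, so pad back to 24 characters.
            records.append({"leader": m.group(1).ljust(24)[:24], "fields": []})
        elif not records:
            continue  # ignore anything before the first leader
        elif m := DATA.match(line):
            tag, ind1, ind2, code, value = m.groups()
            records[-1]["fields"].append((tag, ind1 + ind2, [(code, value)]))
        elif m := CONTINUATION.match(line):
            fields = records[-1]["fields"]
            if fields and isinstance(fields[-1][2], list):
                fields[-1][2].append(m.groups())
        elif m := CONTROL.match(line):
            records[-1]["fields"].append((m.group(1), None, m.group(2)))
    return records


sample = "\n".join([
    "LDR 01002nam0 2200289   45  ",
    "001     800328901",
    "101 1  _ahrv",
    "200 1  _aMjerne nesigurnosti",
    "       _emagistarski rad",
])

for record in parse_dump(sample):
    print(record["leader"])
    for field in record["fields"]:
        print(field)
```

The lengths in the parsed leader are stale as soon as any value has been edited, so a converter has to recompute them when it writes the binary record instead of copying them from the dump.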

If you dump the generated MARC with marcdump, you can use diff to view the changes:

dpavlin@t61p:/srv/webpac2$ ./bin/dump2marc.pl dump.txt > dump.marc \
  && marcdump dump.marc > dump.marc.txt \
  && diff -u dump.txt dump.marc.txt

If we wanted to track changes, adding the dump file (and the MARC file, if needed) to git and using it to produce diffs would be one possible solution. I know that by this point you are already rolling your eyes, thinking: this is command-line editing for MARC.

The same workflow could be turned into a web application: a simple upload button to upload the MARC file, then a textarea with the dump, which is converted back to MARC on submit and offered for download. It would be sane to automatically add every revision to git on the server side and provide some kind of rollback to an earlier version through git too, maybe via gitweb?