I had an interesting problem on my hands today: a directory of Informix dumps in UNL format, from which I had to extract data for migration into a new non-relational system (into MARC format, not NoSQL, btw). The idea was simple: import the dump back into a relational database, write SQL queries which produce the data, and use that. However, the SQL standard doesn't really allow us to relax and expect everything to work. In fact...
Step 1: import into PostgreSQL
My first idea was to use my favorite database, PostgreSQL, and import the data into it. The first problem was the schema file, which used DATETIME HOUR TO MINUTE, which I decided to convert into TEXT. There was another column with only a date, so I would have to munge this using SQL anyway. But then I hit several roadblocks:
ERROR: insert or update on table "xxx" violates foreign key constraint "xxx_fkey"
ERROR: invalid input syntax for integer: ""
ERROR: invalid input syntax for type date: ""
ERROR: value too long for type character(xx)
ERROR: date/time field value out of range: "13.02.1997"

These are all somewhat worrying for a system which maintains your data, but I couldn't really influence the quality of the data in the dump files from Informix, so I decided to try something which is more relaxed about errors like this...
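One way past PostgreSQL's strictness would have been a cleanup pass over each dump row before loading. A minimal Python sketch, assuming loading via COPY (the `clean_unl_line` helper is hypothetical, not from the original Perl script): empty fields become COPY's `\N` NULL marker and `dd.mm.yyyy` dates are rewritten to ISO format.

```python
import re

def clean_unl_line(line):
    """Hypothetical cleanup of one pipe-delimited UNL row for COPY:
    empty fields become \\N (NULL), dd.mm.yyyy dates become ISO,
    and fields are re-joined with tabs for COPY's default format."""
    line = line.rstrip("\n")
    if line.endswith("|"):          # UNL rows end with a trailing pipe
        line = line[:-1]
    out = []
    for f in line.split("|"):
        if f == "":
            out.append("\\N")                       # COPY's NULL marker
        elif re.fullmatch(r"\d{2}\.\d{2}\.\d{4}", f):
            d, m, y = f.split(".")
            out.append(f"{y}-{m}-{d}")              # 13.02.1997 -> 1997-02-13
        else:
            out.append(f)
    return "\t".join(out)
```

This would not fix the foreign key violations, though, which need the tables loaded in dependency order or the constraints dropped.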
Step 2: relax import using MySQL
Well, most of the invalid input syntax errors would be ignored by MySQL; however:
ERROR 1074 (42000) at line 55: Column length too big for column 'xxx' (max = 255); use BLOB or TEXT instead

was a show-stopper. I really didn't want to hand-tune the schema just to create throw-away queries to export data.
Step 3: SQLite - it takes anything!
In the process, I learned that I can't really blindly import the data, and that the format uses a backslash at the end of a line to continue multi-line values, so I decided to write a small Perl script which imports Informix UNL dumps directly into SQLite.
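The original script is in Perl; a minimal Python sketch of the same line-joining logic (the `read_unl_rows` name and latin-1 encoding are my assumptions, and escaped pipes inside values are not handled) might look like:

```python
def read_unl_rows(path):
    """Yield rows from an Informix UNL dump as lists of field strings.
    A backslash before the newline means the value continues on the
    next physical line, so physical lines are joined until a complete
    logical row (ending in a pipe) has been read."""
    buf = ""
    with open(path, encoding="latin-1") as fh:    # encoding is an assumption
        for line in fh:
            buf += line.rstrip("\n")
            if buf.endswith("\\"):
                buf = buf[:-1] + "\n"             # keep the embedded newline
                continue
            if buf.endswith("|"):                 # rows end with a trailing pipe
                buf = buf[:-1]
            yield buf.split("|")
            buf = ""
```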
I'm generating INSERT INTO table VALUES (...) SQL directly, so you could easily modify this to run on some other database or just produce SQL dumps. For import speed, I'm creating the temporary database in /dev/shm. This helps sqlite3 be CPU-bound as opposed to disk-bound during the import, and the whole database is just 50MB (the UNL dumps are 44MB, so that's very reasonable).
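The load step can be sketched in Python like this (the `load_table` helper and the `c0, c1, ...` column names are mine; it also uses sqlite3 placeholders instead of building INSERT strings by hand, which the original script does):

```python
import sqlite3

def load_table(db_path, table, rows):
    """Load rows (lists of values) into an ad-hoc SQLite table.
    Point db_path at /dev/shm (tmpfs) to keep the import CPU-bound
    rather than disk-bound."""
    con = sqlite3.connect(db_path)                # e.g. "/dev/shm/dump.db"
    ncols = len(rows[0]) if rows else 0
    cols = ",".join(f"c{i}" for i in range(ncols))   # invented column names
    con.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
    with con:                                     # one transaction for speed
        con.executemany(
            f"INSERT INTO {table} VALUES ({','.join('?' * ncols)})", rows)
    con.close()
```

Batching all INSERTs into a single transaction matters as much as the tmpfs trick: without it, SQLite syncs after every statement.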