Last year, I had the good fortune to get acquainted with the great work that Open Library does. It's part of the Internet Archive, which is itself a library. So, libraries are not (yet) dead, it seems. Brewster Kahle's Long Now talk explains it much better than I can, so take 90 minutes to listen to it.
It turns out that we have pictures in multiple formats (so sorting them required removing the common prefix and using only the number to get the correct order), and most of them are scanned images in pdf documents. Here are all the types of documents which can be automatically collected into a book for on-line browsing:
- images of scanned pages
- multiple pdf files with a single image per page
- single pdf file with one image for each page
- single pdf file with more than one (usually 4) horizontal bitmap strips for each page
- normal pdf documents which contain text and need rendering to bitmap
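The page ordering mentioned above (drop the common prefix, then sort by the number alone) can be sketched roughly like this. This is only an illustration in Python, not the code from my server; the function name `sort_pages` and the file names are made up for the example, and it assumes every file name carries its page number right after the shared prefix.

```python
import os
import re


def sort_pages(filenames):
    """Sort scanned page files by page number instead of alphabetically."""
    # Longest prefix shared by all names, e.g. "page" or "scan1".
    prefix = os.path.commonprefix(filenames)
    # Digits at the end of the prefix belong to the page number,
    # so give them back before extracting it.
    prefix = prefix.rstrip("0123456789")

    def page_number(name):
        # First run of digits after the prefix; names without one sort first.
        match = re.search(r"\d+", name[len(prefix):])
        return int(match.group()) if match else 0

    return sorted(filenames, key=page_number)


if __name__ == "__main__":
    # Hypothetical file names, just to show the ordering.
    print(sort_pages(["page10.jpg", "page2.jpg", "page1.jpg"]))
```

A plain alphabetical sort would put `page10.jpg` before `page2.jpg`, which is why the number has to be compared as a number.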
The source code of my Plack server for the Internet Archive book reader is on GitHub, so if you want to take a look, hop over there...