I have been following Tokyo Cabinet for a while, and I was especially keen to try the full-text indexes which were added recently. I'm actually so obsessed with it that I set a Google alert on the words "Tokyo Cabinet". Apart from the occasional political event in Tokyo's political cabinet, it has been useful for finding interesting information: it got me a link to a blog post by Mikio Hirabayashi on how to create a simple intranet search, which includes easy-to-follow instructions for building a local intranet search, complete with a web crawler (in Ruby).
Unfortunately, Google Translate isn't really kind to it and produces something which is not really usable. But I managed to condense it into the following script:
#!/bin/sh -x

url=https://blog.rot13.org/

# crawl the site into a TSV file, unless we already have one
test -f intra.tsv || ruby wgettsv -allow "$url.*html" -deny cgi -max 10000 "$url" > intra.tsv

# show database information
tctmgr inform tctsearch.tct
# import crawled pages into the table database
tctmgr importtsv tctsearch.tct intra.tsv
# create q-gram full-text indexes on the title and body columns
tctmgr setindex -it qgram tctsearch.tct title
tctmgr setindex -it qgram tctsearch.tct body

...which is really awesome if you ask me. All the good things of Tokyo Cabinet with a little bit of qgram index on top.
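If you are wondering what a q-gram index actually stores, here is a minimal sketch in plain shell and awk. It is not how Tokyo Cabinet implements it: the function names `qgrams` and `index_doc` are made up for this illustration, and the gram size of 2 is my own assumption, not a value taken from Tokyo Cabinet. The idea is just that text is cut into overlapping pieces of q characters, and each piece points back to the documents which contain it.

```shell
#!/bin/sh
# Illustration only: hypothetical helpers, not part of Tokyo Cabinet.

# Print every overlapping q-gram of a string, one per line.
# usage: qgrams STRING [Q]   (Q defaults to 2, an assumption for this sketch)
qgrams() {
    printf '%s\n' "$1" | awk -v q="${2:-2}" '{
        for (i = 1; i + q - 1 <= length($0); i++)
            print substr($0, i, q)
    }'
}

# Emit "gram<TAB>doc_id" pairs for one document, each gram listed once.
# Sorting such pairs from many documents by gram gives a toy inverted index.
# usage: index_doc DOC_ID TEXT [Q]
index_doc() {
    qgrams "$2" "${3:-2}" | sort -u | awk -v id="$1" '{ print $0 "\t" id }'
}

qgrams "tokyo" 2
# prints:
# to
# ok
# ky
# yo
```

A substring search can then look up the q-grams of the query and only examine documents which contain all of them, which is why this kind of index works for any language without a word splitter.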
I will try to summarize the blog post here to give the English-speaking web public an opportunity to find out more even if they can't read Japanese.
The concept is really simple:
...and very nicely split into components. If you still haven't given Tokyo Cabinet a try, you should! If you want, you can take a look at my Tokyo Cabinet scripts as a starting point. I really need to make proper Debian packages for recent versions, so watch this space...