Results tagged “MojoFacets”

I have spent the last few weeks with my head down, adding persistence and change tracking to Mojo Facets, turning it into a much more interesting alternative to web-based data stores like DabbleDB. The idea was simple: I had all the data in memory, so I should be able to edit it, right?

Well, as it always turns out, it was about three weeks of development, but it moved Mojo Facets into the much more interesting use case of making small tweaks to your input data.
The problem is how to keep those changes. Mojo Facets is never the master provider of data, so saving some kind of audit log which can be applied back to the master data is of paramount importance. After all, if you edit that data, you might want to apply those changes again when you re-generate the source file or pull a new version from some other system.

The first idea was to add a simple audit log which records all requests in Mojo. I decided to call requests with their parameters actions and store them on disk under /tmp. All I had to add was a re-submit form in the browser and a bit of interface around it. The same form with all its parameters can turn Mojo Facets into a peer-to-peer application: I just added a checkbox which changes the destination URL of an action to another Mojo Facets installation, and I got nice replication of actions to another instance.
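To make the idea of actions concrete, here is a minimal sketch in Python (Mojo Facets itself is written in Perl, so this is not its actual code). It assumes each action is stored as one JSON line holding the request path and its parameters; the file name `actions.jsonl`, the route `/data/edit`, the host `other-host:3000` and all function names are made up for illustration.

```python
import json
import os
import tempfile
import time


def record_action(log_dir, path, params):
    """Append one request (path plus parameters) to the audit log as a JSON line."""
    os.makedirs(log_dir, exist_ok=True)
    action = {"time": time.time(), "path": path, "params": params}
    with open(os.path.join(log_dir, "actions.jsonl"), "a", encoding="utf-8") as fh:
        fh.write(json.dumps(action, sort_keys=True) + "\n")
    return action


def load_actions(log_dir):
    """Read all recorded actions back in the order they were written."""
    with open(os.path.join(log_dir, "actions.jsonl"), encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def retarget(action, base_url):
    """Build the URL an action would be re-submitted to on another instance."""
    return base_url.rstrip("/") + action["path"]


# Hypothetical example: record one edit request, then prepare it for replay
# against a second installation (the "checkbox" case from the text).
log_dir = tempfile.mkdtemp(prefix="mojo_facets_actions_")
record_action(log_dir, "/data/edit", {"path": "isi-cro2.js", "field": "autor"})
actions = load_actions(log_dir)
replay_urls = [retarget(a, "http://other-host:3000/") for a in actions]
```

Because an action is nothing more than a path and its parameters, replaying it locally and sending it to another instance differ only in the base URL, which is why a single checkbox is enough to get replication.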

But all was not well. Editing data in the browser generates an update to a specific entry in your dataset, so I decided to also record changes, which include the old and new field value and all unique keys for this dataset.

This seems like such a small statement, but it took a lot of work to get to the point where you can load some data, edit it in the browser and then apply those changes back to the original data (after a reboot) or to a different dataset with the same unique field.
Even better, it should be possible to apply the changes log to the master data. I prefer to think of it as a replication log to another system.
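A changes log of this kind can be sketched roughly as follows. This is an illustrative Python sketch, not the Perl code in Mojo Facets: the record layout (`unique`, `field`, `old`, `new`), the function names and the sample rows are all assumptions. A change is applied only when the target row still holds the recorded old value, which is one possible policy for detecting conflicts.

```python
def record_change(changes, unique_fields, row, field, new_value):
    """Edit a row in memory and log old/new value plus the row's unique keys."""
    change = {
        "unique": {name: row[name] for name in unique_fields},
        "field": field,
        "old": row.get(field),
        "new": new_value,
    }
    row[field] = new_value
    changes.append(change)
    return change


def apply_changes(rows, changes):
    """Replay a changes log on another copy of the data.

    Returns (applied, skipped); a change is skipped when the matching row
    no longer holds the recorded old value.
    """
    applied = skipped = 0
    for change in changes:
        for row in rows:
            if all(row.get(k) == v for k, v in change["unique"].items()):
                if row.get(change["field"]) == change["old"]:
                    row[change["field"]] = change["new"]
                    applied += 1
                else:
                    skipped += 1
    return applied, skipped


# Hypothetical data: edit the in-memory copy, then replay on freshly loaded data.
edited = [{"id": 1, "autor": "Smth, J."}, {"id": 2, "autor": "Doe, A."}]
log = []
record_change(log, ["id"], edited[0], "autor", "Smith, J.")

reloaded = [{"id": 1, "autor": "Smth, J."}, {"id": 2, "autor": "Doe, A."}]
result = apply_changes(reloaded, log)
```

Since rows are matched by unique keys rather than by position, the same log can be replayed on a re-generated source file or on a different dataset that shares the unique field.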

To integrate better with other systems, filters got an export (and import) option which dumps them into a simple, one-line-per-entry text file which is accessible over HTTP. It's a perfect format if you want to quickly xargs that data into another script, for example to generate more source data with something as simple as:

 cat /srv/mojo_facets/public/export/isi-cro2.js/filter.autor.119 | \
 xargs -I{} ./bin/isi-download-results.pl 'CA={}'

Speaking of more dataset sources, I also added parsing of HTML tables, which should at some point allow users to just drop multiple tables from a results page into a single directory and load them as a dataset. Your telebanking doesn't have the export you need? No problem! Just save all pages to disk and you are ready to go.
Right now, table parsing needs a bit of heuristics to be really useful. It searches the page for a table with the correct number of columns, and supports extracting column names from a header or from a repeating first row (normal <td>).
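The heuristic described above could look something like this sketch, written in Python with only the standard library (the real parser in Mojo Facets is Perl and may work differently). It assumes well-formed markup with explicit closing tags and no nested tables, takes the first row of the matching table as column names whether it uses `<th>` or `<td>`, and drops later rows that merely repeat those names. The class name, the function name and the sample HTML are made up.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every table on a page as a list of rows of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_rows(html, columns):
    """Return records from the first table whose rows have `columns` cells."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    for table in parser.tables:
        rows = [row for row in table if len(row) == columns]
        if not rows:
            continue
        header = rows[0]  # <th> header or plain <td> first row
        return [dict(zip(header, row)) for row in rows[1:] if row != header]
    return []


# Hypothetical saved page: a one-column layout table, then the data table.
page = """
<table><tr><td>menu</td></tr></table>
<table>
<tr><th>date</th><th>amount</th></tr>
<tr><td>2010-05-01</td><td>12.50</td></tr>
<tr><td>date</td><td>amount</td></tr>
<tr><td>2010-05-02</td><td>-3.00</td></tr>
</table>
"""
records = extract_rows(page, 2)
```

Matching on the number of columns is what lets the sketch skip layout tables, which is usually most of the tables on a saved page.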

All that would be unusable without profiling to make it really snappy. This is the first time I have used Google Chrome for serious development, and while I still dislike its inspector (Firebug's DOM inspector is much nicer to me), the Speed Tracer extension was very useful for the front-end part, including network throughput and HTML/JavaScript overhead. On the server side, I used Devel::NYTProf, and I will talk about it at Slobodni Festival 4 in Čakovec, so see you there...

My server-side faceted browser just got a bit better. In fact, it became 10 times better. But let's try to explain this story step by step...

This week I will try to introduce faceted joins. The primary motivation is the great Plants For A Future database, which consists of more than one text file.

The use case is something like the following:
I would like to know all plants which can have medicinal use, are edible and have a perennial habitat (so I don't have to re-plant them every year).

And you can watch the video to see how easily this can be done:

But this still doesn't make MojoFacets 10 times better than before. This is quite a small dataset (still about 10 times bigger than Exhibit could handle), but I had a new problem: a 100Mb source file with a bit less than 30000 items. To make it scale further, I implemented pre-calculated filters and sorts. They serve the same purpose as indexes do in relational databases, but they are calculated on demand and stored in memory.
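The idea of indexes that are calculated on demand and then kept in memory can be sketched like this. It is an illustrative Python sketch rather than the Perl implementation in MojoFacets: the class name, its methods and the sample items are assumptions, with only the field names `autor`, `jezik` and `godina_izdavanja` borrowed from the dataset shown in the video.

```python
from collections import defaultdict


class FacetCache:
    """Build per-field filter and sort indexes lazily and keep them in memory."""

    def __init__(self, items):
        self.items = items
        self._filters = {}  # field -> {value: [item positions]}
        self._sorts = {}    # field -> [item positions in sorted order]
        self.builds = 0     # how many indexes were actually calculated

    def filter_index(self, field):
        if field not in self._filters:  # first use of this facet: full scan
            index = defaultdict(list)
            for pos, item in enumerate(self.items):
                index[item.get(field)].append(pos)
            self._filters[field] = dict(index)
            self.builds += 1
        return self._filters[field]

    def sort_order(self, field):
        if field not in self._sorts:  # first sort by this field: sort once
            self._sorts[field] = sorted(
                range(len(self.items)),
                key=lambda pos: str(self.items[pos].get(field, "")),
            )
            self.builds += 1
        return self._sorts[field]

    def select(self, filters, sort_by):
        """Intersect the cached filters and return items in cached sort order."""
        wanted = None
        for field, values in filters.items():
            index = self.filter_index(field)
            hits = set()
            for value in values:
                hits.update(index.get(value, []))
            wanted = hits if wanted is None else wanted & hits
        order = self.sort_order(sort_by)
        return [self.items[pos] for pos in order if wanted is None or pos in wanted]


# Hypothetical items using field names from the dataset in the video.
cache = FacetCache([
    {"autor": "A", "jezik": "cro", "godina_izdavanja": 2005},
    {"autor": "B", "jezik": "slo", "godina_izdavanja": 1999},
    {"autor": "A", "jezik": "cro", "godina_izdavanja": 2001},
])
first = cache.select({"autor": ["A"], "jezik": ["cro"]}, "godina_izdavanja")
builds_after_first = cache.builds
second = cache.select({"autor": ["A"], "jezik": ["cro"]}, "godina_izdavanja")
```

The first request pays for a full scan per facet and one sort, while every later request with the same facets only intersects sets of positions. The trade-off is memory: each index stays resident for as long as the dataset is loaded.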

Let's see how it works in action with ~30000 items:

In this video, we saw:

  • starting memory usage of ~13Mb
  • 100Mb dataset with 29869 items
  • filter by autor with 45644 taking ~10s
  • use regex filter ic,
  • godina_izdavanja is numeric facet
  • jezik filter using cro slo ser regexps and toggle it
  • show popup title on filters
  • turn off filters to show under 4s load time
  • at the end, we consumed ~260Mb of memory

Ok, 4s might not seem blazingly fast, but keep in mind that all this is implemented in pure Perl (so deployment is lightweight) using the Mojolicious web framework. But it has its overhead. Other than the 260Mb of RAM for the browser, it will also take 600Mb of RAM on the server side. But if you can live with a 6x file-size factor on the server side, this might be very interesting as a faceted browsing tool for the web.

I am a huge fan of Exhibit faceted browsing of data. However, Exhibit is implemented in JavaScript within your browser, and that makes it unusable for larger amounts of data (more than 300 items or so). In my case, 3800 elements are unusably slow even in the latest Chrome or Firefox.

Something had to be done. If JavaScript inside the browser isn't up to the task, you might wonder what would happen if you moved processing back to the server side and used the browser just for the task it is good at: displaying generated HTML pages.

So, let me introduce MojoFacets - a server-side facet browser based on Mojolicious with a bit of jQuery UI.

Although it seems strange to promote server-side solutions in 2010, this approach still makes sense. For just 40Mb of memory usage on the server side (including Mojo and the dataset) you can get fast and usable facet browsing.