A while ago I started writing MojoFacets, a fast web-based faceted browser which keeps data in memory. Since I last wrote a blog post about it I have added various features, turning it into a powerful spreadsheet-like application within the browser in which you can mangle your data using perl code.
Let me start with a list of new features:
- run a perl snippet over a filtered subset of data, modifying columns (using $update) or creating an aggregated result (using $out)
- format the on-screen filter HTML with a hidden ;, so that copy/paste into a spreadsheet produces correct values and counts
- export a dataset as tab-separated values for easy migration into other applications
- use the tab-separated export and an optional time format string with gnuplot to produce PNG graphs from data (this works well for huge datasets)
- export filtered values from facets in a simple one-value-per-line format
- run perl snippets over a filter's facet values to easily select ($checked) or calculate something with $value or $count
- import CSV files (with an optional encoding specified in the filename)
- import from CouchDB databases or views
- import SQL query results from an RDBMS using any DBI perl driver (tested with PostgreSQL, MySQL and SQLite)
- switch between loaded data-sets easily (filters are already shared between them, allowing a poor man's join)
- implement lookup into a different data-set with descriptive statistics on values
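To illustrate the first point, a snippet run over filtered rows sees each row and writes column changes through $update. MojoFacets' real evaluation harness isn't shown in this post, so the following is a minimal standalone sketch of the idea (the apply_snippet helper and the sample data are my own illustration, not MojoFacets code):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for MojoFacets' snippet evaluation:
# run a piece of code over every (filtered) row, merging the
# column changes collected in $update back into the row.
sub apply_snippet {
    my ( $rows, $snippet ) = @_;
    foreach my $row (@$rows) {
        my $update = {};
        $snippet->( $row, $update );            # snippet sees $row and $update
        $row->{$_} = $update->{$_} for keys %$update;
    }
}

my $rows = [
    { title => 'foo', price => '10,50' },
    { title => 'bar', price => '3,99'  },
];

# example snippet: normalize a decimal comma into a numeric column
apply_snippet( $rows, sub {
    my ( $row, $update ) = @_;
    ( my $n = $row->{price} ) =~ s/,/./;
    $update->{price_num} = $n + 0;
} );

print "$_->{title} $_->{price_num}\n" for @$rows;
```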
Adding perl code evaluation over datasets was a logical extension since I already had a web interface written in perl which had all data in memory. To make it fast, I had to implement indexes (and their invalidation). But small things, like automatic generation of meaningful names for code snippets in the form dependent_col1,dep_col2.result_col1,res_col2, turned a read-only web interface into a powerful tool for applying reusable code snippets to data.
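The indexes mentioned above can be as simple as a hash from column value to row positions, so a match becomes a hash lookup instead of a full scan; a sketch of that idea (my own illustration, not MojoFacets' actual implementation):

```perl
use strict;
use warnings;

# Build an index for one column: value => [ row positions ].
# Such an index has to be invalidated (rebuilt) whenever a
# snippet modifies the indexed column.
sub build_index {
    my ( $rows, $col ) = @_;
    my %index;
    push @{ $index{ $rows->[$_]{$col} } }, $_ for 0 .. $#$rows;
    return \%index;
}

my $rows = [
    { issn => '1234-5678', title => 'a' },
    { issn => '8765-4321', title => 'b' },
    { issn => '1234-5678', title => 'c' },
];

my $idx = build_index( $rows, 'issn' );
my @positions = @{ $idx->{'1234-5678'} };   # positions of matching rows
print "@positions\n";
```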
The latest feature is lookup into other datasets, with the ability to create multiple columns from lookup values. MojoFacets is now sufficiently advanced to replace a relational database for quick join-like problems, this time by writing a little snippet of perl looking like this:
    lookup($row->{issn}, 'nabava' => 'issn',
        sub {
            my $stat = shift;
            push @{ $update->{listprice} }, $on->{listprice};
            $stat->add_data( $on->{listprice} );
        },
        sub {
            my $stat = shift;
            $update->{price_min} = $stat->min;
            $update->{price_max} = $stat->max;
        }
    );

This will look up $row->{issn} in the nabava dataset using its issn column. The first sub is executed for each matching row in the lookup data (available through the $on hash) and the second one is executed once to create aggregates using $stat, which is a Statistics::Descriptive object. The second sub is optional if you don't need aggregates.
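To give a rough idea of what such a lookup helper does, here is a self-contained sketch using only core modules (List::Util stands in for Statistics::Descriptive, and lookup_sketch plus the inline dataset are my own simplification, not MojoFacets' actual code):

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Join-like lookup sketch: find rows in another dataset whose $col
# matches $value, call $each for every match, then call $done once
# with the collected values so aggregates can be computed.
sub lookup_sketch {
    my ( $value, $other_rows, $col, $each, $done ) = @_;
    my @collected;
    foreach my $on (@$other_rows) {
        next unless defined $on->{$col} && $on->{$col} eq $value;
        push @collected, $each->($on);
    }
    $done->( \@collected ) if $done;
}

my $nabava = [    # stand-in for the 'nabava' dataset
    { issn => '1234-5678', listprice => 10 },
    { issn => '1234-5678', listprice => 30 },
];

my $update = {};
lookup_sketch( '1234-5678', $nabava, 'issn',
    sub { my $on = shift; $on->{listprice} },        # per-match callback
    sub {                                            # once, for aggregates
        my $values = shift;
        $update->{price_min} = min(@$values);
        $update->{price_max} = max(@$values);
    },
);
print "$update->{price_min} $update->{price_max}\n";
```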
If you have found Google Refine and wanted something similar, but in perl (with perl as the query/mangling language), MojoFacets might be a good choice.