Scalable applications with Gearman

If you have talked with me in the last year or so, you have probably heard me mention queues as a new paradigm in application development. If your background is web development, you probably wondered why they are important. This post will try to explain why they are useful and important, and how you can make your app scale, even on a single box.

The problem was rather simple: I needed to build monitoring which would pull data from ~9000 devices using the telnet protocol and store it in PostgreSQL. The normal way to solve this would be to write a module which first checks whether devices are available using something like fping, and then telnets to each device and collects data. However, that would involve careful writing of the puller, taking care of child processes and so on. This seemed like a doable job, but it also seemed a bit complicated for the task at hand.

So I opted to implement the system using Gearman as the queue server, and leave all scaling to it. I decided to push all functionality into Gearman workers. For that, I opted to use Gearman::Driver, which allows me to easily change the number of workers to test different configurations. The requirement was to pull data from each machine at 20-minute intervals.
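To make the queue-and-workers idea concrete, here is a minimal sketch in Python using only the standard library. It is not Gearman and not the code from this project: an in-process queue stands in for the queue server, threads stand in for workers, and collect() is a hypothetical placeholder for the real telnet collection step.

```python
import queue
import threading


def collect(device):
    # Hypothetical stand-in for the real work (telnet to the device, read data).
    return {"device": device, "status": "ok"}


def worker(jobs, results):
    # Each worker pulls jobs off the shared queue until it sees the sentinel.
    while True:
        device = jobs.get()
        if device is None:  # sentinel: no more work for this worker
            break
        results.put(collect(device))


def run(devices, num_workers):
    jobs = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for device in devices:
        jobs.put(device)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # All workers have exited, so the results queue is no longer changing.
    return [results.get() for _ in range(results.qsize())]


if __name__ == "__main__":
    out = run(["dev%d" % i for i in range(50)], num_workers=5)
    print(len(out), "devices collected")
```

The point of the sketch is that the producer only pushes jobs and never manages the workers' scheduling; changing num_workers is the only knob, which is roughly what a queue server buys you.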

Converting the existing Perl scripts which collect data into Gearman workers was a joy. On the first run (with 25 workers) it took 15 minutes to collect all data. Just by increasing the number of workers to 100 we managed to cut this time down to just over 1 minute. And that's on a single-core virtual machine (which makes sense, since most of the time we are waiting on the network).
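The effect of adding workers to network-bound jobs can be reproduced with a small simulation. This is only a sketch under stated assumptions: fake_poll() is a hypothetical stand-in which sleeps instead of talking to a device, and a Python thread pool plays the role of the workers. It does not reproduce the numbers above; it only shows that when jobs mostly wait, more workers finish the same batch sooner, even on one core.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_poll(device, delay=0.05):
    # Hypothetical job: sleeping simulates time spent waiting on the network.
    time.sleep(delay)
    return device


def timed_run(devices, num_workers):
    # Run all jobs on a pool of num_workers threads; return (seconds, results).
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(fake_poll, devices))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    devices = list(range(40))
    few, _ = timed_run(devices, 4)
    many, _ = timed_run(devices, 20)
    print("4 workers: %.2fs, 20 workers: %.2fs" % (few, many))
```

Because the simulated jobs use almost no CPU, the run with more workers completes in fewer "rounds" of waiting. Real devices add timeouts and uneven response times, so actual speedups will not follow this idealized pattern exactly.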

For the web interface, I decided to use Mojolicious. But to make it work with Gearman, I wrote MojoX::Gearman, which allows me to invoke Gearman functions directly from Mojolicious. In fact, all functionality of the web interface is implemented as Gearman workers, even querying the database :-)