Using Lua for Our Most Critical Production Code

By Brian Maher
Jan 27, 2016

Today, installing Distelli Agent is as simple as downloading a single executable. This same executable can be used to install the agent with the appropriate supervisor process, or to upload a new release. The agent is also portable across platforms, and runs with a very small CPU and memory footprint (~10 megabytes of memory and less than 1% of CPU).

At least partial credit for this performance goes to our move from Python to Lua.

Challenges with Python

I joined Distelli in December 2014. At that time, Distelli Agent and the command line tool were implemented as a couple of tarballs that would use the system version of Python (we supported versions 2.4-2.7). I had never seriously used Python before joining Distelli, so there was a bit of a learning curve. For example, I wrote and tested my code with Python 2.7, but ended up introducing incompatibilities with older versions of Python when I combined an except block and a finally block in the same try statement; that syntax was not available until version 2.5. Around this time, we received a support ticket from someone who tried to use Python 3.x; would we need to start supporting that version soon too?

There were also times when I wished that I could use a native Python library. For example, I needed to add support for detecting the Ethernet addresses of the host computer, which can be done with the netifaces native library (or by shelling out to ifconfig and parsing the results, which would not be compatible with Windows).

The original agent implementation also broke up deployments into discrete steps, and would run some of the steps with supervisord. Unfortunately, this made the code slow and awkward.

Overall, I could see that the strategy of relying on the system’s version of Python was not good. The code was ripe for a rewrite.

Evaluating Lua

Lua has always been a fascinating language to me. I’m very much into “light and fast” in all facets of my life. I’m passionate about running, mountaineering, climbing, and motorcycles. In all these pursuits, shaving a pound of weight from your equipment (or stomach) can make a big difference. Lua strives to be the “light and fast” programming language. Mike Pall’s LuaJIT project first caught my interest when it was at the top of a computer benchmarks site. It showed just how fast Lua can be, and the resulting build artifacts are less than a megabyte on all platforms (thus showing how light the language is).

I like that the language strips everything down into the minimal set of functionality required, without forcing conformance. For example, Lua gives you metatables that allow for operator overloading and object-oriented programming, all in one simple mechanism. It also supports coroutines that can be used to implement “green” threads.
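To make those two features concrete, here is a minimal sketch in plain Lua. It is not taken from the agent's code: the names Vec, worker, and run_all are made up for illustration. The first half uses a single metatable for both operator overloading and method lookup; the second half uses coroutines as cooperative “green” threads driven by a simple round-robin loop.

```lua
-- One metatable gives us both operator overloading and method dispatch.
local Vec = {}
Vec.__index = Vec                -- missing keys on instances are looked up in Vec

function Vec.new(x, y)
  return setmetatable({ x = x, y = y }, Vec)
end

function Vec.__add(a, b)         -- called when evaluating `a + b`
  return Vec.new(a.x + b.x, a.y + b.y)
end

function Vec:length2()           -- an ordinary method: squared length
  return self.x * self.x + self.y * self.y
end

-- Coroutines as cooperative "green" threads.
-- Each worker appends to a shared log, then yields back to the scheduler.
local function worker(name, steps, log)
  return coroutine.create(function()
    for i = 1, steps do
      log[#log + 1] = name .. i
      coroutine.yield()
    end
  end)
end

-- Round-robin scheduler: resume every live coroutine until all have finished.
local function run_all(threads)
  while #threads > 0 do
    local alive = {}
    for _, co in ipairs(threads) do
      assert(coroutine.resume(co))           -- re-raise errors from the worker
      if coroutine.status(co) ~= "dead" then
        alive[#alive + 1] = co
      end
    end
    threads = alive
  end
end
```

With these definitions, Vec.new(1, 2) + Vec.new(3, 4) yields a vector with x = 4 and y = 6, and running two workers through run_all interleaves their log entries one step at a time. A real scheduler would resume a coroutine when I/O completes rather than on every pass, which is roughly the role an event loop plays.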

However, it’s easy to be light when your standard library is composed of fewer than 200 functions. How are you supposed to get any work done if you don’t have an XML parser, JSON parser, HTTP(S) library, etc.?

Getting Stuff Done

The solution is to use C libraries from Lua. C is a first-class citizen in Lua, and Lua’s interface with C is brilliant and well documented. LuaJIT takes it a step further: its Foreign Function Interface (FFI) lets you declare C functions and call them directly from Lua code, without writing a binding in C.
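As a small illustration of the FFI approach, the sketch below declares the C library’s strlen and calls it from Lua. It assumes a LuaJIT runtime, since the ffi module is not part of standard Lua; the wrapper name c_strlen and the plain-Lua fallback are additions for illustration only, so that the snippet still loads on an interpreter without FFI.

```lua
-- The `ffi` module ships with LuaJIT only; pcall keeps this loadable elsewhere.
local has_ffi, ffi = pcall(require, "ffi")

if has_ffi then
  -- Declare the C function using ordinary C syntax.
  ffi.cdef[[
    size_t strlen(const char *s);
  ]]
end

-- Hypothetical wrapper: uses the C library's strlen when FFI is available.
local function c_strlen(s)
  if has_ffi then
    -- The Lua string is passed as a const char*; the size_t result is
    -- boxed cdata, so convert it to a plain Lua number.
    return tonumber(ffi.C.strlen(s))
  end
  return #s   -- fallback for plain Lua (not a C call)
end

print(c_strlen("hello"))
```

The same pattern (a cdef declaration followed by a call through ffi.C or a library loaded with ffi.load) applies to larger C libraries, though bindings written against Lua’s regular C API work on any Lua implementation.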

The Lua standard library is built on the standard (ANSI) C library, which makes it portable; but that also means that you can’t use sockets, it doesn’t support signals or timers, and all OS calls block the thread. Programming against only the standard C library is like going backwards in time.

Enter libuv. libuv is a fully asynchronous, multi-platform library that was originally written to support Node.js but is now used by many other language runtimes. Tim Caswell (aka creationix) wrote a binding to this library called “luv”. He then went on to create a lightweight “base” executable, called “luvi”, that is composed of various C libraries with Lua interfaces. All of this can be linked statically into a binary that is less than 5 megabytes and includes libuv, zlib, openssl, pcre, and, of course, the Lua runtime with bindings to the aforementioned libraries.

You can use luvi to run Lua code, but it also comes with built-in support for bundling your code directly into the binary. This makes it super-easy to ship your Lua code. I found out about this project in November 2014, and started following its progress.

The Move from Python to Lua

In February 2015, I began work on adding “live” logs to the agent. We had two primary implementation ideas at the time. We could always stream the logs to our servers—whether or not they were actually used—and then allow our web interface to query these intermediate data stores. Alternatively, we could simply allow the agent to respond directly to these requests and cut out the cost of maintaining an intermediate data store. We decided to go with the latter solution.

The agent at that time polled about every 10 seconds for deployments, but to support live logs, we needed something that would respond faster. This was a great opportunity to introduce luvi. So, I wrote a little program that would persistently connect to our backend servers, and would allow our web interface to make direct requests to the agents. This project was successful and allowed us to gain confidence in building with luvi. However, forcing customers to run three processes persistently and concurrently (Python supervisord, Python agent, luvi) did not sit well with us; we all wanted the footprint of the agent to be minimal.

In March 2015, we acquired a customer that required us to support Windows. I thought this would be the perfect opportunity (excuse?) to port the agent code to Lua, and on May 1st, I had the first version of the rewritten agent available. We have been evolving and improving it ever since.

Over the course of the six months that followed, we ported the entire Python CLI and Agent code base to Lua. We have been able to quickly iterate to version 3.59, which now allows building software and encrypting critical data.

Conclusion

Was this the right decision? Overall, I think so. For one, installing the agent is now much more straightforward: just download a single executable.

The runtime footprint of the agent is pretty minimal, typically using ~10 megabytes of memory and less than 1% of CPU. When switching from the 2.x version of the agent to 3.x, the deployments are noticeably faster.

The agent is also easier to maintain: using a new native library simply means we update luvi. For example, we are now bundling our fork of luvi with libyaml and libsqlite3. The complexity and time of building C code are confined to the occasional update of the luvi code base or the addition of a new native library. Once the base luvi is built, creating new bundles is super fast; it simply involves compiling and zipping the Lua code into the final executable.

And finally, it helped us make Distelli Agent portable, which worked to our advantage when we needed to build the agent for SmartOS: