On Porting Beanstalkd to Scala

Beanstalkd is a notable and mature job queue. It’s written in C, rather sparsely commented, and moderately complicated. I wanted to add some functionality and discovered reading the code pretty tricky. The soundest way to learn what some system is doing is to rewrite it. At least for me.

So, I did.

In Scala.

I cut my teeth in porting software, by porting games from iOS to Android and Windows Mobile. My method is as follows:

  1. Write the software as close to 1:1 as can be. Retaining the original data structures as nearly as possible. Write tests confirming it works. If there’s an API, use actual clients to verify it’s operation is the same. (What I’m doing right now)
  2. Refactor the software to take advantage of native language libraries and idiomatic expressions.

I discovered porting the core of the program rather straight-forward (the queues/tubes), and as I was porting the protocol parsing, I mused about implementation details and algorithms.

In C, we don’t possess higher level structures like Scala, Python, or PHP. Instead, things are left to the writer to implement. The author of Beanstalkd produced many practical implementations that may be hard to make idiomatic.

For example, they used a (heap) priority queue for storing jobs and a doubly linked list for walking the queue. They have a weird circular buffer that behaves like a queue (expected) but implemented like a hash table so that they can remove and insert things as needed. In higher level languages, we don’t usually write these things by hand. We still use them though, through abstractions like Lists and Arrays. We then store the items in them and access them that way.

It’s this paradigm difference that strikes me as something worth writing a post. In a lot of PHP code, you’ll see arrays passed around. In C, you might only pass the head of a (double) linked list around. It is a not-so-subtle difference. Passing an array in PHP is cheap — it passes a pointer to the array with a flag to copy it if you make any changes, AKA copy-on-write. Passing a double linked list in PHP is just as cheap, but it might be hard to understand.

I’m pleased with the abstractions that higher level languages provide, but this exercise has prompted me to question if we might be over-reliant on them. There exist times where a linked list, a custom buffer, red-black tree, or some other low-level data structure might be more fitting.

Numerous people tend to contend that it’s: “reinventing the wheel,” “too low level,” or “too risky,” but I disagree. To use the argument from “Goto Considered Harmful Considered Harmful”:

It is like butchers banning knives because workers sometimes cut themselves. — Frank Rubin

We shouldn’t be hesitant to implement low-level structures in our high-level code if it makes sense and especially if it makes the work more readable. When it doesn’t accomplish those things, we shouldn’t merge it.

It’s very rarely that a low-level structure can beat the performance implemented at a lower level in the language. So, if you do end up using low-level structures, readability should be the reason.

Want an inside scoop?

Check out PHP Shenanigans for only €5/mo or €30/yr