When to Roll Your Own

Libraries are amazing. No, not the place where you go and get books for free (though, those are amazing too!), but software libraries. Libraries let you “skip” a bunch of work and go straight to the meat and potatoes of your software. If you are writing scheduling software, you can get an open-source library to render the calendar, then plug in your scheduling magic and BAM, you are done. Right?

Not quite.

Every library you add is potential technical debt. Sure, you get to “skip” that implementation, but sooner or later you’ll run into the limitations of at least one library. When that happens, you’ll be faced with a choice: do I contribute a new feature to the library, maintain an internal fork (licenses allowing), or roll my own library?

A lot of teams get hung up on this question. It doesn’t seem like there is a simple solution … but I think there is. Let me give a practical example:

I’ve been running my home-grown Swytch Framework in production for a while now, on several small projects. One of the cool things about the framework is that you write really simple components (just like React), and it handles all the escaping and safety, magically, right out of the box.

Most other template engines (Twig, for example) also handle escaping … halfway. In other words, you have to tell Twig which context you are in: escape as an attribute if you are outputting an attribute, as HTML if you are outputting HTML, or as JS if you are outputting JS (the default is HTML mode, at least in Twig).
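To make “telling the engine the context” concrete, here is a minimal sketch in plain PHP (not Twig, whose own escaping strategies differ in detail). The variable names are made up for illustration; the functions are standard PHP. The same value needs a different encoding depending on where it lands:

```php
<?php
declare(strict_types=1);

// Illustrative value only; imagine it came from user input.
$name = 'Tom & "Jerry" <3';

// HTML text context: &, < and > must be encoded; quotes are harmless here.
$asText = htmlspecialchars($name, ENT_NOQUOTES | ENT_SUBSTITUTE, 'UTF-8');

// Attribute context: quotes must be encoded too, or the value can break out.
$asAttribute = htmlspecialchars($name, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// JavaScript context: emit a JSON string literal and hex-escape the
// characters that could otherwise terminate a <script> block.
$asJs = json_encode(
    $name,
    JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
);

echo "<p>{$asText}</p>", PHP_EOL;
echo "<input value=\"{$asAttribute}\">", PHP_EOL;
echo "<script>const name = {$asJs};</script>", PHP_EOL;
```

Pick the wrong one (say, the text encoding inside an attribute) and the output is no longer safe, which is exactly the kind of mistake a manual, per-context API invites.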

The Swytch Framework can do better because it parses the HTML as each component returns it, inserts any children (if defined), etc. Since it is parsing the HTML, it knows which escaping context to use. For example, in “plain-ole-PHP”, this is insecure:

<Login redirectTo="{<?= $_GET['rd'] ?? '/dashboard' ?>}"></Login>

But in the Swytch Framework, the redirectTo attribute is escaped before rendering the Login component.
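Here is a rough sketch of the idea that parsing tells you the escaping context. This is not the Swytch Framework's implementation: the `renderWithContext` function and the `{name}` placeholder syntax are hypothetical, and the scanner is deliberately naive (it treats any placeholder inside a tag as an attribute value and does not make unquoted attributes safe).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: substitute {name} placeholders, escaping each value
 * according to where the placeholder sits in the markup.
 */
function renderWithContext(string $template, array $values): string
{
    $out = '';
    $inTag = false; // true between '<' and the matching '>'
    $quote = null;  // quote character of the attribute value we are inside, if any
    $length = strlen($template);

    for ($i = 0; $i < $length; $i++) {
        $char = $template[$i];

        // A placeholder looks like {name}; \G anchors the match at offset $i.
        if ($char === '{' && preg_match('/\G\{(\w+)\}/', $template, $m, 0, $i)) {
            $value = (string) ($values[$m[1]] ?? '');
            $flags = $inTag
                ? ENT_QUOTES | ENT_SUBSTITUTE     // attribute context: encode quotes
                : ENT_NOQUOTES | ENT_SUBSTITUTE;  // text context: only &, < and >
            $out .= htmlspecialchars($value, $flags, 'UTF-8');
            $i += strlen($m[0]) - 1; // skip the rest of the placeholder
            continue;
        }

        // Track where we are in the markup; everything else is copied through.
        if ($quote !== null) {
            if ($char === $quote) {
                $quote = null;
            }
        } elseif ($inTag) {
            if ($char === '"' || $char === "'") {
                $quote = $char;
            } elseif ($char === '>') {
                $inTag = false;
            }
        } elseif ($char === '<') {
            $inTag = true;
        }
        $out .= $char;
    }

    return $out;
}

$html = renderWithContext(
    '<a href="{url}">{label}</a>',
    ['url' => '"><script>alert(1)</script>', 'label' => 'Tom & "Jerry"']
);
echo $html, PHP_EOL;
// <a href="&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;">Tom &amp; "Jerry"</a>
```

The caller never says which context a value is in; the scanner's state decides, so the attribute value cannot break out of its quotes.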

Now, this is fantastic … but there was a problem with parsing the HTML: pretty much every off-the-shelf HTML5 parser written in PHP is a validating parser. That means this:

<!-- note the single quote instead of double quotes -->
<a href='/home'>Hello world</a>

Would get transformed into this:

<!-- note the single quote instead of double quotes -->
<a href="/home">Hello world</a>

That is what a proper HTML5 parser does under the hood (according to the spec): it re-serializes the markup in a normalized form. However, it also means the parser would “correct” other things into “proper” HTML which would then break in the browser (funny enough) because … well, because PHP isn’t a browser. We could get into the nitty-gritty, but these libraries are built for validating HTML that you paste into a form, or for scraping sites … not for rendering HTML for a browser.
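You can see the same kind of normalization with PHP's built-in `DOMDocument`. To be clear, that is a stand-in I am assuming for illustration: it is libxml2-based, not an HTML5 parser, and not necessarily one of the libraries I tried. It does require the DOM extension, and the round trip shows the point well enough:

```php
<?php
declare(strict_types=1);

$source = "<a href='/home'>Hello world</a>";

$doc = new DOMDocument();
// The flags stop libxml from wrapping the fragment in <html><body> and a doctype.
$doc->loadHTML($source, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Serialize the parsed tree back to a string.
$roundTripped = trim($doc->saveHTML());

echo $source, PHP_EOL;
echo $roundTripped, PHP_EOL;
var_dump($source === $roundTripped);
```

On a typical build the second line comes back with double quotes around the attribute, so the comparison is false: what goes in is not what comes out, even for markup that was perfectly fine to begin with.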

And here I was, faced with fixing a library where patches are unlikely to be accepted upstream, maintaining my own fork, or building a parser from scratch.

The Heuristics

When trying to answer this question, I think first about the project I’m working on. Does my use case fit within the boundaries and vision of the library I’m using? In other words, if I were to make the changes I wanted, how hard would it be to get those changes merged in?

Then I think about the library itself: is it well-written and easy to extend? Are the maintainers around and responsive to issues and pull requests? If I were to maintain a fork, would the community benefit from making the fork public? Do I want to become the “de facto maintainer” because the library is practically abandoned?

Finally, by this point I have a pretty good idea of what is out there and how the library works. If I’ve made a decision to ditch the library, I have two choices (usually the same amount of work, to be honest): roll my own or switch to a different library, assuming there is another library…

Also, at this point, we have a really good idea of our requirements, so selecting another library is pretty straightforward if it is a possibility.

If no other library can be selected, then it is time to roll your own solution.

In my case, I ended up implementing a streaming parser that is about as fast (within the margin of error) as the library it is replacing. I was able to maintain the exact same API, with very few modifications. However, it is far more flexible. I’m able to do things that were simply impossible to do in the original (like replacing entire tokens while parsing, writing components for “special” HTML tags instead of hardcoding behavior in the tokenizer, etc.). In the end, it is not only more flexible but also more maintainable. It reads much like the parsing spec, so updating it if the spec changes should be ridiculously easy.
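To give a feel for “replacing entire tokens while parsing”, here is a toy sketch. It is not the actual parser: `tokenize`, `transform`, and the `<greeting>` tag are hypothetical names for illustration, and the tokenizer is far too naive for real HTML (for instance, it would trip over a `>` inside an attribute value).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: lazily yield tokens one at a time.
 * Each token is ['type' => 'tag'|'text', 'value' => string].
 */
function tokenize(string $html): Generator
{
    $offset = 0;
    $length = strlen($html);

    while ($offset < $length) {
        if ($html[$offset] === '<' && ($end = strpos($html, '>', $offset)) !== false) {
            // Everything from '<' through the next '>' is one tag token.
            $value = substr($html, $offset, $end - $offset + 1);
            yield ['type' => 'tag', 'value' => $value];
        } else {
            // Everything up to the next '<' (or the end of input) is text.
            $next = strpos($html, '<', $offset + 1);
            $value = substr($html, $offset, ($next === false ? $length : $next) - $offset);
            yield ['type' => 'text', 'value' => $value];
        }
        $offset += strlen($value);
    }
}

/**
 * Pass tokens through, letting a handler registered for a tag name
 * replace that token with zero or more tokens of its own.
 */
function transform(iterable $tokens, array $handlers): Generator
{
    foreach ($tokens as $token) {
        if ($token['type'] === 'tag'
            && preg_match('/^<\/?([A-Za-z][\w-]*)/', $token['value'], $m)
            && isset($handlers[strtolower($m[1])])
        ) {
            foreach ($handlers[strtolower($m[1])]($token) as $replacement) {
                yield $replacement;
            }
            continue;
        }
        yield $token;
    }
}

$handlers = [
    // Swap the made-up <greeting> element for ordinary markup.
    'greeting' => fn (array $token): array => [[
        'type' => 'tag',
        'value' => $token['value'][1] === '/' ? '</strong>' : '<strong>',
    ]],
];

$output = '';
foreach (transform(tokenize('<p><greeting>Hello</greeting> world</p>'), $handlers) as $token) {
    $output .= $token['value'];
}
echo $output, PHP_EOL; // <p><strong>Hello</strong> world</p>
```

Because the special behavior lives in a handler map rather than in the tokenizer, supporting another “special” tag in this sketch means registering another handler, not touching the parsing loop.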

You may have noticed that this is pretty much a ‘last ditch’ decision, but it shouldn’t be a hard one. Sometimes, when you’re working on a personal project or whatever, it makes sense to start with rolling your own … but when you’re trying to do something serious … don’t make the decision lightly, but also don’t be afraid of making it in the first place.