Building DPHP: Distributed Locks

This entry is part 3 of 4 in the series Creating Durable PHP

Durable PHP is a framework for writing arbitrarily complex code that scales. It’s still under active development, but I thought I’d write a few blog posts about it before I release the documentation on how to use it. Today, I want to talk about distributed locking.

In Durable PHP Orchestrations, you can lock Entities (as many as you want) in a single statement. This is ripe for deadlock, especially if an Orchestration takes multiple locks on multiple Singleton Entities. Imagine a sequence in which one Orchestration locks two Singleton Entities, one after the other.

If another Orchestration comes along and tries to lock the same Entities, which one succeeds? How do we prevent a deadlock where neither Orchestration makes progress?
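To make the race concrete, here is a standalone sketch of one possible interleaving. It is not the Durable PHP API; the entity names and the `tryLock` helper are made up purely for illustration.

```php
<?php

$owners = []; // entity id => Orchestration currently holding its lock

// Grants the lock if the Entity is free; in the real system a failed request
// would be queued, but returning false is enough to show the problem.
function tryLock(array &$owners, string $entity, string $orchestration): bool
{
    if (isset($owners[$entity])) {
        return false;
    }
    $owners[$entity] = $orchestration;
    return true;
}

// One possible interleaving of two Orchestrations, A and B, that request the
// same two locks in opposite orders:
var_dump(tryLock($owners, 'counter-1', 'A')); // true  - A holds counter-1
var_dump(tryLock($owners, 'counter-2', 'B')); // true  - B holds counter-2
var_dump(tryLock($owners, 'counter-2', 'A')); // false - A waits for B
var_dump(tryLock($owners, 'counter-1', 'B')); // false - B waits for A
// Neither Orchestration can finish, and neither will release what it holds:
// a classic deadlock.
```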

This turns out to be a hard problem with a relatively simple solution, provided we follow a few rules:

  1. If an Entity has a lock on it, queue all other lock requests until the lock is released (a sketch of this queueing follows the list).
  2. If an Orchestration takes multiple locks, it must acquire them in a deterministic order.
  3. If an Orchestration terminates, it must release all of its locks.
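
Here is a minimal sketch of rule 1 from the Entity's point of view. It is not the actual Durable PHP implementation; `LockableEntity` and its methods are hypothetical, and in the real system the requests and acknowledgements are messages rather than direct method calls.

```php
<?php
// Hypothetical sketch of rule 1: an Entity grants the lock to the first
// requester and queues everyone else until the holder releases it.

final class LockableEntity
{
    private ?string $holder = null; // Orchestration currently holding the lock

    /** @var string[] queued Orchestration ids, in arrival order */
    private array $waiting = [];

    /** Returns true if the lock was granted immediately, false if queued. */
    public function requestLock(string $orchestrationId): bool
    {
        if ($this->holder === null) {
            $this->holder = $orchestrationId;
            return true; // the acknowledgement would be sent back here
        }

        $this->waiting[] = $orchestrationId;
        return false;
    }

    /** Releases the lock and hands it to the next queued requester, if any. */
    public function release(string $orchestrationId): ?string
    {
        if ($this->holder !== $orchestrationId) {
            return $this->holder; // only the holder may release
        }

        $this->holder = array_shift($this->waiting); // null if nobody waits
        return $this->holder;
    }
}
```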

Further, we must not request the next lock in our sequence until we receive acknowledgement that the previous one was granted.
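
Combined with rule 2, the Orchestration side looks roughly like the following standalone sketch. Again, this is illustrative rather than the framework's API: `lockInOrder` is a made-up helper, and the `$requestLock` callback stands in for sending a lock event and waiting for the Entity's acknowledgement.

```php
<?php

/**
 * Acquires locks on the given Entities in a deterministic order, waiting for
 * each acknowledgement before requesting the next lock.
 *
 * @param string[] $entityIds entities to lock
 * @param callable(string): void $requestLock sends one lock request and
 *        blocks until the Entity acknowledges that the lock was granted
 * @return string[] the ids that were locked, in acquisition order
 */
function lockInOrder(array $entityIds, callable $requestLock): array
{
    sort($entityIds); // every caller acquires locks in the same order

    $held = [];
    foreach ($entityIds as $id) {
        $requestLock($id); // wait for the acknowledgement...
        $held[] = $id;     // ...before moving on to the next lock
    }

    return $held;
}
```

Because every Orchestration now sorts its entity ids first, the two Orchestrations from the earlier example would both try 'counter-1' before 'counter-2'; one of them simply waits in the other's queue instead of deadlocking.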

By following these rather simple rules, we can prevent (most) deadlocks. I don't believe deadlocks become entirely impossible, but in most cases they will resolve themselves as long as people ensure there is a timeout on the lock.
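
One way to picture that timeout, again as a hypothetical sketch rather than the framework's actual mechanism: each lock carries a deadline, and an expired lock can be reclaimed even if the holder never releases it.

```php
<?php

/** @var array{holder: string, expiresAt: int}|null $lock */
$lock = ['holder' => 'A', 'expiresAt' => time() - 5]; // already expired

// Clears a lock whose deadline has passed so queued requesters can proceed.
function reclaimIfExpired(?array &$lock): void
{
    if ($lock !== null && $lock['expiresAt'] <= time()) {
        $lock = null;
    }
}

reclaimIfExpired($lock);
var_dump($lock); // NULL - the stuck lock resolves itself once the timeout hits
```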

I experimented with many different implementations, and I’ll share them here.

Cooperative Entities

One of the first implementations I attempted composed lock events addressed to multiple Entities and had the Entities cooperate to jointly create a lock. This turned out to have effectively the same behavior as the approach I eventually went with; in fact, observing the behavior and edge cases of this implementation is what led me to the final design. I believe this implementation, had I gone with it, would have been easier to optimize: the current implementation requires 2n messages per lock, whereas this original implementation could have been optimized to need only n+1 messages. However, I determined the resulting complexity would have been too much to maintain. It was simply too brittle.

Future Optimizations

There is now infrastructure-level locking that simply didn't exist when I wrote the original code, and with it better ways to perform a lock. A future optimization could take advantage of this locking mechanism to require only n messages per lock, which would greatly speed up critical code paths that require locking.
