Series: Creating Durable PHP

  • Building DPHP: What is it?

    This entry is part 1 of 4 in the series Creating Durable PHP

    I’m fascinated by event-driven systems. In these systems, everything happens for a reason, which is just as important as its effect on the system. In my perfect universe, everything is built to be event-driven… but very few things deserve the complexity of creating such a thing.

    Durable PHP started as a way to build arbitrarily complex software quickly and to write tests for it just as quickly. I wanted it to force developers to write testable code while making the testing experience effortless.

    To do that, I took inspiration from some great libraries and frameworks I really enjoyed working with over the years: Akka, Durable Functions, and Dapr. I wanted to bring this to PHP and see if I could develop anything new. It’s mostly finished, and I’m currently bolting on new features before releasing the documentation.

    Before that happens, I’d love to spend a few blog posts introducing Durable PHP. For those who have used Durable Functions before, this will seem immensely familiar, as it has a very similar API—though that is where the similarity ends.

    There are three main components to any software written in Durable PHP:

    1. Activities,
    2. Orchestrations,
    3. and Entities

    Activities

    Activities are very similar to “serverless functions” in that they have no state and no identity (technically, their result does have an identity, but we will get to that later), and they exist to cause side effects. In fact, Durable PHP uses activities to encapsulate and memoize side effects for use in Orchestrations.

    Activities can be any code that you want to run exactly once in an Orchestration, such as sending emails to users, generating a random password, calling an external API, etc.
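
    To make the "run exactly once" idea concrete, here is a minimal sketch of memoizing a side effect by identity. The `ActivityLog` class and the `callOnce` method are hypothetical names made up for this illustration; they are not the Durable PHP API, and a real implementation would persist results rather than hold them in memory.

    ```php
    <?php
    // Hypothetical sketch: not the Durable PHP API.
    final class ActivityLog
    {
        /** @var array<string, mixed> recorded results, keyed by activity identity */
        private array $results = [];

        /** Runs $activity the first time $id is seen; afterwards returns the recorded result. */
        public function callOnce(string $id, callable $activity): mixed
        {
            if (!array_key_exists($id, $this->results)) {
                $this->results[$id] = $activity();
            }
            return $this->results[$id];
        }
    }

    $log = new ActivityLog();
    $calls = 0;
    $generatePassword = function () use (&$calls): string {
        $calls++;
        return bin2hex(random_bytes(8)); // non-deterministic side effect
    };

    // The second call replays the recorded result instead of generating a new password.
    $first = $log->callOnce('password-for-user-42', $generatePassword);
    $again = $log->callOnce('password-for-user-42', $generatePassword);
    ```

    Because the result is keyed by identity, code that is re-run sees the same password every time, which is what makes replaying an Orchestration safe.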

    Orchestrations

    Orchestrations are Event Sourced code that runs linearly, even if the underlying events are non-linear. Technically, the code is run multiple times until a point is reached where the result is not yet known (async on steroids). If that point is reached, the current state of execution is serialized until something changes that may indicate more code can be run. Then it is run again, and again, until it terminates.
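
    The replay loop can be sketched with a PHP generator. This is only an illustration of the idea under my own assumptions: the `createProject` and `replay` functions and the shape of `$history` are hypothetical, and nothing here says Durable PHP is implemented with generators.

    ```php
    <?php
    // Hypothetical sketch of replay; names are illustrative, not the Durable PHP API.

    /** The orchestration reads linearly and yields the name of each result it needs. */
    function createProject(): Generator
    {
        $db = yield 'create-database';
        $bucket = yield 'create-bucket';
        return "project ready: $db, $bucket";
    }

    /**
     * Re-runs the orchestration from the top against the known history.
     * Returns 'done' plus either the final result or the name of the awaited result.
     */
    function replay(callable $orchestration, array $history): array
    {
        $gen = $orchestration();
        while ($gen->valid()) {
            $awaiting = $gen->current();
            if (!array_key_exists($awaiting, $history)) {
                // Result not yet known: stop here until something changes.
                return ['done' => false, 'value' => $awaiting];
            }
            $gen->send($history[$awaiting]);
        }
        return ['done' => true, 'value' => $gen->getReturn()];
    }

    $history = [];
    $step1 = replay('createProject', $history);   // waiting on create-database
    $history['create-bucket'] = 'bucket-1';       // arrives out of order
    $step2 = replay('createProject', $history);   // still waiting on create-database
    $history['create-database'] = 'db-1';
    $step3 = replay('createProject', $history);   // runs to completion
    ```

    Note that the bucket result arriving first does not change how the orchestration code reads: it still asks for the database first and only sees the bucket when it reaches that line.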

    For example, a “create project” Orchestration may send various components (through activities or sub-orchestrations) to create underlying infrastructure. These components may come up in various orders, but the code is still run linearly.

    Further, external clients can subscribe to an execution’s state. This allows for providing reactive UIs instead of using “optimistic concurrency tricks” to hide the fact that something is asynchronous.

    Entities

    Entities are the core state and behavior holders of an application built on Durable PHP. Entities can run any code, but it is guaranteed to be synchronous. These are very similar to Akka actors, in that you can do whatever you want in them. But like Durable Functions, you can only do one thing at a time in a single instance.

    Problems and Solutions

    While building Durable PHP, I had to solve a number of problems, and the great research literature out there helped immensely. In a few cases, I had to work things out using the good ol’ noggin, and I’d like to share how I approached those problems in some blog posts:

    1. Event Sourcing done “right”
    2. Cooperative distributed locking
    3. Authz and Ownership
    4. Retries and failures
    5. The arrow of time
    6. Time travel debugging; tests
  • Building DPHP: Event Sourcing

    This entry is part 2 of 4 in the series Creating Durable PHP

    From the beginning, I knew Event Sourcing would be a big part of Durable PHP, and I learned from several Event Sourced projects I’ve built or worked on over the years. Specifically, I knew what issues any project would have to deal with, and I wanted to create a framework where none of those issues existed.

    I’ll go into the gist of Event Sourcing now, but feel free to skip to the next section if you think you know how it works.

    Event Sourcing is a paradigm of writing software where the state is computed by replaying events. For example, an inventory item isn’t created out of thin air, like in CRUD; rather, it is “received” into inventory from somewhere else. This “received event” allows us to calculate how many items were added to the inventory by counting those events. To remove items from inventory, we might “sell” them or “trash” them, allowing us to get the current inventory state by simply keeping track with a +1/-1 of items.
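
    As a small worked example of the inventory case, the current stock is just a fold over the events. The event names and array shapes below are assumptions made for the illustration.

    ```php
    <?php
    // Hypothetical event names and shapes, for illustration only.
    function currentStock(array $events): int
    {
        $stock = 0;
        foreach ($events as $event) {
            $stock += match ($event['type']) {
                'received' => $event['quantity'],
                'sold', 'trashed' => -$event['quantity'],
                default => 0, // unknown events do not affect the count
            };
        }
        return $stock;
    }

    $events = [
        ['type' => 'received', 'quantity' => 10],
        ['type' => 'sold', 'quantity' => 3],
        ['type' => 'trashed', 'quantity' => 1],
        ['type' => 'received', 'quantity' => 5],
    ];
    $stock = currentStock($events); // 10 - 3 - 1 + 5 = 11
    ```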

    Further, we might “project” the current state into a traditional database for faster querying or use in other parts of the software.

    There’s far more to this than meets the eye, so I highly suggest reading up on it if you are interested … but this is where we’re going to deviate from “application level” Event Sourcing.

    Layers

    In Durable PHP, Event Sourcing is done below the application level. You only interact with it tangentially in Entities and a bit more directly in Orchestrations. In other words, Durable PHP is a framework for Event Sourcing in PHP that doesn’t require you to know that Event Sourcing is what you are working with, or how it works.

    One of the key aspects of Event Sourcing in Durable PHP is the layers. If you’ve worked with traditional software, you’ve probably heard of “application layers,” “persistence layers,” and so forth, responsible for various application lifecycles. In Durable PHP, there are different layers:

    1. Infrastructure: routing and authorization
    2. History: aggregation and locking
    3. Context: linearization and API
    4. Application: business logic

    The two layers that are very often kept “secret” in any implementation of Event Sourcing are the middle layers: History and Context. In Durable PHP, the History layer keeps track of whether or not we’re replaying events, which events we’re waiting for, and which we’ve already processed. It also holds events that are queued until a lock is released, along with the state needed to manage distributed locks.

    The Context layer provides a projected state to the Application layer so that the application code doesn’t need to know how it works or is stored. As far as the application code is concerned, it works just like any other PHP code.

    This alleviates the need for a (no)SQL database entirely since the application state appears to itself as though it is continuously running, even though it is only running when triggered to do so.

    Storage

    You’ll note that I mentioned that a database isn’t technically needed. The state is stored as blobs that can be loaded on demand and projected to TypeSense (or anything, really) if searching is required.

    Since the state is simply a collection of blobs, we can operate optimistically in certain contexts, using “Etags” to catch concurrency issues. We simply throw away the results and restart if another operation beats us to storage. In performance tests, this is much faster (>10x faster) than trying to take a distributed lock and only run one operation at a time. This technique also embraces out-of-order events and leaves figuring out what is going on to the Context layer. A distributed lock is the only way to proceed in some contexts where we expect an external side effect since we need exactly-once semantics.
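
    The optimistic approach can be sketched as a compare-and-retry loop. The `BlobStore` class below is a hypothetical in-memory stand-in for real blob storage, and the integer etag is an assumption for the sake of a short example.

    ```php
    <?php
    // Hypothetical in-memory blob store; a real one would be remote storage.
    final class BlobStore
    {
        /** @var array<string, array{etag: int, data: array}> */
        private array $blobs = [];

        public function load(string $key): array
        {
            return $this->blobs[$key] ?? ['etag' => 0, 'data' => []];
        }

        /** Writes only if nobody else has written since $expectedEtag was read. */
        public function save(string $key, array $data, int $expectedEtag): bool
        {
            if ($this->load($key)['etag'] !== $expectedEtag) {
                return false;
            }
            $this->blobs[$key] = ['etag' => $expectedEtag + 1, 'data' => $data];
            return true;
        }
    }

    /** Applies $operation to the stored state, restarting whenever the write loses a race. */
    function applyOptimistically(BlobStore $store, string $key, callable $operation, int $maxAttempts = 5): array
    {
        for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
            $blob = $store->load($key);
            $newState = $operation($blob['data']);
            if ($store->save($key, $newState, $blob['etag'])) {
                return $newState;
            }
            // Another writer won: throw away $newState and rerun from the fresh state.
        }
        throw new RuntimeException("Gave up on $key after $maxAttempts attempts");
    }

    $store = new BlobStore();
    $attempts = 0;
    $result = applyOptimistically($store, 'counter', function (array $state) use ($store, &$attempts): array {
        $attempts++;
        if ($attempts === 1) {
            // Simulate a competing writer sneaking in during our first attempt.
            $store->save('counter', ['count' => 10], 0);
        }
        return ['count' => ($state['count'] ?? 0) + 1];
    });
    // $result is ['count' => 11] after two attempts.
    ```

    The first attempt is computed against stale state and is discarded; the second attempt sees the competing write and builds on it.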

    Even then, we can defer certain types of side effects until we are sure the state has been committed. The gist of a state commit looks something like this:

    1. Try to commit state
    2. If success: perform side-effects
    3. If failure: trash and rerun from the stored state

    From the outside, this looks like exactly-once semantics, but from the perspective of the code, it may run many, many times.
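
    The commit steps above can be sketched as a loop in which the logic only describes its side effects, and they are performed after a successful commit. The `runStep` function and its three callables are hypothetical; the sketch also ignores what happens if the process dies between the commit and the side effects.

    ```php
    <?php
    // Hypothetical sketch; $tryCommit stands in for a conditional write to storage.
    function runStep(callable $loadState, callable $logic, callable $tryCommit, int $maxRuns = 10): int
    {
        for ($run = 1; $run <= $maxRuns; $run++) {
            $state = $loadState();
            // The logic returns the new state plus a list of deferred side effects.
            [$newState, $effects] = $logic($state);
            if ($tryCommit($state, $newState)) {
                foreach ($effects as $effect) {
                    $effect(); // performed once, after the state is committed
                }
                return $run;
            }
            // Commit failed: trash $newState and $effects, rerun from stored state.
        }
        throw new RuntimeException('Could not commit state');
    }

    $stored = ['emailsQueued' => 0];
    $commitAttempts = 0;
    $sent = [];

    $runs = runStep(
        function () use (&$stored) { return $stored; },
        function (array $state) use (&$sent): array {
            $state['emailsQueued']++;
            return [$state, [function () use (&$sent) { $sent[] = 'welcome email'; }]];
        },
        function (array $old, array $new) use (&$stored, &$commitAttempts): bool {
            $commitAttempts++;
            if ($commitAttempts < 3) {
                return false; // pretend another writer beat us twice
            }
            $stored = $new;
            return true;
        }
    );
    // The logic ran three times, but only one email was sent.
    ```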

    Composable Events

    One early optimization I did was to create composable events. There are only a few core events and many composable events. For example, any event may be wrapped to delay it, to provide addresses to send responses to, to share ownership, to include a lock request, and more. This greatly increases the expressiveness of events while working on the system and allows for greater control when dealing with events in different layers. For example, the Infrastructure layer is concerned with delays and ownership, the History layer is concerned with locks, and the Context layer is concerned with who to send replies to.

    This, hands down, was one of the best early-game decisions made. I’ve seen plenty of Event Sourced code in my time, and the Event object always eventually ends up being this opaque blob of flags, and dealing with them is a spaghetti monster of if-statements. By using composable events, I ended up with very few if-statements, unwrapping the event one layer at a time.
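
    Here is a rough sketch of what wrapping and unwrapping could look like. Every class name below (`RaiseEvent`, `Wrapper`, `Delay`, `WithLock`, `ReplyTo`) and the `unwrap` helper are hypothetical and only illustrate the pattern; the real event types in Durable PHP may look quite different.

    ```php
    <?php
    // Hypothetical event classes, for illustration only.
    interface Event {}

    final class RaiseEvent implements Event
    {
        public function __construct(public readonly string $name) {}
    }

    abstract class Wrapper implements Event
    {
        public function __construct(public readonly Event $inner) {}
    }

    final class Delay extends Wrapper
    {
        public function __construct(Event $inner, public readonly int $seconds)
        {
            parent::__construct($inner);
        }
    }

    final class WithLock extends Wrapper {}

    final class ReplyTo extends Wrapper
    {
        public function __construct(Event $inner, public readonly string $address)
        {
            parent::__construct($inner);
        }
    }

    /** A layer peels off the wrapper it understands and passes the rest down. */
    function unwrap(Event $event, string $wrapperClass, callable $handle): Event
    {
        if ($event instanceof $wrapperClass) {
            $handle($event);
            return $event->inner;
        }
        return $event;
    }

    $event = new Delay(new WithLock(new ReplyTo(new RaiseEvent('approve'), 'orchestration:123')), 30);

    $seen = [];
    $event = unwrap($event, Delay::class, function (Delay $d) use (&$seen) { $seen[] = "infrastructure: delay {$d->seconds}s"; });
    $event = unwrap($event, WithLock::class, function () use (&$seen) { $seen[] = 'history: lock requested'; });
    $event = unwrap($event, ReplyTo::class, function (ReplyTo $r) use (&$seen) { $seen[] = "context: reply to {$r->address}"; });
    // $event is now the bare RaiseEvent('approve').
    ```

    Each layer contains a single type check for the wrapper it cares about, instead of a pile of flag checks on one large event object. This sketch assumes wrappers are nested in the same order the layers run.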

    Challenges

    One of the biggest challenges when working on this was the interactions between the Context and History layers, such as waiting for multiple events and dealing with out-of-order events. I ended up with what I felt was a pretty elegant solution.

    We need to keep track of what we are waiting on and only care about those events. Everything else can be set aside until we are waiting on it. This also means that events are not processed purely in the order they were received: they are ordered first by when we expect them, and then by when we received them (if we expect multiple events). This can probably be optimized at some point, but it is unlikely to be optimized any time soon.

    This is super important for replaying code, as we want to be sure that code isn’t considered “replaying” simply because we received an out-of-order event.
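
    The bookkeeping might look something like the following. The `Inbox` class is a hypothetical simplification of my own, not the actual History layer: it holds on to everything it receives but only hands back the events that are currently awaited.

    ```php
    <?php
    // Hypothetical sketch of the bookkeeping; not the actual History layer.
    final class Inbox
    {
        /** @var string[] ids we are currently waiting on */
        private array $waitingOn = [];
        /** @var array<string, mixed> everything received so far, in arrival order */
        private array $received = [];

        public function receive(string $id, mixed $payload): void
        {
            $this->received[$id] = $payload; // kept even if nobody is waiting yet
        }

        public function waitFor(string ...$ids): void
        {
            $this->waitingOn = $ids;
        }

        /** Returns the awaited events (in arrival order) once all have arrived, else null. */
        public function ready(): ?array
        {
            foreach ($this->waitingOn as $id) {
                if (!array_key_exists($id, $this->received)) {
                    return null;
                }
            }
            return array_intersect_key($this->received, array_flip($this->waitingOn));
        }
    }

    $inbox = new Inbox();
    $inbox->receive('c', 3);     // arrives early; nobody is waiting for it yet
    $inbox->waitFor('a', 'b');
    $inbox->receive('b', 2);
    $notYet = $inbox->ready();   // null: 'a' is still missing
    $inbox->receive('a', 1);
    $batch = $inbox->ready();    // ['b' => 2, 'a' => 1]
    $inbox->waitFor('c');
    $later = $inbox->ready();    // ['c' => 3]
    ```

    The early arrival of `c` never disturbs the wait on `a` and `b`; it only becomes visible once the code actually asks for it.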

    Another huge challenge was testing. The hardest part of any application to test is the interface between parts. You can test the contracts, but only end-to-end tests will test whether the contract is fulfilled correctly. Naturally, I didn’t want end-to-end tests with actual events, so I needed to build a custom system to inject events in whatever order I want and then verify things as they play out. It was very tedious but worth it in the end. None of this is required for end users. However, to validate that things work correctly, it is very much required.

    Event Sourcing Done Right?

    I’m working on a rather fun application using Durable PHP to find all the edge cases, polish up some of the rough edges, and make it a pleasure to use. Of all the Event Sourcing libraries and shenanigans I’ve ever used, I’m actually really happy with this. There’s still much more tooling to build, but I’m getting there, one weekend at a time.

  • Building DPHP: Distributed Locks

    This entry is part 3 of 4 in the series Creating Durable PHP

    Durable PHP is a framework for writing arbitrarily complex code that scales. It’s still under active development, but I thought I’d write a few blog posts about it before I release the documentation on how to use it. Today, I want to talk about distributed locking.

    In Durable PHP Orchestrations, you can lock Entities (as many as you want) in a single statement. This is ripe for a deadlock, especially if you have an Orchestration taking multiple locks on multiple Singleton Entities. Let’s imagine the following sequence diagram:

    If another Orchestration comes along and tries to lock, which one succeeds? How do we prevent a deadlock where neither Orchestration will make progress?

    This turns out to be a hard problem with a relatively simple solution that requires following only a few rules:

    1. If an Entity has a lock on it, queue all other requests until the lock is released.
    2. If an Orchestration takes a lock, the order must be deterministic.
    3. If an Orchestration terminates, it must release all locks.

    Further, we must not request the next lock in our lock sequence until we receive acknowledgement that we hold the previous one. This looks something like this:

    By following these rather simple rules, we can prevent (most) deadlocks. I don’t believe they are entirely impossible, but I think for most cases, they’ll resolve themselves as long as people ensure there is a timeout on the lock.
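
    The rules can be modelled in memory as follows. This is a hypothetical sketch: `LockableEntity` and `acquireInOrder` are made-up names, real locks are taken by exchanging messages, and sorting by entity id is just one possible deterministic order. Timeouts are left out.

    ```php
    <?php
    // Hypothetical in-memory model of the rules; real locks are taken via messages.
    final class LockableEntity
    {
        private ?string $owner = null;
        /** @var string[] orchestrations queued behind the current owner */
        private array $queue = [];

        public function __construct(public readonly string $id) {}

        /** Returns true as the acknowledgement that $who now holds the lock. */
        public function request(string $who): bool
        {
            if ($this->owner === null || $this->owner === $who) {
                $this->owner = $who;
                return true;
            }
            $this->queue[] = $who; // rule 1: queue until the lock is released
            return false;
        }

        /** Releases the lock and returns who is next in line, if anyone. */
        public function release(string $who): ?string
        {
            if ($this->owner !== $who) {
                return null;
            }
            $this->owner = array_shift($this->queue);
            return $this->owner;
        }
    }

    /**
     * Takes locks one at a time in sorted id order (rule 2), stopping at the
     * first entity that does not acknowledge. Returns the ids held so far.
     */
    function acquireInOrder(string $who, LockableEntity ...$entities): array
    {
        usort($entities, fn (LockableEntity $a, LockableEntity $b) => strcmp($a->id, $b->id));
        $held = [];
        foreach ($entities as $entity) {
            if (!$entity->request($who)) {
                break; // do not move on until this lock is acknowledged
            }
            $held[] = $entity->id;
        }
        return $held;
    }

    $a = new LockableEntity('account:a');
    $b = new LockableEntity('account:b');

    $first = acquireInOrder('orch-1', $b, $a);   // holds both, taken as a then b
    $second = acquireInOrder('orch-2', $a, $b);  // holds nothing: queued on account:a
    $nextA = $a->release('orch-1');              // rule 3: release on termination
    $b->release('orch-1');
    $resumed = acquireInOrder('orch-2', $a, $b); // now succeeds
    ```

    Because both orchestrations contend for `account:a` first, the second one can never end up holding `account:b` while waiting on `account:a`, which is the cycle that would cause a deadlock.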

    I experimented with many different implementations, and I’ll share them here.

    Cooperative Entities

    One of the first implementations I attempted was through composing lock events addressed to multiple entities and then having the entities cooperate to jointly create a lock. This turned out to have effectively the same behavior as the one I eventually went with. In fact, it was observing the behavior and edge cases of this implementation that led me to the result. However, I believe this implementation, had I gone with it, would have been easier to optimize. The current implementation requires 2n messages per lock, where this original implementation could have been optimized to need only n+1 messages. However, I determined the resulting complexity would have simply been too much to easily maintain. It was simply too brittle.

    Future Optimizations

    Currently, there is infrastructure-level locking that simply didn’t exist when I wrote the original code. Now, there are better ways to perform a lock that make more sense. A future optimization could take advantage of this locking mechanism to require only n messages to perform a lock. This would greatly speed up critical code paths that require locking.

  • Building DPHP: Authz and Ownership

    This entry is part 4 of 4 in the series Creating Durable PHP

    Durable PHP is a framework for writing arbitrarily complex code that scales. It’s still under active development, but I thought I’d write a few blog posts about it before I release the documentation on how to use it. Today, I want to talk about Authorization/Authentication and Ownership.

    In any software project, there are users. These users interact with resources, usually via CRUD (Create, Read, Update, and Delete). However, Durable PHP has a bit more fine-grained actions:

    • Start: creates a new orchestration
    • Signal: send an event somewhere with no return address
    • Call: send an event somewhere with a return address and be notified upon completion
    • Read: retrieve snapshots of distant state
    • Subscribe: be notified of status changes
    • Delete: just what it sounds like

    Durable PHP treats everything as a resource, whether it is an activity, entity, or orchestration and every resource must be owned by something or someone. The application developer is responsible for what those entities are and what the permissions are, which can be rather fine-grained.

    From there, there are only a few basic rules you need to know:

    1. A user can own a resource. They have full control over the resource.
    2. A user can share an owned resource with another user or role.
    3. A user can share Read/Update/Delete rights to a resource they own.
    4. A user can only interact with a resource if they have a right to do so.

    These rules are important to know because all code you write is operating in the context of the user, enforced by the infrastructure. This prevents a number of security concerns: even if a user were to inject the ID of something they weren’t supposed to have access to, your code wouldn’t have access to it. This is much more similar to the sandboxing that an OS does on a mobile device, where the user who owns the resource has to give you permission to access it.
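
    A minimal sketch of the four rules, under my own assumptions: the `OwnedResource` class, its methods, and the `user:` / `role:` prefixes are hypothetical and are not taken from Durable PHP.

    ```php
    <?php
    // Hypothetical sketch; class, method, and right names are illustrative.
    final class OwnedResource
    {
        /** @var array<string, string[]> rights keyed by user id or role name */
        private array $shared = [];

        public function __construct(public readonly string $owner) {}

        /** Rules 2 and 3: only the owner can share rights with a user or role. */
        public function share(string $actingUser, string $withUserOrRole, string ...$rights): void
        {
            if ($actingUser !== $this->owner) {
                throw new RuntimeException('Only the owner can share a resource');
            }
            $this->shared[$withUserOrRole] = $rights;
        }

        /** @param string[] $roles roles held by $user */
        public function can(string $user, array $roles, string $right): bool
        {
            if ($user === $this->owner) {
                return true; // rule 1: owners have full control
            }
            foreach ([$user, ...$roles] as $principal) {
                if (in_array($right, $this->shared[$principal] ?? [], true)) {
                    return true;
                }
            }
            return false; // rule 4: no right, no access
        }
    }

    $doc = new OwnedResource('user:alice');
    $doc->share('user:alice', 'role:support', 'Read');

    $ownerCanDelete = $doc->can('user:alice', [], 'Delete');               // true
    $supportCanRead = $doc->can('user:bob', ['role:support'], 'Read');     // true
    $supportCanDelete = $doc->can('user:bob', ['role:support'], 'Delete'); // false
    $strangerCanRead = $doc->can('user:eve', [], 'Read');                  // false
    ```

    The important part is that the check takes the acting user as input, so a guessed or injected resource ID is useless without a matching right.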

    There is, however, a sudo mode for users of the role admin. These users are able to access anything, even if they don’t have permission to do so. This allows administrators to perform actions on behalf of other users, for performing customer support or other administrative tasks.

    One technique I’ve used is to have a singleton entity owned by an administrator. Code that needs to run elevated operations calls this entity and asks for permission. If the operation is allowed, the entity simply executes the code as an admin; otherwise, it asks a human admin to check whether it is allowed and approve or reject it.

    For example, in the context of a chat app, an HR person might need to access private chats of a user. If, in the context of my app, they have the appropriate role, I would allow them to access that in the context of an admin; otherwise, I would elevate this to a human as this is either an application bug (the user seeing a button they shouldn’t see) or someone is doing something fishy.

    I’m very happy with this authorization system, as it is extremely clean (<120 lines of code) and quite flexible.