For those who know me, I’ve usually been a pretty big fan of WordPress. I even worked for Automattic for a time. I have quite a collection of custom plugins and code to do some pretty interesting things. But now, I think it is time to go; for many reasons.
(more…)Category: Uncategorized
-
Yes, another test
I’m so sorry…
-
This is another test
Sorry for the noise, but emails, social links, and the WordPress reader should be working again… 😉
Stay tuned for the magic.
-
This is a test, and only a test
I’ve moved to a headless CMS, and this is a test to make sure everything still works.
Nothing to see here… except emails are probably broken.
-
Algorithms in PHP: Deques (and circular buffers + linked lists)
If you’ve ever tried to use an array as a queue or stack (via
array_pop
,array_push
,array_unshift
andarray_shift
), you may have discovered they are slow, particularly array_unshift which requires shifting the entire array. Here are the limits of the awesome PHP array: prepending. Prepending to an array is a very costly operation in PHP, and there isn’t really any way to get around it, or is there?Queues and Stacks
In computer science, there are two fundamental things for batch processing: queues and stacks. A queue is a first-in-first-out (FIFO) construct, and it works just like a queue at a grocery store. The first person in the line is the first person who gets checked out.
A stack, on the other hand, is a first-in-last-out (FILO) construct. You can imagine this just like a stack of plates. As you “push” plates onto the stack, the first plate you place is the last plate you’ll “pop” off the stack.
Both of these structures are amazing for being able to do any kind of bulk work. However, sometimes you must utilize a sequence as a stack and a queue. For example, if you require storing the last 500 items of something, but want to iterate it in reverse (undo features come to mind here).
The Deque
Enter the deque (pronounced “deck”). Deques are slightly magical in being able to use them like a queue and a stack. Taking items off and on from either end is O(1) while accessing the middle is still O(n). They are usually implemented on top of linked lists or a circular buffer.
A deque is incredibly powerful for “undo” features. You push operations onto the deque and shift items at the bottom once it gets above a specific size, allowing you to easily “undo the last X actions.” So, if you ever see some requirements for a “limited size stack” or “limited size queue,” you’re probably looking to use a deque.
A short disclaimer
Before we get into implementation of a Deque, I feel like a disclaimer is needed. The code shown on these posts shouldn’t be used in production. I’m not doing a lot of error handling, and in some cases, ignoring edge cases altogether. I’m merely trying to illustrate the principles for those cases where you need to implement it yourself. For example, WordPress implements a priority queue similar to the last post to support filters/actions in order to support insertion/deletion while iterating the queue. Beanstalkd uses a custom heap sort to support
Nieve implementation
First, let us look at a very nieve implementation:
<?php namespace Withinboredom\Deque; class Nieve { private array $buffer = []; /** * Append to the end of the array * * @param mixed $value * * @return void */ public function push( mixed $value ): void { array_push($this->buffer, $value); } /** * Get from the end of the array * @return mixed */ public function pop(): mixed { return array_pop($this->buffer); } /** * Get from the beginning of the array * @return mixed */ public function shift(): mixed { return array_shift($this->buffer); } /** * Append to the beginning of the array * * @param mixed $value * * @return void */ public function unshift( mixed $value ): void { array_unshift($this->buffer, $value); } }
As you can see, it is pretty simple. You should note that it is also very slow and gets slower as the array gets bigger.
Don’t do this unless you only have a very small list (~100-200 items).
Implementing a circular buffer
We can use a regular array, just like our nieve implementation, but implement a circular buffer. This would make the major cost of our implementation be the overhead of array accesses. To implement a circular buffer, we must understand what a circular buffer is and its requirements.
I’m not going to go into depth on how a regular one works, so I highly recommend pausing here and learning about regular circular buffers. We need a special circular buffer: a double-ended circular buffer.
This works similarly to a regular circular buffer, except we have two cursors: the “head” and the “tail.” These allow us to insert items at “the front” and “the back” while also allowing us to read those items.
To add an item to the head of the buffer:
- Check if the deque is full.
- If we are already at our initial position, move to the end of the array.
- Else decrement the pointer.
- Add the item to the array.
To add an item at the tail of the buffer:
- Check if the deque is full.
- If we are already at the end of the array, move to the beginning of the array.
- Else increment the pointer.
- Add the item to the array.
-
Algorithms in PHP: Deques (circular buffers & linked lists)
In the previous post, I talked about priority queues. But in my explanation of an alternative implementation, I neglected to mention a few things:
- The alternative implementation isn’t for production; there are edge cases and unhandled errors.
- The alternative implementation is meant for a short queue, not an infinite one, such as the one you may deal with in a single request.
- The performance of the alternative implementation may be misleading, in the worst case, it would perform far worse than the STL implementation. However, in my experience, priorities tend to bunch up vs. complete random priorities.
In this post, we’re going to go into Deques (pronounced “decks”), how they are usually implemented, and an alternative implementation that is specific to PHP. If you aren’t aware, PHP is getting a native Deque implementation in 8.2. A Deque is a double-ended queue that you can insert and remove from either end.
Deques are extremely powerful anytime you require a “queue of a bounded length” where the older items are dropped (like an undo buffer). However, this is one of those things where you really shouldn’t implement this in native PHP. STL Queues and Stacks can be used as a Deque and perform far better than anything you can implement in PHP. The reason for this should be evident if you’ve ever tried to prepend to an array; it is slow and gets slower the more items you add to the array.
A naive implementation
A naive PHP implementation of a Deque would probably look something like this:
<?php namespace Withinboredom\Algorithms\Deque; class Naive { private array $buffer = []; /** * Append to the end * * @param mixed $value * * @return void */ public function push(mixed $value): void { array_push($this->buffer, $value); } /** * Get from the end of the array * @return mixed */ public function pop(): mixed { return array_pop($this->buffer); } /** * Get from the beginning * @return mixed */ public function shift(): mixed { return array_shift($this->buffer); } /** * Append to the beginning * * @param mixed $value * * @return void */ public function unshift(mixed $value): void { array_unshift($this->buffer, $value); } }
This naive implementation is so bad that I wouldn’t use it unless I dealt with less than a few dozen items. Here’s a chart of how bad it gets as the number of items gets larger:
Deques using circular buffers
Highly performant Deques use circular buffers to accomplish their goals, along with sophisticated logic to automatically grow and shrink the buffers to allow for an infinite number of items. You can imagine a circular buffer as being a clock-type structure. If you were to insert 12 items, it would look like a clock. For a Deque, we’d want two “pointers” to represent the “head” and the “tail” of our Deque, much like the hands on a clock.
We’ll imagine the hour hand to be our “head” and the minute hand to be our “tail” while discussing insertions and deletions. To prepend, we move the hour hand back and insert our new value. To append, we move the minute hand forward and insert our value. To remove from the head, we take the value from the hour hand and move it forward. To remove the tail, we take the value from the minute hand and move it backward.
We aren’t going to cover growing or shrinking circular buffers here to keep it simple.
So, let’s build our class:
<?php namespace Withinboredom\Algorithms\Deque; class CircularBuffer { private array $buffer = []; private int $head = -1; private int $tail = 0; public function __construct(private int $size) { } }
First thing, we need a buffer to store items in, a head/tail pointer, and a max-size of the buffer. The reason for the strange head/tail initial values is to give us an easy way to tell whether we’re inserting an initial value or whether or not the buffer is full. Here’s how that works:
public function isFull(): bool { return ($this->head === 0 && $this->tail === $this->size - 1) || $this->head === $this->tail + 1; } public function isEmpty(): bool { return $this->head === -1; }
Push
To push onto the buffer, we need to handle several states:
public function push(mixed $value): void { if ($this->isFull()) { throw new \Exception(); } if ($this->head === -1) { $this->head = 0; $this->tail = 0; } elseif ($this->tail === $this->size - 1) { $this->tail = 0;just a tiny } else { $this->tail += 1; } $this->buffer[$this->tail] = $value; }
- our initial state
- wrapping around the circular buffer
- appending normally
Pop
To pop, we also need to handle several states:
public function pop(): mixed { if ($this->isEmpty()) { throw new \Exception(); } $return = $this->buffer[$this->tail]; if ($this->head === $this->tail) { $this->head -= 1; $this->tail -= 1; } elseif ($this->tail === 0) { $this->tail = $this->size - 1; } else { $this->tail -= 1; } return $return; }
- when the buffer is full
- wrapping around the circular buffer
- popping normally
Shift and Unshift
Shifting and unshifting are handled the same, but in reverse:
public function shift(): mixed { if ($this->isEmpty()) { throw new \Exception(); } $return = $this->buffer[$this->head]; if ($this->head === $this->tail) { $this->head -= 1; $this->tail -= 1; } elseif ($this->head === $this->size - 1) { $this->head = 0; } else { $this->head += 1; } return $return; } public function unshift(mixed $value): void { if ($this->isFull()) { throw new \Exception(); } if ($this->head === -1) { $this->head = 0; $this->tail = 0; } elseif ($this->head === 0) { $this->head = $this->size - 1; } else { $this->head -= 1; } $this->buffer[$this->head] = $value; }
The downside to this implementation is that a pre-determined size limits it, however, it is still pretty performant:
It’s a bit slower than the STL implementation (~86 vs. ~50 microseconds with a list size of 2,000). There are still several things to implement even to be close to being on par with the STL implementation:- ArrayAccess to access arbitrary elements
- Growing/shrinking the circular buffer
Linked Lists
Another potential possibility to traditionally implement a Deque is to use a doubly linked list. To accomplish that, we’ll need a “Node” class to represent each item in the list:
<?php namespace Withinboredom\Algorithms\Deque; class Node { public function __construct(public Node|null $previous, public Node|null $next, public mixed $value) { } }
We can also add a few helpful behaviors:
setNext()
,setPrevious()
,resetNext()
,resetPrevious()
,removeMeNext()
andremoveMePrevious()
.setNext()
andsetPrevious()
set the$next
/$previous
variable and return$this
:public function setNext(Node $next): Node { $this->next = $next; return $this; } public function setPrevious(Node $previous): Node { $this->previous = $previous; return $this; }
resetNext()
andresetPrevious()
both unset next/previous:public function resetPrevious(): Node { $this->previous = null; return $this; } public function resetNext(): Node { $this->next = null; return $this; }
removeMeNext()
andremoveMePrevious()
deletes itself from the chain and returns the next/previous Node:public function removeMeNext(): Node|null { $return = $this->next?->resetPrevious(); $this->resetNext(); return $return; } public function removeMePrevious(): Node|null { $return = $this->previous?->resetNext(); $this->resetPrevious(); return $return; }
From there, the operations a pretty straightforward:
class LinkedList { public Node|null $head = null; public Node|null $tail = null; /** * Append to the tail * * @param mixed $value * @return void */ public function push(mixed $value): void { $this->tail = $this->tail?->setNext(new Node($this->tail, null, $value))->next ?? $this->head = new Node(null, null, $value); } /** * Get from the end of the array * * @return mixed */ public function pop(): mixed { $return = $this->tail?->value ?? throw new \Exception(); $this->tail = $this->tail->removeMePrevious(); return $return; } /** * Prepend * * @return mixed */ public function unshift(mixed $value): void { $this->head = $this->head?->setPrevious(new Node(null, $this->head, $value))->previous ?? $this->tail = new Node(null, $this, $value); } /** * Get from the beginning of the array * * @return mixed */ public function shift(): mixed { $return = $this->head?->value ?? throw new \Exception(); $this->head = $this->head->removeMeNext(); return $return; } }
The performance is linear, just like before:
However, it still performs worse than a circular buffer or any STL implementation.
Php Specific implementation
If you’ve had to work extensively with PHP arrays, you’d probably guess why the naive implementation is so terrible. It’s not that bad, as long as you never unshift the array. Shifting/unshifting to a PHP array has terrible performance implications and should be avoided as much as possible. If you a bulk-prepending, it is usually faster to reverse the array, append your values, then flip the array again.
To avoid prepending to an array, we will use two arrays and only append to them. We’ll call these arrays “front” and “back”; appending to the “front” array is like prepending, and appending to the “back” array is just like regular appending. We pull from the end of each array to take from the “front” and “back,” respectively. Once the appropriate array is empty, we shift from the beginning of the other array until it is empty, which is still incredibly performant.
This implementation is really for when you need something custom, and the other types of algorithms wouldn’t work.
<?php namespace Withinboredom\Algorithms\Deque; class DoubleList { private array $front = []; private array $back = []; }
To prepend, we append to
$front
, and to append; we append to$back
:public function unshift(mixed $value): void { $this->front[] = $value; } public function push(mixed $value): void { $this->back[] = $value; }
When we pop, we check to see if
$back
is empty; if it is, we take it from the start of $front. Otherwise, we take it from the back of$back
:public function pop(): mixed { if (empty($this->back)) { // take the first item from the front and remove the item return array_shift($this->front); } return array_pop($this->back); }
We similarly do the reverse for the shift:
public function shift(): mixed { if (empty($this->front)) { // take the first item from the back and remove the item return array_shift($this->back); } return array_pop($this->front); }
This allows us to get around prepending by appending, almost as performant as the circular buffer implementation.
STL and when to use it
The STL implementations of Stacks and Queues can be used as Deque. Their performance (wall-clock time) will beat the crap out of these implementations; alternatively, if you are targeting PHP 8.2+, using the native Deque would be an even better solution — though we don’t know what that performance will look like yet.
For 99% of cases, you won’t need to implement your own Deque or any other basic algorithms. However, there are times when implementing your own can allow more business freedoms. For example, Beanstalkd implements its Priority Queue to do some things, such as removing items from the middle of the queue and adding a TTL for jobs. WordPress implements its Priority Queue (actions/filters) to support removal/insertion while iterating the queue and preventing infinite loops. Many other projects also end up embedding certain logic into these algorithms to support business logic in a performant way.
DS\Deque
There’s also the DS\Deque class, which is a proper Deque implementation; however, it requires being installed separately and does not come bundled with PHP. It is a little faster than using the STL libraries as a Deque. If you are looking for raw speed, this is the one you’re looking for.
When to use these algorithms
As I mentioned, sometimes you need to implement low-level algorithms to support high-performance business cases. It’s unlikely, but knowing how they work and some PHP-specific implementations can significantly impact when it matters.
-
Algorithms in PHP: Priority Queues (and Heaps)
This is the beginning of a series on implementing various algorithms in idiomatic PHP and their reference implementation (usually based on C++).
Today we’re going to be talking about Priority Queues. A priority queue is where each value has an associated “priority” that determines which order things come out of the queue. The best real-life priority queue would be an emergency room, where those patients with the most life-threatening issues are seen before the rest.
In most reference implementations, a priority queue is implemented using a Heap. A Heap is a binary tree-based structure that retains a specific value at the top of the tree. If the value is the max value of all the values, it is a “Max Heap,” and if it is the least, it is a “Min Heap.” Since we have a concept of “low priority” and “high priority,” we’ll consider a “Max Heap” here.
When you add an item to a Heap, you append it to the end of the Heap. If the newly appended value is lower than the parent, nothing needs to be done, else swap it with the parent and continue the check from that point. There are plenty of excellent articles on implementing a Heap, and we only need them to implement the “reference” implementation; for the idiomatic version of a priority queue, we don’t need a stinking Heap. 🙂
Interface
We’re only going to be implementing a few public functions:
<?php namespace Withinboredom\Algorithms\Heap; interface Heapish { public function insert(int $priority, mixed $value): void; public function removeMax(): mixed; public function getSize(): int; public function getMax(): mixed; }
insert
obviously inserts a new value with a given priority,removeMax
returns the value with the highest priority and removes it from the queue, andgetMax
just peeks at the highest priority value without removing it from the queue.The reference implementation
<?php namespace Withinboredom\Algorithms\Heap; /** * A basic priority heap implemented in C++-ish semantics */ class Reference { private int $heapSize = 0; private array $arr = []; public function __construct(public readonly int $maxSize) { } public function getMax(): mixed { return $this->arr[0]['value']; } public function getSize(): int { return $this->heapSize; } public function insert(int $key, mixed $value): void { if ($this->heapSize === $this->maxSize) { throw new \Exception("Heap is full"); } $i = $this->heapSize++; $this->arr[$i] = ['key' => $key, 'value' => $value]; while ($i !== 0 && $this->arr[$this->parent($i)]['key'] < $this->arr[$i]['key']) { $this->swap($i, $this->parent($i)); $i = $this->parent($i); } } private function parent(int $i): int { return (int)floor(($i - 1) / 2); } private function swap(int $first, int $second) { $temp = $this->arr[$first]; $this->arr[$first] = $this->arr[$second]; $this->arr[$second] = $temp; } public function deleteKey(int $i) { $this->increaseKey($i, PHP_INT_MAX); $this->removeMax(); } public function increaseKey(int $i, int $newKey) { $this->arr[$i]['key'] = $newKey; while ($i !== 0 && $this->arr[$this->parent($i)]['key'] < $this->arr[$i]['key']) { $this->swap($i, $this->parent($i)); $i = $this->parent($i); } } public function removeMax(): mixed { if ($this->heapSize <= 0) { throw new \Exception("Heap is empty"); } if ($this->heapSize === 1) { $this->heapSize--; return $this->arr[0]; } $root = $this->arr[0]; $this->arr[0] = $this->arr[--$this->heapSize]; $this->maxHeapify(0); return $root; } public function maxHeapify(int $i): void { $l = $this->leftChild($i); $r = $this->rightChild($i); $largest = $i; if ($l < $this->heapSize && $this->arr[$l]['key'] > $this->arr[$i]['key']) { $largest = $l; } if ($r < $this->heapSize && $this->arr[$r]['key'] > $this->arr[$largest]['key']) { $largest = $r; } if ($largest !== $i) { $this->swap($i, $largest); $this->maxHeapify($largest); } } private function leftChild(int $i): int { return 2 * $i + 1; } private function rightChild(int $i): int { return 2 * $i + 2; } }
As you can see, the implementation is not very “php-ish” and does some pretty standard-ish things with arrays. However, this is based on an implementation in C++. In C++, arrays are very different than in PHP! In PHP, arrays can be sparse, sorted, or unsorted, and more importantly, we can access the underlying iteration mechanics to store state!
Looking over this implementation, you may notice a few other drawbacks. One, this implementation needs to be sorted on every insert, which incurs a
O(log n)
cost, and is also incurred when removing the max item from the Heap. Let’s apply some PHP shenanigans to get it much faster…Utilizing the underlying cursor
The first optimization we can make is to utilize the array’s cursor and not sort on every insert. To do that, we need to keep track of whether or not an item has been inserted or not. So we’ll create a
$dirty
variable, so we only need to sort right before we read from the queue. This will significantly simplify our inserts. Oh, since we don’t have to worry about array sizing, we can also toss out a potential exception:public function insert(int $key, mixed $value): void { $this->arr[$key][] = $value; $this->dirty = true; }
Note that we append to an array with the value. This allows us to have multiple values with the same priority and retrieve them in the same order we insert them (FIFO — first in, first out), just like with a regular queue (our reference implementation does a FILO — first in, last out — implementation which doesn’t seem right to me).
Deletion can now be simplified as well:
public function deleteKey(int $i): void { unset($this->arr[$i]); }
We don’t need to mark the array as dirty because it is already sorted or will be sorted when it is read. For simplicity, we aren’t concerned with multiple values here, and will likely be an application-level decision on how to handle that.
We also don’t need to keep track of the size of the queue anymore; we can just use the built-in function:
public function getSize(): int { return count($this->arr); }
To get the max priority value, we first need to check if the queue is dirty. If it is dirty, we have to sort it. Then return the item:
public function getMax(): mixed { if ($this->dirty) { krsort($this->arr); $this->dirty = false; } $next = current($this->arr); return $next[0]; }
It is very similar to removing the item from the queue as well; however, we have to rebuild the inner queue for a given priority if there is more than one item in the queue:
public function removeMax(): mixed { if ($this->getSize() === 0) { throw new \Exception("Heap is empty"); } if ($this->dirty) { krsort($this->arr); $this->dirty = false; } $current = key($this->arr); $return = current($this->arr); if (count($return) === 1) { unset($this->arr[$current]); return $return[0]; } $data = $return[0]; unset($return[0]); $this->arr[$current] = array_values($return); return $data; }
You may be tempted to use an internal pointer for the little queues, but it will only save you a few microseconds unless you have very dense priorities (lots of values with the same priority).
public function removeMax(): mixed { if ($this->getSize() === 0) { throw new \Exception("Heap is empty"); } if ($this->dirty) { krsort($this->arr); $this->dirty = false; } $current = key($this->arr); $x = &$this->arr[$current]; $value = current($x); $next = next($x); if($next === false) { unset($this->arr[$current]); } return $value; }
Here’s the complete listing:
<?php namespace Withinboredom\Algorithms\Heap; /** * An idiomatic PHP implementation of a heap. * * We are using a cursor here because we are utilizing the magic of sparse arrays! */ class Phpized { private array $arr = []; private bool $dirty = false; public function __construct() { } public function insert(int $key, mixed $value): void { $this->arr[$key][] = $value; $this->dirty = true; } public function deleteKey(int $i): void { unset($this->arr[$i]); } public function removeMax(): mixed { if ($this->getSize() === 0) { throw new \Exception("Heap is empty"); } if ($this->dirty) { krsort($this->arr); $this->dirty = false; } $current = key($this->arr); $x = &$this->arr[$current]; $value = current($x); $next = next($x); if($next === false) { unset($this->arr[$current]); } return $value; } public function getSize(): int { return count($this->arr); } public function getMax(): mixed { if ($this->dirty) { krsort($this->arr); $this->dirty = false; } $next = current($this->arr); return $next[0]; } }
And a simple unit test:
<?php use Withinboredom\Algorithms\Heap\Phpized; it('inserts and removes items in the order they are inserted', function () { $heap = new Phpized(); for ($i = 0; $i < 1000; $i++) { $heap->insert(12, $i); } $heap->insert(15, 10); $heap->insert(0, 12); expect($heap->removeMax())->toBe(10); for ($i = 0; $i < 1000; $i++) { expect($heap->removeMax())->toBe($i); } expect($heap->removeMax())->toBe(12); expect($heap->getSize())->toBe(0); expect($heap->removeMax(...))->toThrow(\Exception::class); });
Comparisons
I used PHP Bench to microbenchmark the two implementations alongside PHP’s SPL implementation. Naturally, the benchmarks are useless in their raw form but valuable for comparison to each other.
+----------------+-----------+----------+-----------+-----------+-----------+ | subject | mode | stdev | best | worst | mem_peak | +----------------+-----------+----------+-----------+-----------+-----------+ | benchReference | 998.044μs | 42.644μs | 950.360μs | 1.275ms | 931.560kb | | benchSpl | 301.795μs | 6.961μs | 294.900μs | 326.080μs | 574.280kb | | benchPhpized | 275.860μs | 7.220μs | 266.890μs | 302.150μs | 574.280kb | +----------------+-----------+----------+-----------+-----------+-----------+
What we can see is that this “phpized” implementation of a priority queue is faster than the C-based implementation or at least just as fast, even utilizing the same amount of memory. These implementations were run with the CLI opcache turned on, JIT enabled, and with dense priorities.
PHP can be faster than you expect
PHP arrays are downright powerful, allowing you to simplify some traditionally complex code, even totally abandoning traditional algorithms altogether in some instances (hint: there’s more to come!).
This is just the beginning of this series, and I hope you’ve found it as exciting and enlightening as I did building it! If you have any questions or suggestions, feel free to ask them in the comments!