For those who know me, I’ve usually been a pretty big fan of WordPress. I even worked for Automattic for a time. I have quite a collection of custom plugins and code to do some pretty interesting things. But now, I think it is time to go; for many reasons.
(more…)Category: Uncategorized
-
Yes, another test
I’m so sorry…
-
This is another test
Sorry for the noise, but emails, social links, and the WordPress reader should be working again… 😉
Stay tuned for the magic.
-
This is a test, and only a test
I’ve moved to a headless CMS, and this is a test to make sure everything still works.
Nothing to see here… except emails are probably broken.
-
Algorithms in PHP: Deques (and circular buffers + linked lists)
If you’ve ever tried to use an array as a queue or stack (via
array_pop
,array_push
,array_unshift
andarray_shift
), you may have discovered they are slow, particularly array_unshift which requires shifting the entire array. Here are the limits of the awesome PHP array: prepending. Prepending to an array is a very costly operation in PHP, and there isn’t really any way to get around it, or is there?Queues and Stacks
In computer science, there are two fundamental things for batch processing: queues and stacks. A queue is a first-in-first-out (FIFO) construct, and it works just like a queue at a grocery store. The first person in the line is the first person who gets checked out.
A stack, on the other hand, is a first-in-last-out (FILO) construct. You can imagine this just like a stack of plates. As you “push” plates onto the stack, the first plate you place is the last plate you’ll “pop” off the stack.
Both of these structures are amazing for being able to do any kind of bulk work. However, sometimes you must utilize a sequence as a stack and a queue. For example, if you require storing the last 500 items of something, but want to iterate it in reverse (undo features come to mind here).
The Deque
Enter the deque (pronounced “deck”). Deques are slightly magical in being able to use them like a queue and a stack. Taking items off and on from either end is O(1) while accessing the middle is still O(n). They are usually implemented on top of linked lists or a circular buffer.
A deque is incredibly powerful for “undo” features. You push operations onto the deque and shift items at the bottom once it gets above a specific size, allowing you to easily “undo the last X actions.” So, if you ever see some requirements for a “limited size stack” or “limited size queue,” you’re probably looking to use a deque.
A short disclaimer
Before we get into implementation of a Deque, I feel like a disclaimer is needed. The code shown on these posts shouldn’t be used in production. I’m not doing a lot of error handling, and in some cases, ignoring edge cases altogether. I’m merely trying to illustrate the principles for those cases where you need to implement it yourself. For example, WordPress implements a priority queue similar to the last post to support filters/actions in order to support insertion/deletion while iterating the queue. Beanstalkd uses a custom heap sort to support
Nieve implementation
First, let us look at a very nieve implementation:
<?php namespace Withinboredom\Deque; class Nieve { private array $buffer = []; /** * Append to the end of the array * * @param mixed $value * * @return void */ public function push( mixed $value ): void { array_push($this->buffer, $value); } /** * Get from the end of the array * @return mixed */ public function pop(): mixed { return array_pop($this->buffer); } /** * Get from the beginning of the array * @return mixed */ public function shift(): mixed { return array_shift($this->buffer); } /** * Append to the beginning of the array * * @param mixed $value * * @return void */ public function unshift( mixed $value ): void { array_unshift($this->buffer, $value); } }
As you can see, it is pretty simple. You should note that it is also very slow and gets slower as the array gets bigger.
Don’t do this unless you only have a very small list (~100-200 items).
Implementing a circular buffer
We can use a regular array, just like our nieve implementation, but implement a circular buffer. This would make the major cost of our implementation be the overhead of array accesses. To implement a circular buffer, we must understand what a circular buffer is and its requirements.
I’m not going to go into depth on how a regular one works, so I highly recommend pausing here and learning about regular circular buffers. We need a special circular buffer: a double-ended circular buffer.
This works similarly to a regular circular buffer, except we have two cursors: the “head” and the “tail.” These allow us to insert items at “the front” and “the back” while also allowing us to read those items.
To add an item to the head of the buffer:
- Check if the deque is full.
- If we are already at our initial position, move to the end of the array.
- Else decrement the pointer.
- Add the item to the array.
To add an item at the tail of the buffer:
- Check if the deque is full.
- If we are already at the end of the array, move to the beginning of the array.
- Else increment the pointer.
- Add the item to the array.
-
Algorithms in PHP: Deques (circular buffers & linked lists)
In the previous post, I talked about priority queues. But in my explanation of an alternative implementation, I neglected to mention a few things:
- The alternative implementation isn’t for production; there are edge cases and unhandled errors.
- The alternative implementation is meant for a short queue, not an infinite one, such as the one you may deal with in a single request.
- The performance of the alternative implementation may be misleading, in the worst case, it would perform far worse than the STL implementation. However, in my experience, priorities tend to bunch up vs. complete random priorities.
In this post, we’re going to go into Deques (pronounced “decks”), how they are usually implemented, and an alternative implementation that is specific to PHP. If you aren’t aware, PHP is getting a native Deque implementation in 8.2. A Deque is a double-ended queue that you can insert and remove from either end.
Deques are extremely powerful anytime you require a “queue of a bounded length” where the older items are dropped (like an undo buffer). However, this is one of those things where you really shouldn’t implement this in native PHP. STL Queues and Stacks can be used as a Deque and perform far better than anything you can implement in PHP. The reason for this should be evident if you’ve ever tried to prepend to an array; it is slow and gets slower the more items you add to the array.
A naive implementation
A naive PHP implementation of a Deque would probably look something like this:
<?php namespace Withinboredom\Algorithms\Deque; class Naive { private array $buffer = []; /** * Append to the end * * @param mixed $value * * @return void */ public function push(mixed $value): void { array_push($this->buffer, $value); } /** * Get from the end of the array * @return mixed */ public function pop(): mixed { return array_pop($this->buffer); } /** * Get from the beginning * @return mixed */ public function shift(): mixed { return array_shift($this->buffer); } /** * Append to the beginning * * @param mixed $value * * @return void */ public function unshift(mixed $value): void { array_unshift($this->buffer, $value); } }
This naive implementation is so bad that I wouldn’t use it unless I dealt with less than a few dozen items. Here’s a chart of how bad it gets as the number of items gets larger:
Deques using circular buffers
Highly performant Deques use circular buffers to accomplish their goals, along with sophisticated logic to automatically grow and shrink the buffers to allow for an infinite number of items. You can imagine a circular buffer as being a clock-type structure. If you were to insert 12 items, it would look like a clock. For a Deque, we’d want two “pointers” to represent the “head” and the “tail” of our Deque, much like the hands on a clock.
We’ll imagine the hour hand to be our “head” and the minute hand to be our “tail” while discussing insertions and deletions. To prepend, we move the hour hand back and insert our new value. To append, we move the minute hand forward and insert our value. To remove from the head, we take the value from the hour hand and move it forward. To remove the tail, we take the value from the minute hand and move it backward.
We aren’t going to cover growing or shrinking circular buffers here to keep it simple.
So, let’s build our class:
<?php namespace Withinboredom\Algorithms\Deque; class CircularBuffer { private array $buffer = []; private int $head = -1; private int $tail = 0; public function __construct(private int $size) { } }
First thing, we need a buffer to store items in, a head/tail pointer, and a max-size of the buffer. The reason for the strange head/tail initial values is to give us an easy way to tell whether we’re inserting an initial value or whether or not the buffer is full. Here’s how that works:
public function isFull(): bool { return ($this->head === 0 && $this->tail === $this->size - 1) || $this->head === $this->tail + 1; } public function isEmpty(): bool { return $this->head === -1; }
Push
To push onto the buffer, we need to handle several states:
public function push(mixed $value): void { if ($this->isFull()) { throw new \Exception(); } if ($this->head === -1) { $this->head = 0; $this->tail = 0; } elseif ($this->tail === $this->size - 1) { $this->tail = 0;just a tiny } else { $this->tail += 1; } $this->buffer[$this->tail] = $value; }
- our initial state
- wrapping around the circular buffer
- appending normally
Pop
To pop, we also need to handle several states:
public function pop(): mixed { if ($this->isEmpty()) { throw new \Exception(); } $return = $this->buffer[$this->tail]; if ($this->head === $this->tail) { $this->head -= 1; $this->tail -= 1; } elseif ($this->tail === 0) { $this->tail = $this->size - 1; } else { $this->tail -= 1; } return $return; }
- when the buffer is full
- wrapping around the circular buffer
- popping normally
Shift and Unshift
Shifting and unshifting are handled the same, but in reverse:
public function shift(): mixed { if ($this->isEmpty()) { throw new \Exception(); } $return = $this->buffer[$this->head]; if ($this->head === $this->tail) { $this->head -= 1; $this->tail -= 1; } elseif ($this->head === $this->size - 1) { $this->head = 0; } else { $this->head += 1; } return $return; } public function unshift(mixed $value): void { if ($this->isFull()) { throw new \Exception(); } if ($this->head === -1) { $this->head = 0; $this->tail = 0; } elseif ($this->head === 0) { $this->head = $this->size - 1; } else { $this->head -= 1; } $this->buffer[$this->head] = $value; }
The downside to this implementation is that a pre-determined size limits it, however, it is still pretty performant:
It’s a bit slower than the STL implementation (~86 vs. ~50 microseconds with a list size of 2,000). There are still several things to implement even to be close to being on par with the STL implementation:- ArrayAccess to access arbitrary elements
- Growing/shrinking the circular buffer
Linked Lists
Another potential possibility to traditionally implement a Deque is to use a doubly linked list. To accomplish that, we’ll need a “Node” class to represent each item in the list:
<?php namespace Withinboredom\Algorithms\Deque; class Node { public function __construct(public Node|null $previous, public Node|null $next, public mixed $value) { } }
We can also add a few helpful behaviors:
setNext()
,setPrevious()
,resetNext()
,resetPrevious()
,removeMeNext()
andremoveMePrevious()
.setNext()
andsetPrevious()
set the$next
/$previous
variable and return$this
:public function setNext(Node $next): Node { $this->next = $next; return $this; } public function setPrevious(Node $previous): Node { $this->previous = $previous; return $this; }
resetNext()
andresetPrevious()
both unset next/previous:public function resetPrevious(): Node { $this->previous = null; return $this; } public function resetNext(): Node { $this->next = null; return $this; }
removeMeNext()
andremoveMePrevious()
deletes itself from the chain and returns the next/previous Node:public function removeMeNext(): Node|null { $return = $this->next?->resetPrevious(); $this->resetNext(); return $return; } public function removeMePrevious(): Node|null { $return = $this->previous?->resetNext(); $this->resetPrevious(); return $return; }
From there, the operations a pretty straightforward:
class LinkedList { public Node|null $head = null; public Node|null $tail = null; /** * Append to the tail * * @param mixed $value * @return void */ public function push(mixed $value): void { $this->tail = $this->tail?->setNext(new Node($this->tail, null, $value))->next ?? $this->head = new Node(null, null, $value); } /** * Get from the end of the array * * @return mixed */ public function pop(): mixed { $return = $this->tail?->value ?? throw new \Exception(); $this->tail = $this->tail->removeMePrevious(); return $return; } /** * Prepend * * @return mixed */ public function unshift(mixed $value): void { $this->head = $this->head?->setPrevious(new Node(null, $this->head, $value))->previous ?? $this->tail = new Node(null, $this, $value); } /** * Get from the beginning of the array * * @return mixed */ public function shift(): mixed { $return = $this->head?->value ?? throw new \Exception(); $this->head = $this->head->removeMeNext(); return $return; } }
The performance is linear, just like before:
However, it still performs worse than a circular buffer or any STL implementation.
Php Specific implementation
If you’ve had to work extensively with PHP arrays, you’d probably guess why the naive implementation is so terrible. It’s not that bad, as long as you never unshift the array. Shifting/unshifting to a PHP array has terrible performance implications and should be avoided as much as possible. If you a bulk-prepending, it is usually faster to reverse the array, append your values, then flip the array again.
To avoid prepending to an array, we will use two arrays and only append to them. We’ll call these arrays “front” and “back”; appending to the “front” array is like prepending, and appending to the “back” array is just like regular appending. We pull from the end of each array to take from the “front” and “back,” respectively. Once the appropriate array is empty, we shift from the beginning of the other array until it is empty, which is still incredibly performant.
This implementation is really for when you need something custom, and the other types of algorithms wouldn’t work.
<?php namespace Withinboredom\Algorithms\Deque; class DoubleList { private array $front = []; private array $back = []; }
To prepend, we append to
$front
, and to append; we append to$back
:public function unshift(mixed $value): void { $this->front[] = $value; } public function push(mixed $value): void { $this->back[] = $value; }
When we pop, we check to see if
$back
is empty; if it is, we take it from the start of $front. Otherwise, we take it from the back of$back
:public function pop(): mixed { if (empty($this->back)) { // take the first item from the front and remove the item return array_shift($this->front); } return array_pop($this->back); }
We similarly do the reverse for the shift:
public function shift(): mixed { if (empty($this->front)) { // take the first item from the back and remove the item return array_shift($this->back); } return array_pop($this->front); }
This allows us to get around prepending by appending, almost as performant as the circular buffer implementation.
STL and when to use it
The STL implementations of Stacks and Queues can be used as Deque. Their performance (wall-clock time) will beat the crap out of these implementations; alternatively, if you are targeting PHP 8.2+, using the native Deque would be an even better solution — though we don’t know what that performance will look like yet.
For 99% of cases, you won’t need to implement your own Deque or any other basic algorithms. However, there are times when implementing your own can allow more business freedoms. For example, Beanstalkd implements its Priority Queue to do some things, such as removing items from the middle of the queue and adding a TTL for jobs. WordPress implements its Priority Queue (actions/filters) to support removal/insertion while iterating the queue and preventing infinite loops. Many other projects also end up embedding certain logic into these algorithms to support business logic in a performant way.
DS\Deque
There’s also the DS\Deque class, which is a proper Deque implementation; however, it requires being installed separately and does not come bundled with PHP. It is a little faster than using the STL libraries as a Deque. If you are looking for raw speed, this is the one you’re looking for.
When to use these algorithms
As I mentioned, sometimes you need to implement low-level algorithms to support high-performance business cases. It’s unlikely, but knowing how they work and some PHP-specific implementations can significantly impact when it matters.
-
Algorithms in PHP: Priority Queues (and Heaps)
This is the beginning of a series on implementing various algorithms in idiomatic PHP and their reference implementation (usually based on C++).
Today we’re going to be talking about Priority Queues. A priority queue is where each value has an associated “priority” that determines which order things come out of the queue. The best real-life priority queue would be an emergency room, where those patients with the most life-threatening issues are seen before the rest.
In most reference implementations, a priority queue is implemented using a Heap. A Heap is a binary tree-based structure that retains a specific value at the top of the tree. If the value is the max value of all the values, it is a “Max Heap,” and if it is the least, it is a “Min Heap.” Since we have a concept of “low priority” and “high priority,” we’ll consider a “Max Heap” here.
When you add an item to a Heap, you append it to the end of the Heap. If the newly appended value is lower than the parent, nothing needs to be done, else swap it with the parent and continue the check from that point. There are plenty of excellent articles on implementing a Heap, and we only need them to implement the “reference” implementation; for the idiomatic version of a priority queue, we don’t need a stinking Heap. 🙂
Interface
We’re only going to be implementing a few public functions:
<?php namespace Withinboredom\Algorithms\Heap; interface Heapish { public function insert(int $priority, mixed $value): void; public function removeMax(): mixed; public function getSize(): int; public function getMax(): mixed; }
insert
obviously inserts a new value with a given priority,removeMax
returns the value with the highest priority and removes it from the queue, andgetMax
just peeks at the highest priority value without removing it from the queue.The reference implementation
<?php namespace Withinboredom\Algorithms\Heap; /** * A basic priority heap implemented in C++-ish semantics */ class Reference { private int $heapSize = 0; private array $arr = []; public function __construct(public readonly int $maxSize) { } public function getMax(): mixed { return $this->arr[0]['value']; } public function getSize(): int { return $this->heapSize; } public function insert(int $key, mixed $value): void { if ($this->heapSize === $this->maxSize) { throw new \Exception("Heap is full"); } $i = $this->heapSize++; $this->arr[$i] = ['key' => $key, 'value' => $value]; while ($i !== 0 && $this->arr[$this->parent($i)]['key'] < $this->arr[$i]['key']) { $this->swap($i, $this->parent($i)); $i = $this->parent($i); } } private function parent(int $i): int { return (int)floor(($i - 1) / 2); } private function swap(int $first, int $second) { $temp = $this->arr[$first]; $this->arr[$first] = $this->arr[$second]; $this->arr[$second] = $temp; } public function deleteKey(int $i) { $this->increaseKey($i, PHP_INT_MAX); $this->removeMax(); } public function increaseKey(int $i, int $newKey) { $this->arr[$i]['key'] = $newKey; while ($i !== 0 && $this->arr[$this->parent($i)]['key'] < $this->arr[$i]['key']) { $this->swap($i, $this->parent($i)); $i = $this->parent($i); } } public function removeMax(): mixed { if ($this->heapSize <= 0) { throw new \Exception("Heap is empty"); } if ($this->heapSize === 1) { $this->heapSize--; return $this->arr[0]; } $root = $this->arr[0]; $this->arr[0] = $this->arr[--$this->heapSize]; $this->maxHeapify(0); return $root; } public function maxHeapify(int $i): void { $l = $this->leftChild($i); $r = $this->rightChild($i); $largest = $i; if ($l < $this->heapSize && $this->arr[$l]['key'] > $this->arr[$i]['key']) { $largest = $l; } if ($r < $this->heapSize && $this->arr[$r]['key'] > $this->arr[$largest]['key']) { $largest = $r; } if ($largest !== $i) { $this->swap($i, $largest); $this->maxHeapify($largest); } } private function leftChild(int $i): int { return 2 * $i + 1; } private function rightChild(int $i): int { return 2 * $i + 2; } }
As you can see, the implementation is not very “php-ish” and does some pretty standard-ish things with arrays. However, this is based on an implementation in C++. In C++, arrays are very different than in PHP! In PHP, arrays can be sparse, sorted, or unsorted, and more importantly, we can access the underlying iteration mechanics to store state!
Looking over this implementation, you may notice a few other drawbacks. One, this implementation needs to be sorted on every insert, which incurs a
O(log n)
cost, and is also incurred when removing the max item from the Heap. Let’s apply some PHP shenanigans to get it much faster…Utilizing the underlying cursor
The first optimization we can make is to utilize the array’s cursor and not sort on every insert. To do that, we need to keep track of whether or not an item has been inserted or not. So we’ll create a
$dirty
variable, so we only need to sort right before we read from the queue. This will significantly simplify our inserts. Oh, since we don’t have to worry about array sizing, we can also toss out a potential exception:public function insert(int $key, mixed $value): void { $this->arr[$key][] = $value; $this->dirty = true; }
Note that we append to an array with the value. This allows us to have multiple values with the same priority and retrieve them in the same order we insert them (FIFO — first in, first out), just like with a regular queue (our reference implementation does a FILO — first in, last out — implementation which doesn’t seem right to me).
Deletion can now be simplified as well:
public function deleteKey(int $i): void { unset($this->arr[$i]); }
We don’t need to mark the array as dirty because it is already sorted or will be sorted when it is read. For simplicity, we aren’t concerned with multiple values here, and will likely be an application-level decision on how to handle that.
We also don’t need to keep track of the size of the queue anymore; we can just use the built-in function:
public function getSize(): int { return count($this->arr); }
To get the max priority value, we first need to check if the queue is dirty. If it is dirty, we have to sort it. Then return the item:
public function getMax(): mixed { if ($this->dirty) { krsort($this->arr); $this->dirty = false; } $next = current($this->arr); return $next[0]; }
It is very similar to removing the item from the queue as well; however, we have to rebuild the inner queue for a given priority if there is more than one item in the queue:
public function removeMax(): mixed { if ($this->getSize() === 0) { throw new \Exception("Heap is empty"); } if ($this->dirty) { krsort($this->arr); $this->dirty = false; } $current = key($this->arr); $return = current($this->arr); if (count($return) === 1) { unset($this->arr[$current]); return $return[0]; } $data = $return[0]; unset($return[0]); $this->arr[$current] = array_values($return); return $data; }
You may be tempted to use an internal pointer for the little queues, but it will only save you a few microseconds unless you have very dense priorities (lots of values with the same priority).
public function removeMax(): mixed { if ($this->getSize() === 0) { throw new \Exception("Heap is empty"); } if ($this->dirty) { krsort($this->arr); $this->dirty = false; } $current = key($this->arr); $x = &$this->arr[$current]; $value = current($x); $next = next($x); if($next === false) { unset($this->arr[$current]); } return $value; }
Here’s the complete listing:
<?php namespace Withinboredom\Algorithms\Heap; /** * An idiomatic PHP implementation of a heap. * * We are using a cursor here because we are utilizing the magic of sparse arrays! */ class Phpized { private array $arr = []; private bool $dirty = false; public function __construct() { } public function insert(int $key, mixed $value): void { $this->arr[$key][] = $value; $this->dirty = true; } public function deleteKey(int $i): void { unset($this->arr[$i]); } public function removeMax(): mixed { if ($this->getSize() === 0) { throw new \Exception("Heap is empty"); } if ($this->dirty) { krsort($this->arr); $this->dirty = false; } $current = key($this->arr); $x = &$this->arr[$current]; $value = current($x); $next = next($x); if($next === false) { unset($this->arr[$current]); } return $value; } public function getSize(): int { return count($this->arr); } public function getMax(): mixed { if ($this->dirty) { krsort($this->arr); $this->dirty = false; } $next = current($this->arr); return $next[0]; } }
And a simple unit test:
<?php use Withinboredom\Algorithms\Heap\Phpized; it('inserts and removes items in the order they are inserted', function () { $heap = new Phpized(); for ($i = 0; $i < 1000; $i++) { $heap->insert(12, $i); } $heap->insert(15, 10); $heap->insert(0, 12); expect($heap->removeMax())->toBe(10); for ($i = 0; $i < 1000; $i++) { expect($heap->removeMax())->toBe($i); } expect($heap->removeMax())->toBe(12); expect($heap->getSize())->toBe(0); expect($heap->removeMax(...))->toThrow(\Exception::class); });
Comparisons
I used PHP Bench to microbenchmark the two implementations alongside PHP’s SPL implementation. Naturally, the benchmarks are useless in their raw form but valuable for comparison to each other.
+----------------+-----------+----------+-----------+-----------+-----------+ | subject | mode | stdev | best | worst | mem_peak | +----------------+-----------+----------+-----------+-----------+-----------+ | benchReference | 998.044μs | 42.644μs | 950.360μs | 1.275ms | 931.560kb | | benchSpl | 301.795μs | 6.961μs | 294.900μs | 326.080μs | 574.280kb | | benchPhpized | 275.860μs | 7.220μs | 266.890μs | 302.150μs | 574.280kb | +----------------+-----------+----------+-----------+-----------+-----------+
What we can see is that this “phpized” implementation of a priority queue is faster than the C-based implementation or at least just as fast, even utilizing the same amount of memory. These implementations were run with the CLI opcache turned on, JIT enabled, and with dense priorities.
PHP can be faster than you expect
PHP arrays are downright powerful, allowing you to simplify some traditionally complex code, even totally abandoning traditional algorithms altogether in some instances (hint: there’s more to come!).
This is just the beginning of this series, and I hope you’ve found it as exciting and enlightening as I did building it! If you have any questions or suggestions, feel free to ask them in the comments!
-
Um, since when is PHP faster than C++?
Last week I was working on porting an encryption scheme to PHP from C++, mostly to embed it in a website and make changes to it specific to my use case; and for fun.
Needless to say, I was surprised to see my PHP version was faster than the C++ version. TL;DR: it isn’t faster, it was just computing the wrong thing.
The original
Here’s the original C++ code modified to isolate the slower bits:
#include <iostream> #include <cstdlib> #include <gmpxx.h> #include <sys/time.h> #define N 540 int main() { gmp_randclass* random = new gmp_randclass(gmp_randinit_default); random->seed(15); timespec start, finish; int i, j; mpz_class* p = new mpz_class[N]; mpz_class u, pt, x = 1; long nt = (unsigned) 1838 / 460; long np; clock_gettime(CLOCK_MONOTONIC, &start); #pragma omp parallel for private(pt, u, j) for (i = 0; i < N; i++) { p[i] = 1; for(j = 0; j < nt; j++) { if(j < (nt - 1)) np = 460; else np = 1838-460*(nt - 1); u = random->get_z_bits(np); mpz_nextprime(pt.get_mpz_t(), u.get_mpz_t()); #pragma omp critical { p[i] *= pt; } } x *= p[i]; } clock_gettime(CLOCK_MONOTONIC, &finish); std::cout << "Random x: " << mpz_get_str(NULL, 62, x.get_mpz_t()) << std::endl; std::cout << "Completed C++ random generation in " << (double)((finish.tv_nsec-start.tv_nsec)/1000000000.0+(finish.tv_sec-start.tv_sec)) << " seconds" << std::endl; }
We are computing products of primes here, so nothing really fancy is going on. On my machine, it computes the products in almost exactly one second. Here’s the PHP port:
<?php const N = 540; function main(): void { $p = []; $x = gmp_init(1); $nt = abs(round(1838 / 460)); $start = microtime(true); gmp_random_seed(15); for ($i = 0; $i < N; $i++) { $p[$i] = gmp_init(1); for ($j = 0; $j < $nt; $j++) { if ($j < ($nt - 1)) { $np = 460; } else { $np = 1838 - 460 * ($nt - 1); } $u = gmp_random_bits($np); $pt = gmp_nextprime($u); $p[$i] *= $pt; } $x *= $p[$i]; } $finish = microtime(true); $time = $finish - $start; echo "Random x: " . gmp_strval($x, 62) . "\n"; echo "Completed PHP random generation in $time seconds\n"; } main();
On my machine, it computes in 2.7 seconds, almost twice as slow. However, the C++ version is using some parallelization magic that I’m not going to implement in PHP. To do so would require spawning some threads and communicating with them via queues or pipes using System V or pipes, respectively. I don’t really feel like building that or vetting a library that does it for me, so we’ll just remove the parallelization from the C++ bits. It now looks like this:
#include <iostream> #include <cstdlib> #include <gmpxx.h> #include <sys/time.h> #define N 540 int main() { gmp_randclass* random = new gmp_randclass(gmp_randinit_default); random->seed(15); timespec start, finish; int i, j; mpz_class* p = new mpz_class[N]; mpz_class u, pt, x = 1; long nt = (unsigned) 1838 / 460; long np; clock_gettime(CLOCK_MONOTONIC, &start); for (i = 0; i < N; i++) { p[i] = 1; for(j = 0; j < nt; j++) { if(j < (nt - 1)) np = 460; else np = 1838-460*(nt - 1); u = random->get_z_bits(np); mpz_nextprime(pt.get_mpz_t(), u.get_mpz_t()); { p[i] *= pt; } } x *= p[i]; } clock_gettime(CLOCK_MONOTONIC, &finish); std::cout << "Random x: " << mpz_get_str(NULL, 62, x.get_mpz_t()) << std::endl; std::cout << "Completed C++ random generation in " << (double)((finish.tv_nsec-start.tv_nsec)/1000000000.0+(finish.tv_sec-start.tv_sec)) << " seconds" << std::endl; }
It now completes in about 8.2 seconds, compiling with
-O3
. That’s far slower than the PHP version… but why? What is going on?Identifying the differences
In the C++ version, there are several operations that could be slower than in the PHP version:
- multiplication of gmp values
- calculating random numbers
- calculating the next prime number
I calculated the total time spent on each of these operations in C++ vs. PHP:
Operation PHP (secs) C++ (secs) Multiplication 0.058 0.058 Random bits 0.0012 0.0001 Next Prime 2.65 8.24 OK, so now to the gmp extension source code to understand how it is calculating primes so quickly. Maybe there’s something we can learn?
Here’s the function in question: on Github.
/* {{{ Finds next prime of a */ ZEND_FUNCTION(gmp_nextprime) { gmp_unary_op(mpz_nextprime); } /* }}} */
These are using the exact same functions under the hood, however, we’re still manually unwrapping to the c-type in our function (
.get_mpz_t()
), maybe that is the culprit?It turns out that is not the culprit at all, it’s the calculation of the next prime number after all. At this point, the only thing I can think of is the quality of the primes. Perhaps the C++ version is calculating better primes that are bigger? Or something.
Since we are asking for random numbers containing at least X random bits, let’s calculate the average number of bits actually returned.
Language Number bits C++ 611 PHP 458 Aha! The culprit! It is indeedly the quality of the primes. After some further investigation, it looks like it boils down to this calculation:
1838 / 460
Fixing the bug
Well, it turns out there was a subtle bug. The C++ version casts this to unsigned (these are variables in the actual code):
1838 / 460
, and in the PHP version we do:abs(round(1838 / 460))
which is not the same. We want the floor (in PHP, it will magically cast to a float when doing integer division, so we need to do the truncation manually):abs(floor(1838 / 460))
.This results in the PHP version taking 8.4 seconds and generating the appropriate number of random bits, just a tad bit slower than the C++ version and they generate the exact same product as well (since they have the same random seed).
Some key takeaways
I’m only doing this for fun and to satisfy my own curiosity while on my three-month sabbatical. I don’t recommend porting cryptographic functions from C/C++ to PHP unless you really know what the hell you’re doing. PHP has no concept of “unsigned long” and cryptographic algorithms generally rely on them quite heavily. PHP’s maximum integer is
9223372036854775807
and C++’s maximum unsigned long is18446744073709551615
on my machine. This can create serious issues if you need bigger numbers, or serious performance implications if you try to emulate them. Instead, use PHP’s FFI extension to call into the C/C++ code as a library.But when PHP appears to be faster than C++, you have a bug! Finding that bug might not be obvious, it might be subtle, like this one.
-
A More In-Depth Reasoning behind PHP as a good language
Yesterday, I wrote a post over lunch that people just absolutely hated. It was great, in sort of weird way, to see that many people didn’t see what I saw when I posted it. I think it was a somewhat-not-gentle reminder that I see the world differently.
The post was shared on Reddit and I personally shared it on Hacker News (which was flagged) just for giggles to see if there would be any interesting conversations. Indeedly, I got some “interesting” comments:
But wow, this is not a convincing article at all. It’s just an opinion with nothing to back it.
wccrawford on https://news.ycombinator.com/item?id=30700298No substantive argument made here, except “PHP has changed in the past 10 years”.
krageon on https://news.ycombinator.com/item?id=30698957When you’re this wrong about a topic, most people quit even trying to correct you.
spacemanmatt on https://news.ycombinator.com/item?id=30699677There were similar things on Reddit. Truthfully, I didn’t expect people to see what I saw when I posted that post. However, I think it is probably worth digging a little deeper into the post, just to help anyone out who comes along later.
Like, look at this Merge Sort in C# (130 lines) and then look at it in PHP (31 lines). You can’t beat that kind of productivity, not to mention in PHP, you can even mix data types (like ints and floats) where in the C# version, it’d be much more difficult.
https://withinboredom.info/blog/2022/03/16/my-favorite-language-has-changed-to-php/People took this to mean that LOC is what makes them productive(?). I totally get why people would think that because I didn’t spell it out. We learn algorithms like Merge Sort because we should never implement them ourselves. So, why do we learn something we shouldn’t use? Why do we test for it in interviews? The answer is obvious to me, but again, I see the world differently than most.
Think about it, “we should never implement them” which means you need to be able to recognize when you are, or code you’re reviewing is accidentally implementing them, which means you need to know what they look like so you can point it out.
PHP is a ridiculously concise language for being C-like, probably due to the huge amount of built-in functionality, where C# is much more verbose. You can spot the merge sort in PHP at a basic glance, where you have to tease apart the C# version to figure out it is a merge sort. That was my point, not that LOC is important to productivity, but the fact that PHP is so concise, that you can spot common algorithms from a mile away.
However, there are cases for custom implementations, as long as you know what you’re doing. WordPress and Beanstalkd both implement custom priority queues, and they’re faster than they should be because of it. Does that make it harder to maintain and change those pieces? You bet! But sometimes being fast is better than being easy to maintain — such as on critical paths.
So, how does this tie into productivity? If you can easily spot common algorithms/patterns, you have a wealth of information at your fingertips to help you optimize the code, make it easier to understand, and deliver more value to the end-user. If common algorithms are obfuscated, you’ll miss the forest for the trees and not realize the wealth of information you could have had.
When I learned Assembly, I think I broke my brain, permanently. New paradigms were learned, old ways to do things were suddenly way too long. I don’t really know how to explain it, but when I came back to C#, suddenly, I felt too constrained.
https://withinboredom.info/blog/2022/03/16/my-favorite-language-has-changed-to-php/I was optimizing a bluetooth door lock I made myself for my front door. I basically had a very small amount of memory. It was tremendously fun, to figure out how to do everything I wanted to do and squeeze it into a very small amount of flash memory. I think we’d call it spaghetti code if it were written in a higher-level language. 🙂
Writing spaghetti on-purpose is truly harder than writing good code, and optimizing spaghetti is even harder. I highly recommend it, it’s similar to the paradigm shift required to learn languages like Haskell when all you know is procedural/OOP style code.
Anyway, lunchtime is over and I think I covered the big things I wanted to address. Enjoy your day!
-
Yes, PHP is faster than C#
So, I got an interesting spam comment on my post today:
“It gets even crazier when you actually benchmark the two languages only to discover in some real-world cases, PHP outperforms C#.”
I triple dare you to show code examples so we can explain why you’re wrong.
Quadruple dare
Jesus christ, how did you think this was true
By: No McNoingtonThis person couldn’t even be bothered to put in a decent pseudonym to call them by, but Mr./Mrs. McNoington, prepare to be blown away…
See, there’s something very common that all developers must do, and that is read files… we need to parse things, transform file formats, or whatever. So, let’s compare the two languages.
function test() { $file = fopen("/file/file.bin", 'r'); $counter = 0; $timer = microtime(true); while ( ! feof($file)) { $buffer = fgets($file, 4096); $counter += substr_count($buffer, '1'); } $timer = microtime(true) - $timer; fclose($file); printf("counted %s 1s in %s milliseconds\n", number_format($counter), number_format($timer * 1000, 4)); } test();
using System.Diagnostics; using System.Text; var test = () => { using var file = File.OpenText("/file/file.bin"); var counter = 0; var sw = Stopwatch.StartNew(); while(!file.EndOfStream) { if(file.Read() == '1') { counter++; } } sw.Stop(); Console.WriteLine($"Counted {counter:N0} 1s in {sw.Elapsed.TotalMilliseconds:N4} milliseconds"); }; test();
Personally, I feel like this is a pretty fair assessment of each language. We will synchronously read a 4Mib file, byte-by-byte, and count the 1’s in the file. There’s very little user-land code going on here, so we’re just trying to test the very fundamentals of a language: reading a file. We’re only adding the counting here to prevent clever optimizing compilers (opcache in PHP, release mode in C#) from cheating and removing the code.
“But Rob,” I hear you say, “they’re not reading it byte-by-byte in the PHP version!” and I’d reply with, “but we’re not reading it byte-by-byte in the C# version either!”
Let’s see how it goes:
PHP: 32.49 ms (avg over 10 runs) C#: 37.30 ms (avg over 10 runs) That’s pretty crazy… I mean, we just read four megs, which is about the size of a decent photo. What about something like a video clip that might be 2.5 gigs?
PHP: 24.82 s (avg over 10 runs) C#: 26.67 s (avg over 10 runs) Now, I need to process quite a bit of incoming files from banks and bills and stuff for my household budgeting system, which is how I discovered this earlier last year as I was porting things over from a hodgepodge of random stuff to Dapr and Kubernetes. PHP is actually faster than C# at reading files, who knew?!
Does this mean you should drop everything and just rewrite all your file writing stuff in PHP (or better, C)? no. Not at all. A few milliseconds isn’t going to destroy your day, but if your bottleneck is i/o, maybe it’s worth considering :trollface:?
Nah, don’t kid yourself. But if you’re already a PHP dev, now you know that PHP is faster than C#, at least when it comes to reading files…
Feel free to peruse some experiments here (or if you want to inspect the configuration): withinboredom/racer: racing languages (github.com)
Can this C# be written to be faster, sure! Do libraries implement “the faster way?” Not usually.
Addendum
Many people have pointed out that the C# version isn’t reading it in binary mode and the function call overhead are to blame. Really? C# is many order of magnitudes faster than PHP at function calls. I promise you that isn’t the problem. Here’s the code for binary mode on the 2.5gb file:
using System.Diagnostics; using System.Text; var binTest = () => { using var file = File.OpenRead("/file/file.bin"); var counter = 0; var buffer = new byte[4096]; var numRead = 0; var sw = Stopwatch.StartNew(); while ((numRead = file.Read(buffer, 0, buffer.Length)) != 0) { counter += buffer.Take(numRead).Count((x) => x == '1'); } sw.Stop(); Console.WriteLine($"Counted {counter:N} 1s in {sw.Elapsed.TotalMilliseconds} milliseconds"); }; binTest();
If you now want to complain that it’s all Linq’s fault, we can just remove the
.Take
and double count things because I need to get to work and I’m not putting any more time to telling people the sky is blue.with .Take
: 38.40s (2.5gb file)without .Take
: 23.5s (2.5gb file — but incorrect implementation)So yeah, if an incorrect implementation is the proof you need that PHP is slower, go for it. Time to go to work.
Addendum 2
Since people come here wanting to optimize the C# without optimizing the PHP version, here is an implementation ONLY looking at file performance:
function test() { $file = fopen("/file/file.bin", 'r'); $counter = 0; $timer = microtime(true); while (stream_get_line($file, 4096) !== false) { ++$counter; } $timer = microtime(true) - $timer; fclose($file); printf("counted %s 1s in %s milliseconds\n", number_format($counter), number_format($timer * 1000, 4)); } test();
var binTest = () => { using var file = File.OpenRead("/file/file.bin"); var counter = 0; var buffer = new byte[4096]; var sw = Stopwatch.StartNew(); while (file.Read(buffer, 0, buffer.Length) != 0) { counter += 1; } sw.Stop(); Console.WriteLine($"Counted {counter:N} 1s in {sw.Elapsed.TotalMilliseconds} milliseconds"); }; binTest();
And here are the results:
PHP: 423.50 ms (avg over 10 runs) C#: 530.42 ms (avg over 10 runs) -
My Favorite Language has Changed to PHP
For the last 15 years or so, my favorite language has been C#, despite using mostly PHP in my workplace since 2012. Over the years, I’ve written a lot of side-projects in C#, and it’s been great, for the most part. Until I learned Assembly… last year.
When I learned Assembly, I think I broke my brain, permanently. New paradigms were learned, old ways to do things were suddenly way too long. I don’t really know how to explain it, but when I came back to C#, suddenly, I felt too constrained.
Like, look at this Merge Sort in C# (130 lines) and then look at it in PHP (31 lines). You can’t beat that kind of productivity, not to mention in PHP, you can even mix data types (like ints and floats) where in the C# version, it’d be much more difficult.
It gets even crazier when you actually benchmark the two languages only to discover in some real-world cases, PHP outperforms C#. Yeah, I know outlandish claims requires substantial proof, but I’m not here to prove to it you. I’m just talking about my reasons for PHP becoming my favorite language (and yes, writing multithreaded PHP is still a PITA).
There are some things I don’t like about PHP, like, the fact that you need two separate processes to serve web pages at scale (nginx + php-fpm). Some of that is solved with Swoole/ReactPHP/AmpPHP and friends, but then you end up needing to rewrite things to work, using special database drivers and things. I’m hoping with PHP Fibers, that’ll be a thing of the past, eventually.
Although, if you’re running something like the nginx ingress in Kubernetes, you can just use the Fastcgi target directly, without an intermediary. That’s actually pretty nice. In this environment, I think it’s actually better than running a native web-server on the pod. There’s no need to parse a forwarded HTTP request (or do http2 to http1 translations), since you’re actually operating on the original HTTP request. Maybe it would be nice if C# provided a Fastcgi binding…
Anyway, I just wanted to announce this to the world because PHP has a bad rap pre-5.3. Since PHP 5.6, things have changed quite a bit. If you haven’t used PHP in the last 7-8 years, you should give it a go.