Um, since when is PHP faster than C++?

Last week I was working on porting an encryption scheme to PHP from C++, mostly to embed it in a website and make changes to it specific to my use case; and for fun.

Needless to say, I was surprised to see my PHP version was faster than the C++ version. TL;DR: it isn’t faster, it was just computing the wrong thing.

The original

Here’s the original C++ code modified to isolate the slower bits:

#include <iostream>
#include <cstdlib>
#include <gmpxx.h>
#include <sys/time.h>

#define N  540

int main() {
    gmp_randclass* random = new gmp_randclass(gmp_randinit_default);
    random->seed(15);
    timespec start, finish;
    int i, j;
    mpz_class* p = new mpz_class[N];
    mpz_class u, pt, x = 1;
    long nt = (unsigned) 1838 / 460;
    long np;

    clock_gettime(CLOCK_MONOTONIC, &start);

    #pragma omp parallel for private(pt, u, j)
    for (i = 0; i < N; i++) {
        p[i] = 1;
        for(j = 0; j < nt; j++) {
            if(j < (nt - 1)) np = 460;
            else np = 1838-460*(nt - 1);

            u = random->get_z_bits(np);
            mpz_nextprime(pt.get_mpz_t(), u.get_mpz_t());

            #pragma omp critical
            {
                p[i] *= pt;
            }
        }

        x *= p[i];
    }

    clock_gettime(CLOCK_MONOTONIC, &finish);

    std::cout << "Random x: " << mpz_get_str(NULL, 62, x.get_mpz_t()) << std::endl;
    std::cout << "Completed C++ random generation in " << (double)((finish.tv_nsec-start.tv_nsec)/1000000000.0+(finish.tv_sec-start.tv_sec)) << " seconds" << std::endl;
}

We are computing products of primes here, so nothing really fancy is going on. On my machine, it computes the products in almost exactly one second. Here’s the PHP port:

<?php

const N = 540;

function main(): void
{
    $p = [];
    $x = gmp_init(1);
    $nt = abs(round(1838 / 460));
    $start = microtime(true);
    gmp_random_seed(15);

    for ($i = 0; $i < N; $i++) {
        $p[$i] = gmp_init(1);
        for ($j = 0; $j < $nt; $j++) {
            if ($j < ($nt - 1)) {
                $np = 460;
            } else {
                $np = 1838 - 460 * ($nt - 1);
            }

            $u = gmp_random_bits($np);
            $pt = gmp_nextprime($u);
            $p[$i] *= $pt;
        }

        $x *= $p[$i];
    }
    $finish = microtime(true);

    $time = $finish - $start;
    echo "Random x: " . gmp_strval($x, 62) . "\n";
    echo "Completed PHP random generation in $time seconds\n";
}

main();

On my machine, it computes in 2.7 seconds, almost twice as slow. However, the C++ version is using some parallelization magic that I’m not going to implement in PHP. To do so would require spawning some threads and communicating with them via queues or pipes using System V or pipes, respectively. I don’t really feel like building that or vetting a library that does it for me, so we’ll just remove the parallelization from the C++ bits. It now looks like this:

#include <iostream>
#include <cstdlib>
#include <gmpxx.h>
#include <sys/time.h>

#define N  540

int main() {
    gmp_randclass* random = new gmp_randclass(gmp_randinit_default);
    random->seed(15);
    timespec start, finish;
    int i, j;
    mpz_class* p = new mpz_class[N];
    mpz_class u, pt, x = 1;
    long nt = (unsigned) 1838 / 460;
    long np;

    clock_gettime(CLOCK_MONOTONIC, &start);

    for (i = 0; i < N; i++) {
        p[i] = 1;
        for(j = 0; j < nt; j++) {
            if(j < (nt - 1)) np = 460;
            else np = 1838-460*(nt - 1);

            u = random->get_z_bits(np);
            mpz_nextprime(pt.get_mpz_t(), u.get_mpz_t());

            {
                p[i] *= pt;
            }
        }

        x *= p[i];
    }

    clock_gettime(CLOCK_MONOTONIC, &finish);

    std::cout << "Random x: " << mpz_get_str(NULL, 62, x.get_mpz_t()) << std::endl;
    std::cout << "Completed C++ random generation in " << (double)((finish.tv_nsec-start.tv_nsec)/1000000000.0+(finish.tv_sec-start.tv_sec)) << " seconds" << std::endl;
}

It now completes in about 8.2 seconds, compiling with -O3. That’s far slower than the PHP version… but why? What is going on?

Identifying the differences

In the C++ version, there are several operations that could be slower than in the PHP version:

  1. multiplication of gmp values
  2. calculating random numbers
  3. calculating the next prime number

I calculated the total time spent on each of these operations in C++ vs. PHP:

OperationPHP (secs)C++ (secs)
Multiplication0.0580.058
Random bits0.00120.0001
Next Prime2.658.24

OK, so now to the gmp extension source code to understand how it is calculating primes so quickly. Maybe there’s something we can learn?

Here’s the function in question: on Github.

/* {{{ Finds next prime of a */
ZEND_FUNCTION(gmp_nextprime)
{
	gmp_unary_op(mpz_nextprime);
}
/* }}} */

These are using the exact same functions under the hood, however, we’re still manually unwrapping to the c-type in our function (.get_mpz_t()), maybe that is the culprit?

It turns out that is not the culprit at all, it’s the calculation of the next prime number after all. At this point, the only thing I can think of is the quality of the primes. Perhaps the C++ version is calculating better primes that are bigger? Or something.

Since we are asking for random numbers containing at least X random bits, let’s calculate the average number of bits actually returned.

LanguageNumber bits
C++611
PHP458

Aha! The culprit! It is indeedly the quality of the primes. After some further investigation, it looks like it boils down to this calculation: 1838 / 460

Fixing the bug

Well, it turns out there was a subtle bug. The C++ version casts this to unsigned (these are variables in the actual code): 1838 / 460, and in the PHP version we do: abs(round(1838 / 460)) which is not the same. We want the floor (in PHP, it will magically cast to a float when doing integer division, so we need to do the truncation manually): abs(floor(1838 / 460)).

This results in the PHP version taking 8.4 seconds and generating the appropriate number of random bits, just a tad bit slower than the C++ version and they generate the exact same product as well (since they have the same random seed).

Some key takeaways

I’m only doing this for fun and to satisfy my own curiosity while on my three-month sabbatical. I don’t recommend porting cryptographic functions from C/C++ to PHP unless you really know what the hell you’re doing. PHP has no concept of “unsigned long” and cryptographic algorithms generally rely on them quite heavily. PHP’s maximum integer is 9223372036854775807 and C++’s maximum unsigned long is 18446744073709551615 on my machine. This can create serious issues if you need bigger numbers, or serious performance implications if you try to emulate them. Instead, use PHP’s FFI extension to call into the C/C++ code as a library.

But when PHP appears to be faster than C++, you have a bug! Finding that bug might not be obvious, it might be subtle, like this one.

Leave a Reply