So, I got an interesting spam comment on my post today:
“It gets even crazier when you actually benchmark the two languages only to discover in some real-world cases, PHP outperforms C#.”
I triple dare you to show code examples so we can explain why you’re wrong.
Quadruple dare
Jesus christ, how did you think this was true
By: No McNoington
This person couldn’t even be bothered to put in a decent pseudonym to call them by, but Mr./Mrs. McNoington, prepare to be blown away…
See, there’s something very common that all developers must do, and that is read files… we need to parse things, transform file formats, or whatever. So, let’s compare the two languages.
<?php
function test()
{
    $file = fopen("/file/file.bin", 'r');
    $counter = 0;
    $timer = microtime(true);
    while ( ! feof($file)) {
        // Read up to 4095 bytes (or up to the next newline) per call.
        $buffer = fgets($file, 4096);
        $counter += substr_count($buffer, '1');
    }
    $timer = microtime(true) - $timer;
    fclose($file);
    printf("counted %s 1s in %s milliseconds\n", number_format($counter), number_format($timer * 1000, 4));
}
test();
using System.Diagnostics;
using System.Text;
var test = () => {
    // File.OpenText returns a StreamReader that decodes the file as UTF-8 text.
    using var file = File.OpenText("/file/file.bin");
    var counter = 0;
    var sw = Stopwatch.StartNew();
    while(!file.EndOfStream)
    {
        // Read() returns the next decoded character from the reader's internal buffer.
        if(file.Read() == '1')
        {
            counter++;
        }
    }
    sw.Stop();
    Console.WriteLine($"Counted {counter:N0} 1s in {sw.Elapsed.TotalMilliseconds:N4} milliseconds");
};
test();
Personally, I feel like this is a pretty fair assessment of each language. We will synchronously read a 4 MiB file, byte-by-byte, and count the 1s in the file. There’s very little user-land code going on here, so we’re just trying to test the very fundamentals of a language: reading a file. We’re only adding the counting here to prevent clever optimizing compilers (opcache in PHP, release mode in C#) from cheating and removing the code.
“But Rob,” I hear you say, “they’re not reading it byte-by-byte in the PHP version!” and I’d reply with, “but we’re not reading it byte-by-byte in the C# version either!”
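To make that last point concrete: `File.OpenText` returns a `StreamReader`, which pulls bytes from the underlying stream in chunks and serves each `Read()` from its own buffer. Here is a minimal sketch, assuming an in-memory stream as a stand-in for the file and an explicit 4096-byte reader buffer (both are illustrative choices, not the benchmark's actual configuration):

```csharp
using System;
using System.IO;
using System.Text;

// Stand-in for the file: 64 KiB of '1' characters held in memory (an assumption for this demo).
var data = Encoding.ASCII.GetBytes(new string('1', 64 * 1024));
using var stream = new MemoryStream(data);

// Explicit buffer size so the demo does not depend on the library default.
using var reader = new StreamReader(stream, Encoding.UTF8, detectEncodingFromByteOrderMarks: false, bufferSize: 4096);

// Ask the reader for a single character...
int first = reader.Read();

// ...then check how far the underlying stream actually advanced.
long consumed = stream.Position;
Console.WriteLine($"Read one char ('{(char)first}'), underlying stream advanced {consumed} bytes");
```

If the reported position is well beyond one byte, the reader has filled its buffer up front, and later `Read()` calls are served from memory rather than from the stream.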
Let’s see how it goes:
PHP: 32.49 ms (avg over 10 runs)
C#: 37.30 ms (avg over 10 runs)
That’s pretty crazy… I mean, we just read four megs, which is about the size of a decent photo. What about something like a video clip that might be 2.5 gigs?
PHP: 24.82 s (avg over 10 runs)
C#: 26.67 s (avg over 10 runs)
Now, I need to process quite a few incoming files from banks and bills and stuff for my household budgeting system, which is how I discovered this early last year as I was porting things over from a hodgepodge of random stuff to Dapr and Kubernetes. PHP is actually faster than C# at reading files, who knew?!
Does this mean you should drop everything and just rewrite all your file-reading stuff in PHP (or better, C)? No. Not at all. A few milliseconds isn’t going to destroy your day, but if your bottleneck is I/O, maybe it’s worth considering :trollface:?
Nah, don’t kid yourself. But if you’re already a PHP dev, now you know that PHP is faster than C#, at least when it comes to reading files…
Feel free to peruse some experiments here (or if you want to inspect the configuration): withinboredom/racer: racing languages (github.com)
Can this C# be written to be faster? Sure! Do libraries implement “the faster way”? Not usually.
Addendum
Many people have pointed out that the C# version isn’t reading in binary mode, and that function call overhead is to blame. Really? C# is many orders of magnitude faster than PHP at function calls. I promise you that isn’t the problem. Here’s the code for binary mode on the 2.5 GB file:
using System.Diagnostics;
using System.Linq;
using System.Text;
var binTest = () =>
{
    using var file = File.OpenRead("/file/file.bin");
    var counter = 0;
    var buffer = new byte[4096];
    var numRead = 0;
    var sw = Stopwatch.StartNew();
    while ((numRead = file.Read(buffer, 0, buffer.Length)) != 0)
    {
        // Only look at the bytes this Read() call actually filled in.
        counter += buffer.Take(numRead).Count((x) => x == '1');
    }
    sw.Stop();
    Console.WriteLine($"Counted {counter:N0} 1s in {sw.Elapsed.TotalMilliseconds} milliseconds");
};
binTest();
If you now want to complain that it’s all Linq’s fault, we can just remove the .Take and double count things, because I need to get to work and I’m not putting any more time into telling people the sky is blue.
with .Take: 38.40 s (2.5 GB file)
without .Take: 23.5 s (2.5 GB file, but incorrect implementation)
So yeah, if an incorrect implementation is the proof you need that PHP is slower, go for it. Time to go to work.
Addendum 2
Since people come here wanting to optimize the C# without optimizing the PHP version, here is an implementation ONLY looking at file performance:
<?php
function test()
{
    $file = fopen("/file/file.bin", 'r');
    $counter = 0;
    $timer = microtime(true);
    // Counts chunks read, not 1s: only raw read performance is measured here.
    while (stream_get_line($file, 4096) !== false) {
        ++$counter;
    }
    $timer = microtime(true) - $timer;
    fclose($file);
    printf("counted %s 1s in %s milliseconds\n", number_format($counter), number_format($timer * 1000, 4));
}
test();
using System.Diagnostics;
var binTest = () =>
{
    using var file = File.OpenRead("/file/file.bin");
    var counter = 0;
    var buffer = new byte[4096];
    var sw = Stopwatch.StartNew();
    // Counts Read() calls, not 1s: only raw read performance is measured here.
    while (file.Read(buffer, 0, buffer.Length) != 0)
    {
        counter += 1;
    }
    sw.Stop();
    Console.WriteLine($"Counted {counter:N0} 1s in {sw.Elapsed.TotalMilliseconds} milliseconds");
};
binTest();
And here are the results:
PHP: 423.50 ms (avg over 10 runs)
C#: 530.42 ms (avg over 10 runs)
Comments
34 responses to “Yes, PHP is faster than C#”
… You’re buffering 4k in the php version and reading byte-by-byte in C#, of course it’s faster.
It’s probably the file stat cache. C# probably checks whether the file exists first; PHP does too, but only once. Putting a clearstatcache() before the fopen would make the comparison more relevant.
This is a very naive reimplementation of the C# version. I managed to reduce the runtime on the same file from 5.7 seconds to just 800 ms:
using var file = File.OpenRead("file.bin");
var counter = 0;
var sw = Stopwatch.StartNew();
var buf = new byte[4096];
while (file.Read(buf,0,buf.Length) > 0)
{
    foreach (var t in buf)
    {
        if (t == '1')
        {
            counter++;
        }
    }
}
sw.Stop();
Console.WriteLine($"Counted {counter:N0} 1s in {sw.Elapsed.TotalMilliseconds:N4} milliseconds");
Nice! I’ll have to give that a go.
Well PHP is not faster than C# anymore 😀
I’ve added an addendum, the implementation given double counts 1’s and it isn’t correct.
I’ve updated the post on why this is faster, you’re actually double-counting 1’s in this implementation.
Why do you use linq?
for(int i = 0; i < numRead; i++)
{
    if(buffer[i] == '1')
    {
        counter++;
    }
}
This code is nearly 4 times faster.
Another mistake is that you are turning off release mode. Release mode is not equivalent to opcache. If you want to compare opcache to something from .NET, it would have to be NativeAOT. So this comparison does not make sense.
I’m using Linq because that’s what libraries and end-user code will likely be using. Most people don’t write libraries to be in critical paths of C#.
You use linq where it makes sense, generally where accuracy/correctness is important and performance is a tertiary concern. Dotnet code that deals with reading text streams is generally a high-optimization scenario, where linq is scrubbed in favor of performance to eliminate the enumerable overhead. You won’t find code like that in serializers, for example. The only other example I can think of would be reading a file on initialization, which wouldn’t use this approach and would run once or rarely.
While I fully believe you’ve seen this impl somewhere, it’s not any standard professional library approach I’ve seen over the past 20+ years in any library remotely concerned with performance.
So, he is using code built for debug mode and even with this handicap c# is still faster?! 🤣
Epic!
I don’t know where you commenters are seeing that it isn’t compiled with optimizations, because both are using all optimizations available to it.
It seems PHP strings do not handle Unicode (per the documentation), only single-byte characters. So it’s correct to use byte[] rather than string.
The error in the previously suggested code is that the number of bytes returned by Read() is never used.
Here is a correct implementation:
using var file = File.OpenRead("/file/file.bin");
var counter = 0;
var sw = Stopwatch.StartNew();
var buffer = new byte[4096];
var numRead = 0;
while ((numRead = file.Read(buffer, 0, buffer.Length)) != 0)
{
    for (int i = 0; i < numRead; i++)
    {
        if (buffer[i] == '1')
            counter++;
    }
}
sw.Stop();
Console.WriteLine($"Counted {counter} 1s in {sw.Elapsed.TotalMilliseconds} milliseconds");
With this as a base, for a 3 GB file (https://ftp.crifo.org/ubuntu-cd/20.04.4/ubuntu-20.04.4-desktop-amd64.iso), I get ~3.9 s for PHP and 3.0 s for .NET, with 13065867 occurrences of ‘1’ in both cases.
But I am also convinced this type of comparison does not make much sense…
By him fixing the small bug in his code, his code will run even faster. It is only counting extra ones if the read() call returns less data than the available buffer. All he needs to do is use the return value of the Read call to determine how much data was moved into his buffer:
while((readLen = file.Read(buffer, 0, buffer.Length)) > 0)
{
    for (var idx = 0; idx < readLen; idx++)
        if (buffer[idx] == '1')
            counter++;
}
What is happening in his implementation is the last call to Read returns less data than the buffer (4k), but the loop still iterates over the full 4k buffer, and any data beyond where the most recent Read() modified can still contain ‘1’ characters in it from previous read calls.
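The effect described above can be reproduced without a file. Here is a small sketch, assuming a 10-byte in-memory stream and a deliberately tiny 8-byte buffer (both are hypothetical values, chosen so that the last read comes back short; the helper names are made up for the demo):

```csharp
using System;
using System.IO;
using System.Text;

// Ten '1' bytes. With an 8-byte buffer, the first Read() fills 8 bytes and the second only 2.
var data = Encoding.ASCII.GetBytes("1111111111");

// Scans the whole buffer on every pass, ignoring how many bytes Read() returned.
static int CountWholeBuffer(byte[] input, int bufferSize)
{
    using var stream = new MemoryStream(input);
    var buffer = new byte[bufferSize];
    var counter = 0;
    while (stream.Read(buffer, 0, buffer.Length) > 0)
    {
        foreach (var b in buffer)
        {
            if (b == '1') counter++;
        }
    }
    return counter;
}

// Scans only the bytes that the latest Read() call actually wrote.
static int CountReadBytes(byte[] input, int bufferSize)
{
    using var stream = new MemoryStream(input);
    var buffer = new byte[bufferSize];
    var counter = 0;
    int numRead;
    while ((numRead = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        for (var i = 0; i < numRead; i++)
        {
            if (buffer[i] == '1') counter++;
        }
    }
    return counter;
}

var wholeBuffer = CountWholeBuffer(data, 8);
var readBytes = CountReadBytes(data, 8);
Console.WriteLine($"whole buffer: {wholeBuffer}, read bytes only: {readBytes}");
```

With these inputs the whole-buffer version reports 16 and the length-aware version reports 10: the second pass re-counts the six stale bytes left over from the first read.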
… This entire post is a terrible comparison. It is not a true comparison of anything much, because your two implementations are different, which yields different results that are not down to the language or runtime.
Your PHP code buffers the input. The C# code doesn’t. The algorithms used to read the files are not the same, so the speed comparison is invalid. As other posters have shown when they buffer the input file the C# version is significantly faster.
Did you read the next sentence?
C# can go as low level as needed if speed is an issue. Using unsafe blocks with raw memory pointers.
.NET core now has libraries that target processor instruction sets. Look into intrinsics.
There are also newer abstractions for reading arrays and memory blocks, like the Span<> type.
So for me this was a naive comparison.
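As a rough idea of what the `Span<>` route mentioned above could look like, here is a sketch that counts a byte value by repeatedly calling `IndexOf` on a `ReadOnlySpan<byte>`. The helper names `CountByte` and `CountInStream` and the in-memory test data are hypothetical, and no timing claim is made: whether this beats a plain loop would have to be measured on the actual runtime and file.

```csharp
using System;
using System.IO;
using System.Text;

// Counts occurrences of `value` by jumping from match to match with IndexOf.
static int CountByte(ReadOnlySpan<byte> span, byte value)
{
    var count = 0;
    while (true)
    {
        int idx = span.IndexOf(value);
        if (idx < 0) return count;
        count++;
        // Continue searching just past the match we found.
        span = span.Slice(idx + 1);
    }
}

// Reads the stream in 4096-byte chunks and counts only the bytes each Read() returned.
static int CountInStream(Stream stream, byte value)
{
    var buffer = new byte[4096];
    var total = 0;
    int numRead;
    while ((numRead = stream.Read(buffer, 0, buffer.Length)) != 0)
    {
        total += CountByte(buffer.AsSpan(0, numRead), value);
    }
    return total;
}

// Stand-in for the file (an assumption for this demo).
using var demoStream = new MemoryStream(Encoding.ASCII.GetBytes("10110011 and 1 more"));
var demoCount = CountInStream(demoStream, (byte)'1');
Console.WriteLine($"Counted {demoCount} 1s");
```

Slicing to `numRead` before searching keeps the stale tail of the buffer out of the count, which is the bug discussed earlier in the thread.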
Now, since you were proven wrong, you should also rename the post as “Yes, PHP is faster than C# in bad implementations” or “No, PHP is not faster than C# in correct implementations”.
Haha, I don’t think I’ve been “proven wrong” yet. As I said, I saw this in real-world code and I’m just trying to demonstrate it with as little code as possible. There are a few other (important) places where PHP is faster than C# as well, but this is just the simplest one to demonstrate.
I strongly disagree with comparing the performance of programs without optimizations. Optimizing will undoubtedly change the results, and deployed C# programs are supposed to be compiled in release mode, so I don’t think your claim is valid.
I also have no idea what kind of optimizations the PHP interpreter supports.
It seems to me that you didn’t enable them, because otherwise there wouldn’t be this post, but I haven’t measured anything and I can’t prove it.
all optimizations were on in PHP and C#, with the exception of AOT compiling in C# (there’s a JIT cost in both languages, but it isn’t measured here).
Huh. Removing ‘.Take()’ and ‘.Count()’ and ‘=>’ is an incorrect implementation?
They double and triple our program’s execution time, bro!
I would rather use old school for and foreach.
Agree, LINQ is slow. I use it when speed isn’t an issue, but if my code is processing lots of data and slow, one of the first things I’m going to do is remove LINQ and see if that speeds it up.
Did a short write-up in response to this: https://dev.to/goaty92/in-response-to-yes-php-is-faster-than-c-2a2g
Some main points:
– You said that the test has “very little user-land code” and mainly test file-reading. I found this to be incorrect.
– I believe substr_count has SIMD optimization. Comparing it with a LINQ makes no sense. A comparable SIMD based method in C# makes it 10x faster than the PHP variant.
Also I believe saying using LINQ makes sense because people usually use it is quite misleading. LINQ is for managing object collections which are typically in the range of hundreds. It is not meant for byte-buffer processing. You are using the wrong tool for the job.
I dunno. I just don’t consider synchronous unbuffered byte-by-byte file read a “real world scenario” – at least not in 99.99% cases. But that’s just me I guess.
Generally speaking, all file reading will execute in about the same time in all languages, be it PHP, C#, Go, Node, … so there is absolutely no point in comparing that, unless it’s a pretty new exotic language that is not yet optimized. And in your last comparison, see the paragraph below because it still applies.
I must mention that I’m no C# expert nor PHP expert, but I believe that substr_count is a native C function, and there is no point in discussing how highly optimized it already is, since it’s a simple function. Compare that to your C# implementation: it’s pure C# code, no C/C++ involved, so it really does exercise the performance of the .NET runtime. So as you can see, you’re not comparing like with like: you’re comparing C vs. C#, and obviously C would be faster. However, it’s impressive that C# manages to be very close behind.
Next, you’re making a call to File.OpenText, which treats the underlying file as text, and not just text, but UTF-8. No need to mention the overhead of UTF-8 processing every time you call .Read(), which not only deals with a single character (and I say character, not byte), but needs to be Unicode-aware to do so, and thus multi-byte aware. I might be wrong here, but it is Unicode-aware; I don’t know if there is a conditional in the implementation to act dumb for a plain 1-byte char comparison. On the other hand, your PHP doesn’t even care about all that. So clearly an unfair comparison.
Last, you’re processing 4 KB per cycle in PHP and just 1 char per cycle in C#; again, not a proper comparison. Consequently, you’re calling the PHP function just 1024 times, but the C# one 4 million times; obviously that would take a toll even if C# function calls are impressively fast. So if you want to make it fair, try consuming 1 byte/char at a time in the PHP version and you’ll see how fast PHP is against C#. Thomas B. has given a fair implementation for C# which also uses a 4 KB buffer; however, it still suffers from the same story about comparing pure C# vs. plain C.
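The text-versus-bytes difference raised above is easy to see on a tiny input. Here is a sketch, assuming a three-byte UTF-8 payload held in memory (the string "é1" is just an illustrative choice, written with an escape so the source file's encoding does not matter):

```csharp
using System;
using System.IO;
using System.Text;

// "é1" is two characters but three bytes in UTF-8 (é encodes as 0xC3 0xA9).
var payload = Encoding.UTF8.GetBytes("\u00e91");

// Text mode: StreamReader decodes UTF-8 and hands back whole characters.
var charReads = 0;
using (var reader = new StreamReader(new MemoryStream(payload), Encoding.UTF8))
{
    while (reader.Read() != -1) charReads++;
}

// Binary mode: the raw stream hands back individual bytes.
var byteReads = 0;
using (var raw = new MemoryStream(payload))
{
    while (raw.ReadByte() != -1) byteReads++;
}

Console.WriteLine($"{payload.Length} bytes in the payload, {charReads} char reads, {byteReads} byte reads");
```

The reader returns two characters for three bytes, so the text-mode loop and a byte-oriented loop are not doing the same work once the input contains anything outside ASCII.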
Don’t forget that your PHP implementation is basically relying more than 99% of the time on C native implementations. So, the way I see it is that it’s more like a C vs C# comparison, we all know who wins performance-wise.
If you want to compare 2 languages, have them do things in their purest form, involving native C functions and pure implementations in a comparison is just not right. Like I said, I have had experience in both languages but I’m no expert on either. Go is my cup of tea. So, for my contribution, here is a Go implementation:
=======> Go: 1.54 ms (avg over 10 runs) … yes, that’s one point five milliseconds
Note: this is 100% pure Go, no native C or native whatever involved. This is the first implementation, and it does more things in each cycle than your PHP/C# example, and it’s probably a very naive implementation from my part.
Ha, yes, Go is quite impressive!
I highly suggest actually reading the article though 🙂
I read your article alright, including the original post. But as you can notice, my post was not really about the file read but about the actual C# code being compared to the PHP: the C# was almost fully user-land whereas the PHP barely was. I agree that, for some reason, plain file reading is indeed apparently faster in PHP, but like I said, I’m no C# expert and I believe it comes down to the choice of API that one uses in this one single case of file read comparison. That could also depend on the runtime; Mono and dotnet would probably show differences, and I didn’t find any mention of the runtime details.
For the choice of language, I’m not a fan of PHP, obviously (14 years ago, I was a fan of PHP). I don’t see any real benefits moving from C# to PHP, C# has a much wider application opportunities and ecosystem, and it has a massive experience re-use. That can’t be said with PHP.
Anyway, you said in your original post: “If you haven’t used PHP in the last 7-8 years, you should give it a go”. Well, I say, give Go a go 🙂
By the way, the Google Sign-in for your comments system doesn’t seem to work. So I opted for the Facebook ID.
Go is very much on my list of languages to learn well. There are quite a few amazing libraries that I’ve wanted to use directly instead of a shim around a CLI for some personal projects. I’d love for an excuse to use it at work (Python/Scala/PHP are the languages I use at work), and we do have some tooling written in Go that could probably use a fresh set of eyes… anyway, you make a really good point about PHP being built on C vs. pure C#. I wonder if that is why it is also slower with some types of collections, and database accessing? I really appreciate your comments btw!
Good to know and thanks for reporting it! A lot of this blog is almost 10 years old at this point, it might be time to reinstall everything from scratch!
Well, I feel that part of the reason you choose a language is productivity and performance, probably among other things. As it turns out, that’s also how I think, and I also greatly value maintainability. I could go work on a C# project for months, then pick up a Go project and feel right at home in less than an hour, and that’s very difficult in C#.
There was a time when I thought long and hard about how C# and Go compare. In the end, Go won my heart, mainly because of its simplicity mixed with decent performance and productivity. There’s also the power to go all the way down to assembly (right within Go, the integration is just so straightforward) for extreme performance, or fly as high as your heart desires while still being surprisingly fast even on the first attempt (like the code above).
At first glance, Go seems kind of verbose, similar to C/C++/C#. However, once you get used to it, it feels natural and flows like the way you speak, maybe because of its simplicity.
Lack of generics has been the negative point for Go (though I used Go for 9 years or so without finding it a real bottleneck), but this month v1.18 will come out with generics baked in (they have been cooking for a very long time). So, in the coming month of April, I believe a lot of blog posts will talk about this very welcome generics feature, and you could jump in then.
I can’t really tell you how good programming in Go is, but for the values you’re seeking for, I believe you’ll be greatly satisfied with Go, like I was, especially now with generics. Docker, Kubernetes, Podman, etc… are all made possible thanks to Go. I believe there are some blogs about the decision to use Go for these projects.
Anyway, it would be interesting to see comparisons of how fast other programming languages are when doing real world tasks like what you had in this blog, including the counting of ‘1’ not just file read, given that you already do Scala and Python in addition to PHP.
After reading the entire article a second time after reading the comments, your basic defense to the erroneous comparison (apples to oranges) is to read the next part of the article. Your comparison, as pointed out by myself and others, some in great detail, is flawed.
Yes, if people want to point out buffering as an issue, despite me saying in the article itself that the C# version is doing the same thing, I will tell them to read the article. If you want to erroneously point out that optimizations are turned off, without reading the sentence that says they are turned on or checking the Dockerfiles to see what is actually being done, I will tell you to read the article.
If you point out how the code can be improved, the code will be improved. As I’ve mentioned several times though, I saw this issue (among others) in real-life code when porting microservices to C# from PHP. C# is slower than PHP in some real-world cases, which is surprising (even to me)! Can we make C# actually faster than PHP? Maybe, maybe not, but it doesn’t help when the code lives deep in libraries in the real world.
So, then, what you’re saying is that if you’re a terrible programmer, you can be really good at writing shitty code. Is that it?
What .NET version did you use? .NETFM 4.8, .NET 6?
Also what OS is this? You’re testing the performance of the File IO APIs not the lang itself.
On Linux PHP file IO may be better than .NET on Linux.