This adventure starts with git-lfs. It was a normal day and I added a 500 MB binary asset to my server templates. When I went to push it, I found it interesting that git-lfs was uploading at 50 KB per second. Since I had a bit of free time that I’d much rather spend on something other than waiting FOREVER to upload a file, I decided to head upstairs and plug into the ethernet. I watched it instantly jump up to 2.5 MB per second. Still not very fast, but I was now intensely curious.
Since I figured I would have originally been waiting FOREVER for this to upload, I decided to use that time to investigate what was going on. While I would expect wired ethernet to be a bit faster than wifi, I didn’t expect it to be orders (with an s) of magnitude faster. Just to check my sanity, I ran a speed test and saw my upload speed on wifi at 40MB per second, and wired at 60MB per second.
After some investigation with Wireshark and other tools, I learned that my wifi channels have a shitload of interference in the 2.4 GHz band, and just a little in the 5 GHz band. During this time, I also learned that my router wouldn’t accept a single 5 GHz client due to a misconfiguration on my part. So, non-sequitur, apparently enabling “Target Wake Time” was very important (I have no idea what that does). Once that was fixed, I saw 600MB per second on my internal network and outside throughput was about the same as wired.
But why on earth was git-lfs so slow, even on 5 GHz? After looking at Wireshark while uploading to git-lfs, I noticed about 30-50% of the traffic was out-of-order/duplicate ACKs, causing retransmissions. I found that weird, but not terribly weird because, remember, this wifi network “sucks” with all my noisy neighbors. It turns out there are random 50-100ms delays all over the place, probably due to interference. When I ran a speed test or browser session, however, it was less than 1%! In fact, git-lfs was barely sending any packets at all, like it was eternally stuck in TCP slow-start.
When I looked at the packets, they were being sent in ~50-byte payload chunks (~100 bytes total, MTU is 1500). I found that very interesting because I would expect Nagle’s algorithm to coalesce packets so there would be fewer physical packets to send. That is when it hit me: TCP_NODELAY must be set.
Between that, and extremely regular 100ms delays, it could only get off a few packets before getting a “lost packet,” not to mention nearly 50% of every packet was literally packet headers. I was literally, permanently stuck in TCP Slow Start.
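To put rough numbers on that header overhead, here is a back-of-the-envelope sketch in Go. The 50-byte payload and ~100-byte total come from the capture above. Treating the remaining ~50 bytes as fixed per-packet headers, and a "full" packet as 1500 bytes including those headers, are simplifying assumptions. The helper names `overheadFraction` and `packetsNeeded` are made up for this illustration.

```go
package main

import "fmt"

// overheadFraction returns the share of on-the-wire bytes spent on headers
// for a packet carrying payloadBytes of data plus headerBytes of headers.
func overheadFraction(payloadBytes, headerBytes int) float64 {
	return float64(headerBytes) / float64(payloadBytes+headerBytes)
}

// packetsNeeded returns how many packets it takes to move totalBytes
// when each packet carries at most payloadBytes of data (ceiling division).
func packetsNeeded(totalBytes, payloadBytes int) int {
	return (totalBytes + payloadBytes - 1) / payloadBytes
}

func main() {
	const headerBytes = 50        // assumption: ~100 bytes observed minus ~50 bytes of payload
	const fileBytes = 500_000_000 // the 500 MB asset, taking 1 MB as 10^6 bytes

	// 50 is the observed payload; 1450 assumes a 1500-byte packet with the same headers.
	for _, payload := range []int{50, 1450} {
		fmt.Printf("payload %4d B: %4.1f%% headers, %d packets\n",
			payload,
			100*overheadFraction(payload, headerBytes),
			packetsNeeded(fileBytes, payload))
	}
}
```

Under these assumptions, the tiny payloads need ten million packets for the same file, and every retransmission on a lossy link pays that header cost again.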
TCP No Delay from Memory
Nagle’s Algorithm was written approximately 4 decades ago to solve the “tinygram” problem, where you are sending a whole bunch of little packets, flooding the network, and reducing network throughput. Nagle’s algorithm essentially bundles all the little packets into one big packet, waiting for an ACK or a full packet to be constructed, whichever is sooner.
It’s a bit more complex than that due to decades of changes to make the web better and more performant… but turning on TCP_NODELAY would mean that each of those 50 bytes are sent out as one packet instead of just a few bigger packets. This increases the network load, and when there’s a probability that a packet will need to be retransmitted, you’ll see a lot more retransmissions.
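The bundling rule described above can be sketched as a toy simulation. This is a deliberately simplified model, not how any real TCP stack is implemented: it assumes a fixed round-trip time, no packet loss, no congestion window, and no delayed ACKs. The function name `countSegments` and all of its parameters are hypothetical.

```go
package main

import "fmt"

// countSegments simulates an application issuing `writes` writes of `chunk`
// bytes, one every gapMs milliseconds, and returns how many segments leave
// the host. With nagle=false every write is sent at once (TCP_NODELAY).
// With nagle=true a partial segment is held back until either a full MSS
// has accumulated or all previously sent data has been acknowledged.
func countSegments(writes, chunk, mss, gapMs, rttMs int, nagle bool) int {
	segments, buffered := 0, 0
	var acks []int // ACK arrival times for in-flight segments, ascending

	trySend := func(now int) {
		for buffered >= mss { // full segments always go out
			buffered -= mss
			segments++
			acks = append(acks, now+rttMs)
		}
		// A partial segment goes out only if Nagle is off or nothing is in flight.
		if buffered > 0 && (!nagle || len(acks) == 0) {
			buffered = 0
			segments++
			acks = append(acks, now+rttMs)
		}
	}

	for i := 0; i < writes; i++ {
		now := i * gapMs
		// Process ACKs that arrived before this write; each may release buffered data.
		for len(acks) > 0 && acks[0] <= now {
			at := acks[0]
			acks = acks[1:]
			trySend(at)
		}
		buffered += chunk
		trySend(now)
	}
	// Drain the remaining ACKs so any leftover buffered data is flushed.
	for len(acks) > 0 {
		at := acks[0]
		acks = acks[1:]
		trySend(at)
	}
	return segments
}

func main() {
	// 100 writes of 50 bytes, 1 ms apart, 50 ms RTT, 1460-byte MSS (all illustrative values).
	fmt.Println("TCP_NODELAY:", countSegments(100, 50, 1460, 1, 50, false))
	fmt.Println("Nagle:      ", countSegments(100, 50, 1460, 1, 50, true))
}
```

In this model the same 5000 bytes leave as 100 segments with TCP_NODELAY and as 5 segments with Nagle's algorithm. Real stacks will differ, but the ratio gives a feel for why tiny writes hurt on a lossy link.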
If you want to know more, use Google.
Diving into the code
From there, I went into the git-lfs codebase. I didn’t see any calls to setNoDelay and when I looked it up, it said it was the default. Sure enough, the socket disables Nagle’s algorithm by default in Go.
Is this a trick?
I think this is a pretty nasty trick. The “default” in most languages I’ve used has TCP_NODELAY turned off. Turning it on has some serious consequences (most of them bad).
- Can easily saturate a network with packet overhead to send a single byte.
- Can send a whole bunch of small packets with high overhead (e.g., half the data being sent is packet headers)
- Reduces latency (the only pro) by sending small packets
- Can cause havoc on an unreliable link
I wasn’t able to dig out why Go chose to disable Nagle’s algorithm, though I assume a decision was made at some point and discussed. But this is tricky because it is literally the exact opposite of what you’d expect coming from any other language.
Further, this “trick” has probably wasted hundreds of thousands of hours while transferring data over unreliable links (such as getting stuck in TCP slow start, saturating devices with “tinygram” packets, etc). As a developer, you expect the language to do “the best thing” it is able to do. In other words, I expect the network to be efficient. Literally decades of research, trial, and error have gone into making the network efficient.
I would absolutely love to discover the original code review for this and why this was chosen as a default. If the PRs from 2011 are any indication, it was probably to get unit tests to pass faster. If you know why this is the default, I’d love to hear about it!
That code was in turn a loose port of the dial function from Plan 9 from User Space, where I added TCP_NODELAY to new connections by default in 2004, with the unhelpful commit message “various tweaks”. If I had known this code would eventually be of interest to so many people maybe I would have written a better commit message!
I do remember why, though. At the time, I was working on a variety of RPC-based systems that ran over TCP, and I couldn’t understand why they were so incredibly slow. The answer turned out to be TCP_NODELAY not being set. As John Nagle points out, the issue is really a bad interaction between delayed acks and Nagle’s algorithm, but the only option on the FreeBSD system I was using was TCP_NODELAY, so that was the answer. In another system I built around that time I ran an RPC protocol over ssh, and I had to patch ssh to set TCP_NODELAY, because at the time ssh only set it for sessions with ptys. TCP_NODELAY being off is a terrible default for trying to do anything with more than one round trip.
When I wrote the Go implementation of net.Dial, which I expected to be used for RPC-based systems, it seemed like a no-brainer to set TCP_NODELAY by default. I have a vague memory of discussing it with Dave Presotto (our local networking expert, my officemate at the time, and the listed reviewer of that commit) which is why we ended up with SetNoDelay as an override from the very beginning. If it had been up to me, I probably would have left SetNoDelay out entirely.
rsc on Hacker News
This ‘default’ has some pretty significant knock-on-effects throughout the Go ecosystem. I was seeing terrible performance of Caddy, for example, on my network. It was fairly frustrating that I couldn’t identify the issue. But after some testing, now I know (I opened an issue).
Much (all?) of Kubernetes is written in Go, and how has this default affected that? In this case, this ‘default’ is probably desired. Probably. The network is (usually) reliable, with 10G+ links between nodes, so it can handle sending small-byte packets with 40-byte headers. Probably.
This obviously affects git-lfs, much to my annoyance. I hope they fix it… I opened an issue.
When to use this?
In most cases, TCP_NODELAY shouldn’t be enabled, especially if you don’t know how reliable the network is and you aren’t managing your own buffers. If you’re literally streaming data a chunk at a time, at least fill a packet before sending it! Otherwise, turn off TCP_NODELAY, stream your little chunks to the socket, and let Nagle’s Algorithm handle it for you.
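If you do keep TCP_NODELAY on and manage your own buffers, “fill a packet before sending it” can be approximated with `bufio.Writer` from the standard library. The sketch below uses a hypothetical `countingWriter` in place of a real socket so the effect is visible without a network. The 1460-byte buffer is an assumed payload size for a 1500-byte MTU, not a value taken from git-lfs, and the helper names are invented for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter stands in for a socket and counts how many Write calls reach it.
type countingWriter struct{ writes int }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.writes++
	return len(p), nil
}

// streamChunks writes n chunks of chunkSize bytes to w, one Write per chunk.
func streamChunks(w io.Writer, n, chunkSize int) error {
	chunk := make([]byte, chunkSize)
	for i := 0; i < n; i++ {
		if _, err := w.Write(chunk); err != nil {
			return err
		}
	}
	return nil
}

// unbufferedWrites reports how many writes hit the sink when chunks go straight to it.
func unbufferedWrites(n, chunkSize int) (int, error) {
	sink := &countingWriter{}
	err := streamChunks(sink, n, chunkSize)
	return sink.writes, err
}

// bufferedWrites reports how many writes hit the sink when the same chunks
// pass through a bufio.Writer holding bufSize bytes.
func bufferedWrites(n, chunkSize, bufSize int) (int, error) {
	sink := &countingWriter{}
	bw := bufio.NewWriterSize(sink, bufSize)
	if err := streamChunks(bw, n, chunkSize); err != nil {
		return 0, err
	}
	// Flush pushes out whatever is left in the buffer at the end of the stream.
	if err := bw.Flush(); err != nil {
		return 0, err
	}
	return sink.writes, nil
}

func main() {
	direct, _ := unbufferedWrites(100, 50)
	buffered, _ := bufferedWrites(100, 50, 1460)
	fmt.Println("writes reaching the socket, unbuffered:", direct)
	fmt.Println("writes reaching the socket, buffered:  ", buffered)
}
```

With one hundred 50-byte chunks, the unbuffered path makes 100 writes and the buffered path makes 4. On a real connection with TCP_NODELAY set, each of those writes is roughly what ends up as its own packet, so the difference matters.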
Most people turn to TCP_NODELAY because of the “200ms” latency you might incur on a connection. Fun fact: this doesn’t come from Nagle’s algorithm, but from Delayed ACKs or Corking. Yet people turn off Nagle’s algorithm … :sigh:
Here’s the thing though, would you rather your user wait 200ms, or 40s to download a few megabytes on an otherwise gigabit connection?
This isn’t the end of the journey. Follow this blog to get updates.
9 responses to “Golang is evil on shitty networks”
Google looks at all of TCP as “legacy cruft”, that’s why they are dumping everything and its uncle down binary UDP streams now. I really don’t understand how one can look at TCP’s optimizations for real networks over the years and then decide to throw it away wholesale. QUIC, HTTP/3, SMB over QUIC… the list goes on and on…
They probably never think about wifi from the basement. Every network framework I touch I set to no delay. 100ms is forever, and a good number of protocols have a series of small segments to auth before the data flows.
This wording is confusing, since “one packet” is surely fewer than “a few packets”. It should probably say “sent out as individual packets”.
Yep: if you use Golang’s io you almost MUST use bufio wrappers around it. If git-lfs didn’t do that, it could really be considered a bug.
Nagle’s algorithm by itself is pure evil. I’ve never seen any production highload system that didn’t set TCP_NODELAY on its sockets. That is why Golang did it by default. It is a good decision.
But that means one should always buffer one’s io. It is always a good thing to buffer it in userspace. Nagle’s algorithm was born for those programs that for some reason couldn’t do it. That is because telnet/ssh were single threaded, and they didn’t have a way to “wait a bit longer for user input and flush the buffer after N ms whether or not there was user input”. (Well, they could use SIGALRM, but for some reason they didn’t.) Instead Nagle’s algorithm was introduced to work around their shortcomings, and by misfortune it was made the default for all sockets.
I had an unpleasant experience with core Go developers “knowing best”. They listen patiently to all your pros and cons and then say “we know better” and leave it as-is.
My issue was the total inability for Go to negotiate AES-256 in TLS 1.3 for Go client connecting to Go server. AES-128 has hardwired non-configurable higher precedence, and you can’t do anything about that. Their answer was “AES-128 is secure enough, stop wasting our time”. I ended up writing software in Rust…
Technically it’s not the language’s fault, but its std lib’s.
Not having TCP_NODELAY set really increases the latency of network traffic. Back in the day we needed to make sure TCP_NODELAY was set to get maximum performance.
The idea isn’t to have it on for the entire request but turn it off when doing bulk transfers if and only if you aren’t filling packets (lots of small writes). If you’re filling packets, it doesn’t matter unless there is congestion after slow-start.