Frequency Thresholds and Sharded Eviction: Building CloxCache

tl;dr I tried to write a tiny cache for AtlasDB. Accidentally discovered that protecting items with freq > k from eviction gives near-optimal hit rates. Then discovered that the design scales almost linearly with CPU cores — unlike LRU, TinyLFU, or ClockPro. The whole algorithm is ~700 lines of Go.

Repo: github.com/bottledcode/cloxcache

A few months ago, I needed a cache for AtlasDB (a global distributed database) and I didn’t feel like evaluating libraries. So, I decided to build one myself. It’s just a cache, right? How hard could it be?

It started simple: “Hey ChatGPT, what are some really simple caching algorithms?” And thus began my journey of learning about CLOCK. It is, indeed, pretty straightforward. You have a circular buffer, a “clock hand” that goes around the circle, and each time the clock hand sweeps past a value in the cache, you “decay” it until it gets evicted. The funny part is that I implemented it wrong, because ChatGPT sucks at remembering details and I didn’t look it up properly. The real CLOCK algorithm uses a single bit to “decay”; I used an 8-bit counter that counted down from 15 to 0. When a GET hit, I bumped the counter up by one, and when the hand swept by, it decremented it. This turns out to be quite important later… Ironically, if I hadn’t implemented CLOCK incorrectly, I probably never would have discovered the thing that makes CloxCache work.
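A rough sketch of what that accidental variant looks like (illustrative names and types, not the real CloxCache internals):

type slot struct {
    key  string
    freq uint8 // bumped on hit (capped at 15), decayed by the hand
}

type ring struct {
    slots []slot // assumes a non-empty ring
    hand  int
}

// onHit bumps the counter instead of setting a single reference bit.
func (r *ring) onHit(i int) {
    if r.slots[i].freq < 15 {
        r.slots[i].freq++
    }
}

// victim sweeps the hand around the ring, decaying counters until one
// reaches zero; that slot becomes the eviction victim.
func (r *ring) victim() int {
    for {
        s := &r.slots[r.hand]
        if s.freq == 0 {
            v := r.hand
            r.hand = (r.hand + 1) % len(r.slots)
            return v
        }
        s.freq--
        r.hand = (r.hand + 1) % len(r.slots)
    }
}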

I realised pretty quickly, though, that it wouldn’t scale. This is going to be in a database, after all. So, I reworked it until I had a sharded CLOCK algorithm. Each shard has its own hand, its own counters, and state. Then the hash of the key determines which shard it goes to.
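Sketched out, sharding looks roughly like this (building on the ring above; the actual shard count and hash function in CloxCache may differ):

type shardedCache struct {
    shards []*ring
}

// shardFor hashes the key and picks a shard; each shard is a fully
// independent ring with its own hand and counters.
func (c *shardedCache) shardFor(key string) *ring {
    h := fnv.New64a() // hash/fnv from the standard library
    h.Write([]byte(key))
    return c.shards[h.Sum64()%uint64(len(c.shards))]
}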

It was fast, and it worked well enough, to be honest. But earlier this week I opened a PR to replace otter in frankenphp, and some people ran benchmarks. It, uhh, wasn’t better.

So, over the last few nights, I poured my brain into the problem. It’s fast—as in throughput—but the hit rate is absolutely terrible. See, CLOCK is pretty good, but as I started benchmarking against otter and just regular LFU and LRU … I realised there’s a lot of room for improvement.

It was around this time, while I was dumping benchmarks into ChatGPT to summarise for me, that I learned about Bélády and his MIN algorithm, and especially how to use it to determine the optimal hit rate for a given cache and trace. So, I implemented it and ran some benchmarks. My algorithm sucked.
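MIN is easy to simulate offline once you have the full trace: on every miss with a full cache, evict the resident key whose next use is farthest in the future. A minimal sketch of such a simulator (mine, not the code in the repo):

// beladyHitRate reports the best hit rate any eviction policy could achieve
// on the trace with the given capacity.
func beladyHitRate(trace []string, capacity int) float64 {
    // Pre-compute, for each position, the index of the next access to the same key.
    next := make(map[string]int)
    nextUse := make([]int, len(trace))
    for i := len(trace) - 1; i >= 0; i-- {
        if j, ok := next[trace[i]]; ok {
            nextUse[i] = j
        } else {
            nextUse[i] = len(trace) // never accessed again
        }
        next[trace[i]] = i
    }

    resident := map[string]int{} // key -> index of its next use
    hits := 0
    for i, key := range trace {
        if _, ok := resident[key]; ok {
            hits++
        } else if len(resident) >= capacity {
            // Evict the resident key whose next use is farthest away.
            victim, farthest := "", -1
            for k, nu := range resident {
                if nu > farthest {
                    victim, farthest = k, nu
                }
            }
            delete(resident, victim)
        }
        resident[key] = nextUse[i] // insert, or refresh the next-use time on a hit
    }
    return float64(hits) / float64(len(trace))
}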

That’s OK, I had Bélády’s algorithm to help me. So, I wrote a sharded version of it to check whether sharding the cache would hurt the hit rate. In fact, it showed that a sharded cache stays within 0.1% of a cache with global knowledge. That meant I could, theoretically … maybe, get my algorithm down to fully independent shards. At the time, there was still some shared state for stats, so I made a way to disable stats or move them into each shard as needed.

From there, I decided to peel apart my algorithm and the Bélády algorithm to show me where I was choosing the wrong victim to evict.

I broke the evictions down into frequency buckets (0–15), calculated the percentage of the time I was right, and stared at the data, completely dumbfounded.

For 0-freq items, it was nearly a perfect 50/50 coin toss whether my algorithm would be right or wrong. My algorithm can’t see the future (unlike Bélády’s algorithm), and the only signal I had was frequency. What was even more interesting was that when evicting items with freq > 2, I was wrong nearly 80% of the time—these items were likely to be accessed again.
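Concretely, the comparison boiled down to a tally like this (a sketch of the analysis, not the actual harness):

// For every eviction my cache made, record the victim's frequency and whether
// the choice turned out to be "right" (the Bélády simulator agreed, i.e. the
// key wasn't needed again).
var right, total [16]int

func recordEviction(freqAtEviction uint8, beladyAgrees bool) {
    b := int(freqAtEviction)
    if b > 15 {
        b = 15
    }
    total[b]++
    if beladyAgrees {
        right[b]++
    }
}

// Afterwards, right[f] / total[f] is the accuracy for frequency bucket f.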

My algorithm can’t see the future, but as I learned long ago, if you can’t make a “right” decision, and you’re guaranteed to make a “wrong” decision, the best decision you can make is to not make any decision.

So, I rewrote the eviction algorithm as something basic:

if freq <= 2 {
  // Low frequency: candidate for eviction
} else {
  // Protected: only evict as last resort
}

And it started beating otter under some conditions, getting within 90–99% of the theoretical “perfect cache” that Bélády promises us must exist—as long as the working set fits in the cache.

I started trying to come up with a formal theory of why this even worked.

Under Zipf(α), the popularity of the i-th most popular item is proportional to 1/i^α, so popular items accumulate frequency quickly. And if they do, the probability of re-access increases monotonically with observed frequency. At least for the data I was using at the time, the “cross-over” frequency where working-set membership becomes obvious sits around freq ≈ 2–4.

So, something like this?

E[P_r(x) | freq(x) = f] is non-decreasing in f, where P_r(x) is the probability that x gets accessed again.

Empirically:

Trace        P_r at freq = 1    P_r at freq ≥ 7
Twitter      18.8%              43–68%
OLTP         22.0%              58–82%
Zipf-1.01    40.5%              76–87%

This surprisingly holds across almost every trace I was able to get my hands on. Hence, an item with freq > 2 is probably in the working set, and an item with freq <= 2 is probably noise. In my work, I called this magic number k.
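If you want to sanity-check this on your own traces, the measurement is cheap. Here is roughly how I would compute a table like the one above (a sketch, not the script behind those numbers):

// reaccessByFreq replays a trace and returns, for each frequency bucket f,
// the fraction of accesses made when a key had been seen f times before
// that were followed by at least one more access to the same key.
func reaccessByFreq(trace []string) map[int]float64 {
    lastSeen := make(map[string]int) // key -> index of its final access
    for i, k := range trace {
        lastSeen[k] = i
    }

    freq := make(map[string]int)
    seen := make(map[int]int)  // freq bucket -> accesses observed at that freq
    again := make(map[int]int) // ...of those, how many were re-accessed later
    for i, k := range trace {
        f := freq[k]
        seen[f]++
        if i < lastSeen[k] { // the key shows up again after this access
            again[f]++
        }
        freq[k]++
    }

    out := make(map[int]float64)
    for f, n := range seen {
        out[f] = float64(again[f]) / float64(n)
    }
    return out
}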

That, right there, will allow you to beat or match LRU with almost no complexity. The entire library (not including tests and benchmarks) is about 700 lines of Go code.


I got my hands on more traces, and my theory started to break down. The ideal k varied by workload. So, why not make it adapt?

The key insight was tracking the graduation rate: what fraction of items survive long enough to cross the protection threshold before being evicted.

A low graduation rate means items are being evicted before they have a chance to prove their worth—the cache is too small for the current protection threshold. So we raise k to be more selective, only protecting the hottest items. A high graduation rate means the cache is comfortable and k can come back down.

rate := float64(graduated) / float64(totalEvictions)
if rate < 0.05 {
    k++  // Under pressure: be more selective
} else if rate > 0.20 {
    k--  // Comfortable: protect more items
}

This adaptation happens per-shard with no global coordination, maintaining lock-free read performance.
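Each shard simply owns its own counters and adjusts its own k. A sketch of what that bookkeeping might look like (field names and the window reset are illustrative, not the exact CloxCache internals):

type shard struct {
    k              int    // protection threshold for this shard
    graduated      uint64 // entries that crossed freq > k (became protected)
    totalEvictions uint64 // eviction victims chosen in this shard
}

// Runs on the shard's own eviction path; no other shard is consulted.
func (s *shard) maybeAdjustK() {
    if s.totalEvictions == 0 {
        return
    }
    rate := float64(s.graduated) / float64(s.totalEvictions)
    if rate < 0.05 {
        s.k++ // under pressure: be more selective
    } else if rate > 0.20 && s.k > 0 {
        s.k-- // comfortable: protect more items
    }
    s.graduated, s.totalEvictions = 0, 0 // start a fresh measurement window
}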

But wait—where did those magic numbers 0.05 and 0.20 come from? I picked them by staring at benchmarks. That’s not very satisfying, and different workloads might want different thresholds.

Self-Tuning Thresholds via Gradient Descent

The solution: let each shard learn its own thresholds by observing whether its k adjustments actually help.

The idea is simple gradient descent on hit rate:

  1. Track hit rate over a measurement window (2000 ops)
  2. After each k adjustment, compare current hit rate to previous
  3. If the adjustment improved hit rate, make that direction easier to trigger
  4. If the adjustment hurt hit rate, make that direction harder to trigger

// After measuring hit rate change from previous window
if lastKDirection > 0 {  // We increased k
    if hitRateImproved {
        rateHigh -= learningRate  // Easier to increase k again
    } else {
        rateHigh += learningRate  // Harder to increase k
    }
} else if lastKDirection < 0 {  // We decreased k
    if hitRateImproved {
        rateLow += learningRate   // Easier to decrease k again
    } else {
        rateLow -= learningRate   // Harder to decrease k
    }
}

Each shard maintains its own rateLow (default 0.25) and rateHigh (default 0.50) thresholds, bounded to sane ranges (0.05–0.40 and 0.30–0.80 respectively). The learning rate is 0.10 per adjustment—aggressive enough to adapt quickly, but bounded thresholds prevent runaway behavior.
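The bounding itself is just a clamp after each gradient step (the clamp helper below is mine, not from the library):

// clamp keeps a threshold inside its allowed range (math.Min/Max from the
// standard library).
func clamp(v, lo, hi float64) float64 {
    return math.Min(math.Max(v, lo), hi)
}

// After each adjustment above:
rateLow = clamp(rateLow, 0.05, 0.40)
rateHigh = clamp(rateHigh, 0.30, 0.80)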

The result: shards independently discover what graduation rate thresholds work best for their portion of the key space. Under zipf workloads, the thresholds tend to converge. Under mixed or shifting workloads, different shards might settle on different thresholds—and that’s fine, because each shard’s slice of the key space might genuinely have different access patterns.

It actually wasn’t hard (for once) and it worked well. Almost too well.

  • It matches or slightly beats LRU
  • It trails otter by ~1–3% on some workloads
  • Under high cache pressure, it raises k to be more selective (only protect very hot items)
  • Under low cache pressure, k settles at 2 (standard protection for items accessed 3+ times)
  • Gets within 90–99% of Bélády’s optimal when the working set fits in the cache.

Ghost Tracking: Memory for the Recently Evicted

There’s one more trick that significantly improves hit rate: ghost tracking.

When an item is evicted, instead of completely removing it, we keep a “ghost” entry—just the key and a negative frequency marker. No value is stored, so ghosts consume minimal memory. If that key is accessed again while it’s still a ghost, we know it was a mistake to evict it—this item is hotter than we thought.

// On eviction: item becomes a ghost
entry.freq = -1  // Negative = ghost
entry.value = nil  // No value stored

// On access: ghost gets promoted back
if entry.freq < 0 {
    entry.freq = k + 1  // Promote directly to protected status
    entry.value = newValue
}

The clever part is how ghosts use spare capacity. Each shard has a fixed number of hash table slots, but the active capacity (items with values) is typically smaller. Ghosts live in this gap:

┌─────────────────────────────────────────────────────────────┐
│                    Total Hash Slots                         │
├─────────────────────────────────────┬───────────────────────┤
│         Active Entries              │        Ghosts         │
│         (with values)               │    (keys only)        │
└─────────────────────────────────────┴───────────────────────┘

By default, CloxCache allocates 3x the capacity in slots. This means up to 2/3 of your slots can be ghosts, giving the cache a “memory” of recently evicted items. If you’re memory-constrained, you can reduce this ratio. If you want better hit rates at the cost of more memory overhead, increase it.

// Fine-tuning the ghost capacity
cache := cloxcache.New[string, Value](
    cloxcache.WithCapacity(10000),
    cloxcache.WithSlotsPerShard(1024),  // Adjust this ratio
)

When a ghost is promoted, it skips the probation period entirely—it goes straight to freq = k + 1, immediately protected from eviction. This is fair: the item already proved it was valuable by being accessed twice (once before eviction, once now).

Ghost tracking typically improves hit rate by 1–3% on real workloads, with the biggest gains on bursty access patterns where hot items might temporarily fall out of cache during cold periods.

But it gets even better! The real win is in the concurrency:

Here’s a benchmark on a read-heavy workload (90% reads, 10% writes), 1M ops:

Goroutines   CloxCache   SimpleLRU   Otter
1            8 Mops      21 Mops     8 Mops
4            27 Mops     19 Mops     14 Mops
16           48 Mops     23 Mops     15 Mops
64           49 Mops     19 Mops     12 Mops
256          45 Mops     19 Mops     9 Mops

  • CloxCache has no global eviction structure
  • CloxCache has no global admission filter
  • CloxCache shards share no state
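For reference, a benchmark in this shape can be a few lines of Go. The Get/Set method names below are my guess at the API (only New, WithCapacity, and WithSlotsPerShard appear earlier in this post), so treat this as a sketch of the workload, not the actual harness:

// go test -bench=ReadHeavy -cpu=1,4,16,64,256 varies the worker count.
func BenchmarkReadHeavy(b *testing.B) {
    cache := cloxcache.New[string, int](cloxcache.WithCapacity(10_000))
    keys := make([]string, 100_000)
    for i := range keys {
        keys[i] = strconv.Itoa(i)
    }
    b.RunParallel(func(pb *testing.PB) {
        r := rand.New(rand.NewSource(rand.Int63())) // one RNG per goroutine
        for pb.Next() {
            key := keys[r.Intn(len(keys))]
            if r.Intn(10) == 0 { // ~10% writes
                cache.Set(key, 1)
            } else { // ~90% reads
                cache.Get(key)
            }
        }
    })
}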

As far as I can tell from days of looking for it, CloxCache is the only eviction algorithm that is concurrency-first by design. Not concurrency-friendly. Not concurrency-tolerant.

Concurrency first.

At high concurrency, CloxCache delivers 2–3x the throughput of a sharded LRU and 4–5x the throughput of Otter.

So…

If you want:

  1. High hit rate across real-world workloads
  2. An eviction algorithm that’s shorter than some README files
  3. No global locks
  4. Multicore scaling that beats every algorithm I tested
  5. Predictable performance

Then CloxCache might actually be the simplest practical solution.

Is it the best cache ever invented? No.

Is it the simplest one that scales across cores? Probably, most likely, an emphatic yes.


Give it a go in your project: https://github.com/bottledcode/cloxcache
