How Caching Works

Speeding up applications with Redis & In-Memory Stores.


The Request

Get Data

What Happens

Application requests data (e.g., User Profile 101). It checks the Cache FIRST.

Why

Speed. Reading from memory (Redis) takes microseconds. Reading from Disk (DB) takes milliseconds.

Technical Detail

Cache-Aside (Lazy Loading) pattern.

Example: `redis.get("user:101")`

Key Takeaways

Memory is Fast

Reading from RAM (Redis) is roughly 1,000x faster than reading from an SSD-backed database.

Staleness

A cache is at best eventually consistent. You trade real-time accuracy for speed.

Memory is Limited

You can't cache everything. Eviction policies ensure you only keep hot data.

The Engineering of Distributed Caching: Speed vs. Consistency

"There are only two hard things in Computer Science: cache invalidation and naming things." - Phil Karlton. Caching is fundamentally a trade-off: memory capacity exchanged for drastically reduced CPU cycles and network latency. When engineered correctly, it lets applications absorb enormous read traffic; when engineered poorly, it serves millions of users wildly incorrect data.


Part 1: The Physics of Memory Hierarchy

To understand caching, you must understand the physics of computer hardware. A main-memory reference takes on the order of 100 nanoseconds. A random read from an SSD takes on the order of 100 microseconds. A round trip over a gigabit network adds on the order of a millisecond. Each step down the hierarchy costs roughly three orders of magnitude.

A traditional relational database (like PostgreSQL) stores its data durably on SSDs (or HDDs). Every time a user executes a complex SQL query containing multiple `JOIN`s and `GROUP BY` aggregations, the database must spend CPU cycles parsing the query, fetch the relevant blocks from disk, assemble the result set in RAM, and stream it across the network.

If 100,000 users are hitting the homepage simultaneously, asking the database to recalculate the exact same "Top 10 Trending Articles" 100,000 times a second is an egregious waste of thermal energy and compute.

The Cache is an intermediary data store (commonly Redis or Memcached) that holds a subset of the database entirely in physical RAM. When the application needs data, it checks the RAM first.

Part 2: Primary Caching Strategies

How does the data actually get into the cache? There are several distinct architectural patterns for synchronizing the cache cluster with the primary database cluster.

1. Cache-Aside (Lazy Loading)

The most ubiquitous pattern on the internet. The cache sits beside the database, and the Application is responsible for the orchestration.

  • Read: The App asks the Cache for `user:123`.
    • If found (Hit), it returns it instantly.
    • If not found (Miss), the App queries the Database, writes the result into the Cache, and then returns it to the user.
  • Write: The App updates the Database, and then immediately deletes the key from the Cache to force a recalculation on the next read.

Pro: The cache only stores highly requested data, minimizing RAM usage.
Con: A cache miss incurs a massive latency penalty (three network hops instead of one).
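The read and write paths above can be sketched as follows. This is a minimal illustration in which plain dictionaries stand in for Redis and the database; in production the cache calls would be `redis.get`, `redis.setex`, and `redis.delete`, and `user:123` is just an example key.

```python
cache = {}                                 # stands in for Redis
database = {"user:123": {"name": "Ada"}}   # stands in for the primary DB

def read(key):
    value = cache.get(key)        # 1. check the cache FIRST
    if value is not None:
        return value              # hit: return instantly
    value = database.get(key)     # 2. miss: fall back to the database
    if value is not None:
        cache[key] = value        # 3. populate the cache for next time
    return value

def write(key, value):
    database[key] = value         # 1. update the source of truth
    cache.pop(key, None)          # 2. invalidate, forcing a fresh read
```

Note that the write path deletes the cached value rather than updating it: recomputing on the next read avoids two concurrent writers racing to leave a stale value behind.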

2. Write-Through Caching

The Application strictly treats the Cache as the main datastore. When writing data, the App writes to the Cache, and the Cache is synchronously responsible for writing to the underlying Database before returning a success message.

Pro: The Cache and Database are guaranteed to stay in sync; a successful write means both stores agree.
Con: Every single write operation incurs higher latency, as it must traverse both systems sequentially.
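The same dictionary sketch makes the synchronous write path concrete (the key and value here are illustrative):

```python
cache = {}     # stands in for Redis
database = {}  # stands in for the primary DB

def write_through(key, value):
    cache[key] = value      # 1. write to the cache...
    database[key] = value   # 2. ...which synchronously persists to the DB
    return "OK"             # 3. only now is success acknowledged

write_through("user:7", {"plan": "pro"})
```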

3. Write-Behind (Write-Back) Caching

Similar to Write-Through, but the Cache acknowledges the write to the Application immediately after saving it to RAM, and then asynchronously flushes the data to the Database in batch operations seconds or minutes later.

Pro: Absurdly high write throughput (used heavily in ad-tech and high-frequency trading).
Con: If the Cache server loses power before the batch is flushed to disk, the data is permanently lost.
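A sketch of the deferred flush, again with dicts standing in for the real stores. Anything still sitting in the `dirty` buffer when the process dies is exactly the data-loss window described above:

```python
cache = {}
database = {}
dirty = []          # keys acknowledged in RAM but not yet durable

def write_behind(key, value):
    cache[key] = value          # acknowledge as soon as RAM is updated
    dirty.append(key)           # defer the durable write

def flush():
    # A background worker runs this periodically, batching the
    # deferred writes into the database.
    while dirty:
        key = dirty.pop()
        database[key] = cache[key]

write_behind("score:game1", 21)   # instant ack; the DB is still stale
```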

Part 3: The Menace of the Thundering Herd

Imagine it is Super Bowl Sunday. A highly complex database query powers the live scoreboard on a massive sports app. This query takes 5 seconds for the database to calculate, so the engineers cache the result in Redis with a Time-To-Live (TTL) of 10 seconds.

At exactly T=10s, the cache key violently expires.

At T=10.01s, 50,000 mobile phones hit the API simultaneously requesting the score. Because the key is gone, all 50,000 application threads experience a Cache Miss. All 50,000 threads simultaneously pivot and hurl the exact same complex 5-second SQL query at the backend Postgres database. Unprepared for the sudden 50,000x load spike, Postgres immediately runs out of connections, locks up, and crashes.

This catastrophic failure mode is known as the Thundering Herd (or Cache Stampede).

Solution 1: Distributed Locks

When a key misses, the App tries to acquire a lock (Mutex) in Redis. Only ONE thread successfully acquires the lock, queries the DB, and updates the cache. The other 49,999 threads sleep for 50ms and check the cache again repeatedly until the first thread finishes.
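A single-process sketch of that flow, using a dict whose `setdefault` mimics Redis's atomic `SET lock:<key> <token> NX EX 30` (the key names and TTL are illustrative):

```python
import time
import uuid

cache = {}   # stands in for Redis

def get_with_lock(key, recompute):
    value = cache.get(key)
    if value is not None:
        return value                      # cache hit: no lock needed
    token = str(uuid.uuid4())
    # Redis equivalent: SET lock:<key> <token> NX EX 30
    if cache.setdefault("lock:" + key, token) == token:
        try:
            value = recompute()           # only the lock holder hits the DB
            cache[key] = value
        finally:
            cache.pop("lock:" + key, None)   # release the lock
        return value
    # The other threads sleep briefly and re-check until the winner is done.
    while (value := cache.get(key)) is None:
        time.sleep(0.05)
    return value
```

In production the unique token matters: a holder must only delete a lock it still owns (typically via a small Lua script), so a slow thread cannot release someone else's lock.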

Solution 2: Probabilistic Early Expiration (PER)

No locking required. The application checks the cache. As the key gets closer to its TTL expiration, a random number generator determines if the App should pretend the key expired early, fetch the data in the background, and quietly update the cache before it actually dies for everyone else.
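One well-known formulation of this idea is the "x-fetch" algorithm (Vattani, Chierichetti, and Lowenstein's probabilistic stampede prevention). A sketch of the decision function, where `recompute_cost` is the observed seconds the recomputation takes and `beta` is a tunable eagerness knob:

```python
import math
import random
import time

def should_refresh_early(expiry_ts, recompute_cost, beta=1.0):
    # The closer we are to expiry, the more likely one request
    # volunteers to refresh the key before it dies for everyone.
    # -log(U) for uniform U in (0,1] is an exponential random variable,
    # so the "pretend gap" is usually ~recompute_cost but occasionally larger.
    gap = recompute_cost * beta * -math.log(1.0 - random.random())
    return time.time() + gap >= expiry_ts
```

With `beta > 1` refreshes happen more eagerly; keys far from expiry are almost never refreshed, so the extra DB load stays negligible.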

Part 4: Eviction Policies (When RAM Fills Up)

RAM is expensive. A large database might hold 10 Terabytes of data, but the Redis cluster sitting in front of it might only have 100 Gigabytes of RAM. Eventually, the cache reaches 100% capacity. When the next `SET` command arrives, what data gets deleted to make room?

This is governed by the Eviction Policy. The algorithm you choose dictates your Cache Hit Ratio.

  • `noeviction`: Redis simply returns an Out-Of-Memory (OOM) error on writes. Used when Redis is acting as a primary database, not just a cache.
  • `allkeys-lru` (Least Recently Used): The industry standard. Redis tracks (approximately, by sampling) when each key was last read or written, and deletes the keys that haven't been touched in the longest amount of time.
  • `allkeys-lfu` (Least Frequently Used): LRU has a flaw: a batch analytics job might scan a billion obscure keys once a night, destroying the entire LRU history of your hot data. LFU counters this by tracking a frequency counter per key. Keys that are constantly accessed (even if not the most recently) are protected.
  • `allkeys-random`: Selects a random key and deletes it. Incredibly fast (no sorting overhead), but terrible for hit ratios.
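In Redis the policy is selected with the `maxmemory-policy` directive (the 100gb cap below is illustrative):

```
# redis.conf -- cap memory and choose who gets evicted at the limit
maxmemory 100gb
maxmemory-policy allkeys-lru
```

The policy can also be changed at runtime without a restart via `CONFIG SET maxmemory-policy allkeys-lfu`.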

Part 5: Cache Penetration & The Bloom Filter

Cache Penetration occurs when an attacker continually requests data that simply does not exist in the database.

For example, an attacker runs a script requesting `/user/-1`, `/user/xxxyyy`, etc. The request hits the server, misses the cache, hits the database, the database returns NULL, and the application returns a 404. Because the data is NULL, the application never caches anything. The attacker can easily DDoS the fragile backend database by continually requesting fake IDs, completely bypassing the cache layer.

The solution rests in a probabilistic data structure called a Bloom Filter.

A Bloom Filter is an array of bits initialized to zero. When a valid user ID is created, it is run through 5 different hash functions, flipping 5 specific bits to `1`.

When a request arrives, the ID is hashed. If any of the 5 bits are 0, the Bloom Filter mathematically guarantees the item definitely does not exist. The application immediately returns a 404 without ever touching the Database or the Cache. If all 5 bits are `1`, the item probably exists, and the application proceeds with the database query. A Bloom filter can represent millions of elements in just a few megabytes of memory.
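A minimal sketch of the bit-array mechanics described above. Rather than 5 genuinely distinct algorithms, it derives the 5 positions by salting a single SHA-256 hash, a common implementation trick; the sizes are illustrative:

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=5):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)   # all bits start at 0

    def _positions(self, item):
        # Derive k bit positions from one hash by salting with the index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)   # flip the bit to 1

    def might_contain(self, item):
        # False => definitely absent; True => probably present.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

No false negatives are possible, because `add` never clears a bit; false positives shrink as the bit array grows relative to the number of inserted items.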

Conclusion: The Buffer Zone

Distributed caching is the ultimate shock absorber of the internet. It protects complex relational and non-relational datastores from the savage, unpredictable burst patterns of viral web traffic. However, engineers must meticulously design their eviction policies, lock mechanisms, and invalidation triggers, lest they build a lightning-fast system that serves nothing but highly efficient, rapidly decaying lies.

Glossary & Concepts

Cache-Aside

The App (not the cache) is responsible for fetching from DB and populating the cache. This is the most common pattern.

TTL (Time To Live)

A self-destruct timer for data. Once expired, the cache deletes it to ensure freshness.

Eviction (LRU)

Least Recently Used. When memory is full, the cache deletes the items that haven't been asked for in the longest time.

Thundering Herd

When a popular cache item expires, THOUSANDS of requests might hit the Database at once before the cache can be repopulated.