Consistent Hashing Simulator
Runs in the browser. Visualize how consistent hashing distributes keys across nodes.
How to Use
Add or remove nodes to see how little of the key space gets redistributed.
You will see:
- Keys mapping to nearest clockwise node
- Only local keys moving on node change
- Virtual nodes improving balance
The Definitive Guide to Consistent Hashing
Consistent Hashing is an elegant algorithmic solution to the core problem of scaling distributed systems: How do you distribute data across `N` servers such that when a server crashes or a new one is added, you don't have to reshuffle all your data? It is the foundational routing layer for massive distributed databases and caching clusters.
The Problem with Traditional Modulo Hashing
In a classic distributed cache (like Memcached), if you have 4 servers, you determine where to place a piece of data (e.g., `user_123`) by hashing the ID and taking the modulo of the server count: `server_index = hash(key) % 4`.
The "Rehashing" Catastrophe
If traffic spikes and you add a 5th server, the formula changes to `hash(key) % 5`. Suddenly, most keys map to a completely different server index than they did before: with a uniform hash, roughly 80% of keys move (in general, about n/(n+1) of them). Most of your cache becomes instantly invalid, the database is slammed with millions of cache misses, and the result is often total system collapse (the Thundering Herd problem).
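The churn is easy to measure. This is an illustrative sketch (not the simulator's code): it hashes 10,000 made-up keys with MD5 for stability and counts how many land on a different server when the cluster grows from 4 to 5.

```python
# Illustrative sketch: key churn under modulo placement when scaling 4 -> 5.
import hashlib

def server_index(key: str, n: int) -> int:
    # Use a stable hash; Python's built-in hash() is salted per process.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"user_{i}" for i in range(10_000)]
moved = sum(server_index(k, 4) != server_index(k, 5) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # close to 80%
```

A key stays put only when `hash % 4 == hash % 5`, which happens for about 1 in 5 keys, so roughly 80% move.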
The Consistent Hashing Solution
Consistent hashing solves this by decoupling the hash mapping from the number of servers. Instead of using modulo N, both the Servers and the Keys are hashed into the exact same massive, abstract integer space (usually 0 to 2³² - 1), which is visualized as a continuous circular ring.
1. Placement
Hash your servers (e.g., using their IP address) and place them on the edge of the ring. Then, hash your incoming keys and place them on the ring.
2. Searching
To find which server owns a key, start at the key's position on the ring and walk clockwise. The first server you encounter is the owner of that key (wrapping past the top of the ring back to zero if necessary).
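Both steps fit in a few lines. The following is a minimal sketch under assumed names (the IP addresses and `owner` helper are invented for illustration): a sorted list of server hash positions stands in for the ring, and a binary search implements the clockwise walk.

```python
# Minimal consistent-hash ring sketch: placement + clockwise lookup.
import bisect
import hashlib

def h(value: str) -> int:
    # Map any string into the 0 .. 2**32 - 1 ring.
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

# 1. Placement: hash each server (here, by IP) onto the ring.
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
ring = sorted((h(s), s) for s in servers)
positions = [pos for pos, _ in ring]

# 2. Searching: first server clockwise from the key; wrap with modulo.
def owner(key: str) -> str:
    i = bisect.bisect_right(positions, h(key)) % len(ring)
    return ring[i][1]

print(owner("user_123"))
```

Keeping the positions sorted makes each lookup O(log n) instead of a linear scan around the circle.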
Graceful Scaling (Adding/Removing Nodes)
Under consistent hashing, when a new Server 5 is added to the ring, it drops
in between two existing servers.
- It only takes over the keys that sit between itself and the previous server (counter-clockwise).
- All other keys on the entire ring remain perfectly untouched.
- Result: If you scale out by adding 1 node to a 4-node cluster (a 25% capacity increase), only ~20% of your keys need to move. 80% of your cluster’s cache stays perfectly valid.
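The "only local keys move" property can be verified directly. This sketch (with invented server names `s1`..`s5`) compares ownership before and after a fifth node joins; every key that changes hands must land on the new node, and nothing else moves.

```python
# Sketch: only the keys claimed by the new node change owners.
import bisect
import hashlib

def h(v: str) -> int:
    return int(hashlib.md5(v.encode()).hexdigest(), 16) % (2**32)

def owner(key: str, servers: list[str]) -> str:
    ring = sorted((h(s), s) for s in servers)
    pos = [p for p, _ in ring]
    return ring[bisect.bisect_right(pos, h(key)) % len(ring)][1]

old = ["s1", "s2", "s3", "s4"]
new = old + ["s5"]
keys = [f"user_{i}" for i in range(10_000)]
moved = [k for k in keys if owner(k, old) != owner(k, new)]

# All moved keys belong to the newcomer; the exact fraction depends on
# where s5 happens to land (virtual nodes tighten this toward 20%).
assert all(owner(k, new) == "s5" for k in moved)
print(f"{len(moved) / len(keys):.0%} of keys moved")
```

Without virtual nodes the moved fraction varies run to run with the new node's random position; the ~20% figure is the expected value.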
The Virtual Nodes (vNodes) Enhancement
In a naive implementation with only 4 servers randomly hashed onto a giant ring, the distribution of space between them is highly likely to be unequal. One server might accidentally take over 60% of the ring's surface area, resulting in highly skewed traffic loads.
The Fix:
Instead of mapping a physical server to a single point on the ring, we hash each server 100 or 250 times using different salts (e.g., `hash("ServerA_01")`, `hash("ServerA_02")`). This creates a constellation of hundreds of "virtual nodes" scattered around the entire ring, each acting as a pointer back to the single physical server.
- Produces a statistically even distribution of incoming keys: with hundreds of points per server, the arc sizes average out.
- Makes it trivial to handle servers with different hardware (powerful servers get 300 vNodes, weak servers get 100 vNodes).
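The vNode trick is a small change to the placement step. Below is an illustrative sketch (server names, the vNode count of 150, and the salt format are all invented for the example): each physical server is hashed under many salts, and a counter shows the resulting balance over 20,000 keys.

```python
# Sketch: virtual nodes smooth out the key distribution.
import bisect
import hashlib
from collections import Counter

def h(v: str) -> int:
    return int(hashlib.md5(v.encode()).hexdigest(), 16) % (2**32)

def build_ring(servers: list[str], vnodes: int = 150) -> list[tuple[int, str]]:
    points = []
    for s in servers:
        for i in range(vnodes):
            # Each salt, e.g. "ServerA_017", yields one virtual node
            # that points back to the physical server.
            points.append((h(f"{s}_{i:03d}"), s))
    points.sort()
    return points

ring = build_ring(["ServerA", "ServerB", "ServerC", "ServerD"])
positions = [p for p, _ in ring]

def owner(key: str) -> str:
    return ring[bisect.bisect_right(positions, h(key)) % len(ring)][1]

counts = Counter(owner(f"user_{i}") for i in range(20_000))
print(counts)  # each server holds close to 5,000 keys
```

Weighting is just a per-server `vnodes` argument: pass 300 for a powerful machine and 100 for a weak one, and the key share scales proportionally.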
Further Reading
- The Original Amazon Dynamo Paper - The groundbreaking whitepaper that popularized consistent hashing in modern NoSQL databases.
- How Discord Scaled Elixir - How Discord uses consistent hashing to assign Guilds (Servers) to specific Erlang processes.