CAP Theorem Simulator
Runs in the browser. Visualize distributed systems trade-offs.
Consistency + Partition Tolerance
The system remains consistent during partitions but may reject writes or reads to preserve that guarantee.
CAP Theorem
Consistency - Every read receives the most recent write
Availability - Every request receives a response
Partition Tolerance - System operates despite network failures
How to Use
Partition the network and attempt writes/reads.
You will see:
- CP: Writes fail during partition to maintain Consistency
- AP: Writes succeed but nodes drift (Inconsistency)
- Heal: Conflict resolution or synchronization
The CAP Theorem: The Fundamental Law of Distributed Systems
In the late 1990s, as the internet exploded and single-node relational databases began to buckle under planetary-scale traffic, computer scientists realized they had to distribute data across multiple physical machines. But doing so introduced a harsh, inescapable mathematical reality. Formulated by Eric Brewer in 2000 and mathematically proven by Seth Gilbert and Nancy Lynch in 2002, the CAP Theorem dictates that any distributed data store can only provide two of the following three guarantees simultaneously: Consistency, Availability, and Partition Tolerance.
1. The Three Pillars of CAP
To understand the theorem, we must rigorously define what these three letters actually mean in the context of distributed engineering. They do not mean what they mean in casual conversation.
Consistency
Definition: Every read receives the most recent write or an error.
This is Linearizability. If a system is Consistent in the CAP sense, it behaves as if there is only one copy of the data, and all operations execute instantaneously. If Client A writes "X=5", Client B must read "X=5" immediately after. No client will ever see stale data.
Availability
Definition: Every request receives a non-error response, without the guarantee that it contains the most recent write.
If a system is Available, any non-failing node must return a valid response in a reasonable amount of time. It cannot simply hang forever, and it cannot return an error saying "I can't talk to the leader right now." It must respond, even if the data is out of date.
Partition Tolerance
Definition: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
Networks are unreliable. Cables get cut, switches reboot, and garbage-collection pauses can stall a node long enough that it appears dead. A partition-tolerant system survives the fact that Node A cannot talk to Node B.
2. The Illusion of "Choosing Two"
The classic explanation of CAP is a Venn Diagram where you pick any two: CA, CP, or AP. This is highly misleading. In the real world of distributed computing operating across a physical network, partitions are guaranteed to happen. Networks fail. Always.
Because you cannot avoid P (Partition Tolerance), you do not get to pick "CA". A "CA" system is simply a single-node database (like a standalone Postgres instance on your laptop). The moment you put data on two machines separated by a network cable, you are forcing P into the equation.
Therefore, the CAP Theorem isn't a menu of three options. It is a decision you must make during a network failure: Do you drop Availability (cancel the request to ensure Consistency) or drop Consistency (return stale data to ensure Availability)?
CP Systems: Sacrificing Availability
If a network partition occurs, a CP system will protect data integrity at all costs. If Node A receives a Write but cannot replicate it to Node B because the network is down, Node A will return an Error to the client (or time out); a minimal sketch follows the list below.
- Behavior: Refuses to serve stale data.
- Use-case: Financial ledgers, billing systems, inventory control. (You cannot allow two users to withdraw the same $100 simultaneously).
- Examples: MongoDB (strict mode), Etcd, ZooKeeper, Google Spanner, HBase.
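To make the CP write path concrete, here is a minimal sketch in Python. The `CPNode` class and its quorum arithmetic are a hypothetical illustration of the behavior above, not any real database's API:

```python
class CPNode:
    """Toy CP node: a write succeeds only if a strict majority is reachable."""

    def __init__(self, name, peers):
        self.name = name
        self.peers = peers                  # the other nodes in the cluster
        self.reachable = set(peers)         # peers we can currently talk to
        self.data = {}

    def write(self, key, value):
        cluster_size = len(self.peers) + 1            # peers plus ourselves
        acks = 1 + len(self.reachable)                # our own ack + reachable peers
        if acks < cluster_size // 2 + 1:
            # The CP choice: give up Availability rather than risk divergence.
            raise RuntimeError("quorum unreachable; write rejected")
        self.data[key] = value
        for peer in self.reachable:                   # simplified sync replication
            peer.data[key] = value
        return "OK"

# A 3-node cluster needs 2 acks for a write to succeed.
b, c = CPNode("B", []), CPNode("C", [])
a = CPNode("A", [b, c])
a.write("x", 5)          # OK: 3 acks >= quorum of 2
a.reachable.clear()      # simulate a partition isolating A
a.write("x", 10)         # raises RuntimeError: 1 ack < quorum of 2
```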
AP Systems: Sacrificing Consistency
If a network partition occurs, an AP system will continue to serve requests even when nodes cannot agree. Node A will happily accept a Write even though it can't tell Node B, and Node B will happily serve Reads even though its data is out of date; a matching sketch follows the list below.
- Behavior: Always responds, but data may conflict or be stale.
- Use-case: Social media feeds, shopping cart additions, analytics gathering. (It's better to show an old tweet than an error page).
- Examples: Cassandra, DynamoDB (eventual consistency mode), CouchDB, Riak.
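The AP counterpart accepts everything and repairs later. Again a hypothetical sketch, with `APNode` standing in for an eventually consistent store:

```python
class APNode:
    """Toy AP node: always answers, replicating only on a best-effort basis."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.reachable = []                # peers currently reachable

    def write(self, key, value):
        self.data[key] = value             # the AP choice: accept locally, always
        for peer in self.reachable:        # best-effort replication
            peer.data[key] = value
        return "OK"

    def read(self, key):
        return self.data.get(key)          # may be stale, but never an error

# During a partition (no reachable peers), A and B silently diverge:
a, b = APNode("A"), APNode("B")
a.write("x", 5)
b.write("x", 10)
print(a.read("x"), b.read("x"))            # 5 10 -- two realities
```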
3. Beyond CAP: The PACELC Theorem
A major criticism of the CAP Theorem is that it only describes what happens during a disaster (a network partition). But what happens the other 99.9% of the time when the network is perfectly fine?
In 2010, Daniel Abadi proposed the PACELC Theorem, which describes the trade-offs databases make continuously, not only during failures. It states: if there is a Partition (P), the system must trade off Availability versus Consistency (A vs C); Else (E), during normal operation, it must trade off Latency versus Consistency (L vs C).
The Normal Operation Trade-off (E: L vs C)
Even if there is no partition, data takes physical time (Latency) to travel at the speed of light across a datacenter or ocean. When a client writes to Node A, Node A must send that data to Node B.
- Favoring Consistency (EC): Node A blocks the client's request, waits for Node B to acknowledge receipt, and then responds "Success". This guarantees Consistency but increases Latency (e.g., synchronous replication).
- Favoring Latency (EL): Node A responds "Success" to the client instantly, then asynchronously replicates to Node B in the background. This provides ultra-fast Latency but opens a brief window of Inconsistency in which Node B is out of date (e.g., asynchronous replication). Both paths are sketched below.
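The two replication strategies fit in a few lines. This is a toy model: `REPLICATION_DELAY` is an assumed stand-in for the physical round trip, and `write_ec`/`write_el` are hypothetical names:

```python
import threading
import time

REPLICATION_DELAY = 0.05   # assumed cross-datacenter round trip, in seconds

def replicate(replicas, key, value):
    time.sleep(REPLICATION_DELAY)          # the unavoidable latency tax
    for replica in replicas:
        replica[key] = value

def write_ec(primary, replicas, key, value):
    """EC: block until replication finishes (synchronous replication)."""
    primary[key] = value
    replicate(replicas, key, value)        # the client waits through the delay
    return "OK"                            # every node now agrees

def write_el(primary, replicas, key, value):
    """EL: acknowledge instantly, replicate in the background (asynchronous)."""
    primary[key] = value
    threading.Thread(target=replicate, args=(replicas, key, value)).start()
    return "OK"                            # fast, but replicas briefly lag
```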
Database Classifications under PACELC:
| Classification | Meaning | Examples |
|---|---|---|
| PC/EC | Chooses Consistency during partitions AND Consistency (higher latency) in normal operation. | VoltDB, Megastore |
| PA/EL | Chooses Availability during partitions AND low latency (weaker Consistency) in normal operation. | DynamoDB, Cassandra, Riak |
| PA/EC | Chooses Availability during partitions BUT Consistency (higher latency) in normal operation. | MongoDB (certain configs) |
| PC/EL | Chooses Consistency during partitions BUT low latency (weaker Consistency) in normal operation. | Kafka, Yahoo PNUTS |
4. Real World Chaos: Split-Brain and Consistency Models
System design is rarely binary. Engineering modern databases involves fighting physics using complex algorithms to navigate the spectrum between absolute CP and absolute AP.
The "Split-Brain" Problem
Imagine a 2-node CP system (Node A and Node B). The network cable between them is cleanly cut. Both nodes are still running and accepting client traffic, but they cannot talk to each other. Node A assumes Node B is dead. Node B assumes Node A is dead. If a client writes "X=5" to A, and another writes "X=10" to B, the system has fractured into two diverging realities. This is a Split-Brain.
To prevent this, CP systems use Quorums (see the Quorum Simulator) and consensus algorithms like Raft or Paxos. To safely accept a write, a node must communicate with a strict majority (N/2 + 1) of the cluster. In a 5-node cluster, a partition might split it into a group of 3 and a group of 2. The group of 3 has quorum and continues operating. The group of 2 realizes it lacks a majority and intentionally halts itself (sacrificing Availability) to prevent Split-Brain.
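The quorum check itself is a one-liner. The `has_quorum` helper below is a hypothetical illustration of the majority rule just described:

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """A group may keep serving writes only with a strict majority (N // 2 + 1)."""
    return reachable_nodes >= cluster_size // 2 + 1

# A 5-node cluster partitioned into groups of 3 and 2:
assert has_quorum(3, 5)        # majority side keeps operating
assert not has_quorum(2, 5)    # minority side halts, sacrificing Availability
```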
Levels of "Consistency"
If you choose AP, what exactly does your "inconsistency" look like? Database engineers have defined a gradient of consistency models:
- Strong Consistency (Linearizability): The gold standard. Read always returns the latest write. Requires synchronous coordination (CP).
- Sequential Consistency: Operations appear to take effect in a single total order, though that order need not match wall-clock time.
- Causal Consistency: If Write B was triggered by Reading Write A, then all clients will see A before B. Causally unrelated writes can be seen in any order.
- Read-Your-Own-Writes Consistency: A client always sees its own updates immediately, even if other clients don't see them yet. Useful for UX such as profile edits (sketched after this list).
- Eventual Consistency: The weakest AP model. If all writes stop, eventually (in milliseconds, hours, or days), all nodes will converge to the same state. Before that, reads can return wildly different answers depending on which node you hit.
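One common way to provide Read-Your-Own-Writes on top of an eventually consistent store is session tracking. The `Replica` and `Session` classes below are a hypothetical sketch, assuming writes carry monotonically increasing timestamps:

```python
class Replica:
    """A replica that tracks the timestamp of the newest write it has applied."""

    def __init__(self):
        self.data = {}
        self.applied_ts = 0

    def apply(self, key, value, ts):
        self.data[key] = value
        self.applied_ts = max(self.applied_ts, ts)

class Session:
    """Read-your-own-writes: only read from replicas caught up to our writes."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.last_write_ts = 0             # newest timestamp this client wrote

    def write(self, replica, key, value, ts):
        replica.apply(key, value, ts)      # the write lands on one replica first
        self.last_write_ts = max(self.last_write_ts, ts)

    def read(self, key):
        for replica in self.replicas:
            if replica.applied_ts >= self.last_write_ts:
                return replica.data.get(key)
        raise RuntimeError("no replica has caught up to this session's writes")

r1, r2 = Replica(), Replica()
s = Session([r1, r2])
s.write(r1, "name", "alice", ts=1)     # write lands on r1 only
print(s.read("name"))                  # "alice": r1 qualifies, stale r2 is skipped
```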
Conflict Resolution in AP Systems
If an AP system allows two disconnected nodes to accept competing writes during a partition, how does it fix the mess when the network heals? The common strategies are Last-Write-Wins (keep the version with the newest timestamp, Cassandra's default), vector clocks that detect genuinely concurrent versions and surface the conflict to the application (the approach of Amazon's Dynamo paper and Riak), and CRDTs, data structures whose merge operation is mathematically guaranteed to converge. A vector-clock sketch follows.
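Vector clocks make "genuinely concurrent" precise: each node counts the writes it has seen from every other node, and two versions conflict exactly when neither clock dominates the other. A minimal sketch, with a hypothetical `compare` helper:

```python
def compare(vc_a: dict, vc_b: dict) -> str:
    """Order two vector clocks, or report that the versions conflict."""
    nodes = set(vc_a) | set(vc_b)
    a_le_b = all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
    b_le_a = all(vc_b.get(n, 0) <= vc_a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"          # identical histories: nothing to resolve
    if a_le_b:
        return "a_before_b"     # B already saw A's write: B supersedes A
    if b_le_a:
        return "b_before_a"     # A already saw B's write: A supersedes B
    return "concurrent"         # a true conflict: LWW, merge, or ask the app

# Writes accepted on opposite sides of a partition compare as concurrent:
print(compare({"A": 2, "B": 1}, {"A": 1, "B": 2}))   # -> concurrent
```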
Comprehensive References & Essential Reading
- CAP Twelve Years Later: How the "Rules" Have Changed - The creator of the theorem, Eric Brewer, explaining how modern systems navigate the edges of CAP.
- Please stop calling databases CP or AP - Martin Kleppmann (author of Designing Data-Intensive Applications) arguing that CP/AP labels are too simplistic for modern DB architectures.
- Jepsen Consistency Analyses - Kyle Kingsbury's terrifyingly brilliant series where he intentionally subjects enterprise databases to horrific network partitions to prove their CAP claims are often lies.
- Consistency Tradeoffs in Modern Distributed Database System Design (PACELC) - Daniel Abadi's original paper expanding CAP.