CAP Theorem Simulator


Visualize distributed systems trade-offs


CAP Theorem

Consistency - Every read receives the most recent write

Availability - Every request receives a response

Partition Tolerance - System operates despite network failures

How to Use

Partition the network and attempt writes/reads.

You will see:

  • CP: Writes fail during partition to maintain Consistency
  • AP: Writes succeed but nodes drift (Inconsistency)
  • Heal: Conflict resolution or synchronization

The CAP Theorem: The Fundamental Law of Distributed Systems

In the late 1990s, as the internet exploded and single-node relational databases began to buckle under planetary-scale traffic, computer scientists realized they had to distribute data across multiple physical machines. But doing so introduced a harsh, inescapable mathematical reality. Formulated by Eric Brewer in 2000 and mathematically proven by Seth Gilbert and Nancy Lynch in 2002, the CAP Theorem dictates that any distributed data store can only provide two of the following three guarantees simultaneously: Consistency, Availability, and Partition Tolerance.


1. The Three Pillars of CAP

To understand the theorem, we must rigorously define what these three letters actually mean in the context of distributed engineering. They do not mean what they mean in casual conversation.

C: Consistency

Definition: Every read receives the most recent write or an error.

This is Linearizability. If a system is Consistent in the CAP sense, it behaves as if there is only one copy of the data, and all operations execute instantaneously. If Client A writes "X=5", Client B must read "X=5" immediately after. No client will ever see stale data.

A: Availability

Definition: Every request receives a non-error response, without the guarantee that it contains the most recent write.

If a system is Available, any non-failing node must return a valid response in a reasonable amount of time. It cannot simply hang forever, and it cannot return an error saying "I can't talk to the leader right now." It must respond, even if the data is out of date.

P: Partition Tolerance

Definition: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.

Networks are unreliable. Cables get cut, switches reboot, and garbage-collection pauses cause infinite delays. A partition-tolerant system survives the fact that Node A cannot talk to Node B.


2. The Illusion of "Choosing Two"

The classic explanation of CAP is a Venn Diagram where you pick any two: CA, CP, or AP. This is highly misleading. In the real world of distributed computing operating across a physical network, partitions are guaranteed to happen. Networks fail. Always.

Because you cannot avoid P (Partition Tolerance), you do not get to pick "CA". A "CA" system is simply a single-node database (like a standalone Postgres instance on your laptop). The moment you put data on two machines separated by a network cable, you are forcing P into the equation.

Therefore, the CAP Theorem isn't a menu of three options. It is a decision you must make during a network failure: Do you drop Availability (cancel the request to ensure Consistency) or drop Consistency (return stale data to ensure Availability)?

CP Systems: Sacrificing Availability

If a network partition occurs, a CP system will protect data integrity at all costs. If Node A receives a Write, but cannot replicate it to Node B because the network is down, Node A will return an Error to the client (or timeout).

  • Behavior: Refuses to serve stale data.
  • Use-case: Financial ledgers, billing systems, inventory control. (You cannot allow two users to withdraw the same $100 simultaneously).
  • Examples: MongoDB (configured for majority reads and writes), etcd, ZooKeeper, Google Spanner, HBase.
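The CP behavior above can be sketched as a toy two-node store in which a write must replicate synchronously to the peer or be rejected. The `CPNode` class and its replication scheme are illustrative assumptions for this sketch, not any real database's API.

```python
class CPNode:
    """Toy CP node: a write succeeds only if it also reaches the peer."""
    def __init__(self):
        self.data = {}
        self.peer = None
        self.partitioned = False  # simulated network partition

    def write(self, key, value):
        if self.partitioned:
            # Refuse the write rather than let the two nodes diverge.
            raise ConnectionError("partition: write rejected to preserve consistency")
        self.data[key] = value
        self.peer.data[key] = value  # synchronous replication
        return "ok"

    def read(self, key):
        if self.partitioned:
            raise ConnectionError("partition: read rejected (data may be stale)")
        return self.data[key]

a, b = CPNode(), CPNode()
a.peer, b.peer = b, a
a.write("x", 5)
print(b.read("x"))        # 5: both nodes agree
a.partitioned = True
try:
    a.write("x", 10)      # Availability is sacrificed during the partition
except ConnectionError as err:
    print(err)
```

During the partition the cluster stays consistent (both nodes still hold `x = 5`), at the cost of returning errors to clients.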

AP Systems: Sacrificing Consistency

If a network partition occurs, an AP system will continue to serve requests, even if nodes cannot agree. Node A will happily accept a Write, even if it can't tell Node B. Meanwhile, Node B will happily serve Reads, even though its data is out-of-date.

  • Behavior: Always responds, but data may conflict or be stale.
  • Use-case: Social media feeds, shopping cart additions, analytics gathering. (It's better to show an old tweet than an error page).
  • Examples: Cassandra, DynamoDB (eventual consistency mode), CouchDB, Riak.
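The AP trade-off can be sketched the same way: each node always accepts reads and writes, and replication is merely best-effort. The `APNode` class and the `network_up` flag are illustrative assumptions, not a real client API.

```python
class APNode:
    """Toy AP node: always answers, replicates only when the network allows."""
    def __init__(self, name):
        self.name, self.data, self.peers = name, {}, []

    def write(self, key, value, network_up=True):
        self.data[key] = value                 # always accept locally
        if network_up:
            for p in self.peers:               # best-effort replication
                p.data[key] = value
        return "ok"                            # never an error

    def read(self, key):
        return self.data.get(key)              # may be stale

a, b = APNode("A"), APNode("B")
a.peers, b.peers = [b], [a]
a.write("x", 5)                     # replicates: both nodes see 5
a.write("x", 10, network_up=False)  # partition: only A sees the new value
print(a.read("x"), b.read("x"))     # 10 5 -- both respond, but they disagree
```

Every request succeeded, yet the two replicas now hold conflicting values, which is exactly the drift the simulator visualizes.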

3. Beyond CAP: The PACELC Theorem

A major criticism of the CAP Theorem is that it only describes what happens during a disaster (a network partition). But what happens the other 99.9% of the time when the network is perfectly fine?

In 2010, Daniel Abadi proposed the PACELC Theorem, which accurately describes the continuous trade-offs databases make. It states:

If there is a Partition (P), how does the system trade off Availability (A) and Consistency (C)?
Else (E), when the system is running normally, how does it trade off Latency (L) and Consistency (C)?

The Normal Operation Trade-off (E: L vs C)

Even if there is no partition, data takes physical time (Latency) to travel at the speed of light across a datacenter or ocean. When a client writes to Node A, Node A must send that data to Node B.

  • Favoring Consistency (EC): Node A blocks the client's request, waits for Node B to acknowledge receipt, and then responds "Success". This guarantees Consistency but increases Latency (e.g., synchronous replication).
  • Favoring Latency (EL): Node A responds "Success" to the client instantly, then asynchronously replicates to Node B in the background. This gives very low Latency but opens a brief window of Inconsistency while Node B is out of date (e.g., asynchronous replication).
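The two replication strategies above can be sketched with a simulated network delay; the 50 ms round trip, the `backlog` queue, and the function names are assumptions made for this sketch.

```python
import time

REPLICATION_DELAY = 0.05  # pretend cross-datacenter round trip (50 ms)

replica = {}

def replicate(key, value):
    time.sleep(REPLICATION_DELAY)  # data takes physical time to travel
    replica[key] = value

def write_ec(primary, key, value):
    """EC: block until the replica acknowledges -> consistent, but slower."""
    primary[key] = value
    replicate(key, value)          # wait for the peer before answering
    return "ok"

def write_el(primary, key, value, backlog):
    """EL: answer immediately, replicate later -> fast, briefly inconsistent."""
    primary[key] = value
    backlog.append((key, value))   # replication deferred to the background
    return "ok"

primary, backlog = {}, []
t0 = time.perf_counter(); write_ec(primary, "x", 1); ec_ms = time.perf_counter() - t0
t0 = time.perf_counter(); write_el(primary, "x", 2, backlog); el_ms = time.perf_counter() - t0
print(f"EC write: {ec_ms*1000:.0f} ms, EL write: {el_ms*1000:.0f} ms")
print("replica after EL write:", replica)  # still {'x': 1}: the inconsistency window
```

The EC write pays the full round trip before acknowledging; the EL write returns immediately, leaving the replica one value behind until the backlog drains.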

Database Classifications under PACELC:

  • PC/EC: Consistent during partitions AND consistent (higher latency) during normal operation. Examples: VoltDB, Megastore.
  • PA/EL: Available during partitions AND low-latency (possibly inconsistent) during normal operation. Examples: DynamoDB, Cassandra, Riak.
  • PA/EC: Available during partitions BUT consistent (higher latency) during normal operation. Example: MongoDB (certain configurations).
  • PC/EL: Consistent during partitions BUT low-latency (possibly inconsistent) during normal operation. Examples: Kafka, Yahoo PNUTS.

4. Real World Chaos: Split-Brain and Consistency Models

System design is rarely binary. Engineering modern databases involves fighting physics using complex algorithms to navigate the spectrum between absolute CP and absolute AP.

The "Split-Brain" Problem

Imagine a 2-node CP system (Node A and Node B). The network cable between them is cleanly cut. Both nodes are still running and accepting client traffic, but they cannot talk to each other. Node A assumes Node B is dead. Node B assumes Node A is dead. If a client writes "X=5" to A, and another writes "X=10" to B, the system has fractured into two diverging realities. This is a Split-Brain.

To prevent this, CP systems use Quorums (see the Quorum Simulator) and consensus algorithms like Raft or Paxos. To safely accept a write, a node must communicate with a strict majority (N/2 + 1) of the cluster. In a 5-node cluster, a partition might split it into a group of 3 and a group of 2. The group of 3 has quorum and continues operating. The group of 2 realizes it lacks a majority and intentionally halts itself (sacrificing Availability) to prevent Split-Brain.
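The majority rule described above reduces to a one-line check; the function name is illustrative, but the arithmetic is the standard strict-majority test.

```python
def has_quorum(group_size, cluster_size):
    """A side of a partition may keep operating only with a strict majority."""
    return group_size >= cluster_size // 2 + 1

CLUSTER = 5
for group in (3, 2):   # a partition splits the 5 nodes into 3 + 2
    status = "keeps serving" if has_quorum(group, CLUSTER) else "halts itself"
    print(f"group of {group}: {status}")
# group of 3: keeps serving
# group of 2: halts itself
```

Note that at most one side of any partition can hold a strict majority, which is precisely why the quorum rule makes Split-Brain impossible; in an even-sized cluster split exactly in half, both sides halt.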

Levels of "Consistency"

If you choose AP, what exactly does your "inconsistency" look like? Database engineers have defined a gradient of consistency models:

  • Strong Consistency (Linearizability): The gold standard. Read always returns the latest write. Requires synchronous coordination (CP).
  • Sequential Consistency: All operations appear to execute in some single total order, though that order need not match wall-clock time.
  • Causal Consistency: If Write B was triggered by Reading Write A, then all clients will see A before B. Causally unrelated writes can be seen in any order.
  • Read-Your-Own-Writes Consistency: A client will always see its own updates immediately, even if other clients don't see them yet. Useful for UX profiles.
  • Eventual Consistency: The weakest AP model. If all writes stop, eventually (in milliseconds, hours, or days), all nodes will converge to the same state. Before that, reads can return wildly different answers depending on which node you hit.
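Read-Your-Own-Writes is often implemented by routing a client's reads for keys it recently wrote back to the up-to-date replica. This sketch assumes a single leader with one lagging follower and a per-session `session_wrote` set; all names are illustrative, and this is only one of several ways to provide the guarantee.

```python
leader = {"x": 0}        # takes all writes
follower = {"x": 0}      # asynchronously replicated, may lag behind

session_wrote = set()    # keys this client has written in its session

def write(key, value):
    leader[key] = value  # follower not yet updated (replication lag)
    session_wrote.add(key)

def read(key):
    # Read-your-own-writes: if this client wrote the key, read from the
    # leader; otherwise the (possibly stale) follower is good enough.
    source = leader if key in session_wrote else follower
    return source[key]

write("x", 42)
print(read("x"))   # 42, even though the follower still holds 0
```

Other clients reading from the follower may still see the old value, which is allowed: the guarantee only covers a client's own updates.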

Conflict Resolution in AP Systems

If an AP system allows two disconnected nodes to accept competing writes during a partition, how does it fix the mess when the network heals?

Last-Writer-Wins (LWW): Uses timestamps; whichever write carries the latest timestamp overwrites the other. Flawed because distributed clocks are never perfectly synchronized (NTP drift), so a "newer" write may actually have happened earlier.
Vector Clocks: Amazon's Dynamo tracks causality with per-node version counters (e.g., `[NodeA:2, NodeB:1]`). If the system detects a divergence it cannot resolve on its own, it hands both conflicting versions up to the application and forces the developer to write custom merge logic.
CRDTs (Conflict-Free Replicated Data Types): The modern bleeding edge. Mathematical data structures (maps, counters, sets) that can be mutated concurrently without coordination and are guaranteed to converge deterministically when merged. Powers collaborative tools such as Figma.
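The vector-clock comparison that decides "supersedes vs. concurrent" can be sketched directly; clocks are modeled here as plain dicts mapping node name to counter, which is an assumption of this sketch rather than any specific database's wire format.

```python
def compare(vc1, vc2):
    """Classify two vector clocks: 'equal', 'before', 'after', or 'concurrent'."""
    nodes = set(vc1) | set(vc2)
    le = all(vc1.get(n, 0) <= vc2.get(n, 0) for n in nodes)
    ge = all(vc1.get(n, 0) >= vc2.get(n, 0) for n in nodes)
    if le and ge:
        return "equal"
    if le:
        return "before"     # vc2 has seen everything vc1 has: vc2's write wins
    if ge:
        return "after"      # vc1's write wins
    return "concurrent"     # neither saw the other: a real conflict to merge

print(compare({"A": 2, "B": 1}, {"A": 3, "B": 1}))  # before
print(compare({"A": 2, "B": 1}, {"A": 1, "B": 2}))  # concurrent
```

Only the `concurrent` case is handed back to the application for a custom merge; the other cases resolve automatically because one version causally dominates the other.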

Comprehensive References & Essential Reading