LSM-Tree vs B-Tree Simulator

Runs in the browser

Interactive visualization of Write-Optimized (LSM) vs Read-Optimized (B-Tree) Storage Engines

LSM-Tree

Write Optimized (Append-Only)

Memory

Memtable (Sorted)

Disk

SSTables (Immutable)

B-Tree

Read Optimized (Page Based)

Order: 3 (Max 2 keys/node)

How to Use This Simulator

Interacting with Data:

  • Insert: Add a Key-Value pair. Watch it propagate to both engines.
  • Batch: Randomly insert 10 records to trigger limits rapidly.
  • Delete: See how LSM uses "Tombstones" while B-Tree performs in-place removal.

Observing Internal States:

  • LSM-Tree: Watch the Memtable fill and Flush to Disk (SSTables). Use "Force Compaction" to merge files.
  • B-Tree: Observe how nodes Split when they exceed capacity and grow the tree height.

LSM-Trees vs B-Trees: The Definitive Storage Engine Guide

At the heart of every database lies a storage engine—the component responsible for storing, retrieving, and managing data on disk. While there are many variations, two fundamental data structures dominate the landscape of modern databases: Log-Structured Merge-Trees (LSM-Trees) and B-Trees. This interactive simulator allows you to visualize the mechanics of each, helping you understand why you might choose RocksDB over PostgreSQL, or vice versa, for specific workloads.


B-Trees: The Ubiquitous Standard

The B-Tree (and its variant, the B+ Tree) has been the gold standard for general-purpose storage engines since the 1970s. It is the default engine for widely used relational databases like PostgreSQL, MySQL (InnoDB), SQL Server, and SQLite.

How It Works

The fundamental unit of a B-Tree is the Page (or block), typically 4KB or 8KB in size. This size is chosen to align with the operating system's page size and the underlying hardware's block size.

  • In-Place Updates: Unlike LSM-trees, B-Trees are mutable. When you update a record, the engine finds the specific page containing that record and overwrites the data in place. If the new data fits, no other pages are affected.
  • Tree Structure: B-Trees are balanced search trees. They maintain a strict hierarchy where all leaf nodes are at the same depth. This balancing guarantees that looking up any key takes O(log n) time, providing extremely predictable read performance.
  • Page Splitting: As seen in the simulator, B-Trees have a fixed capacity per node. When you insert data into a full page, it must "split" into two half-full pages. The middle key is promoted to the parent node. This process can ripple up to the root, increasing the tree's height.
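The splitting behavior described above can be sketched in a few dozen lines. This is an illustrative toy matching the simulator's settings (order 3, max 2 keys per node), not a production engine: there are no disk pages and no deletion, and all names are made up for the example.

```python
# Toy B-Tree insertion with page splitting (order 3: max 2 keys/node).
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []          # sorted keys in this "page"
        self.children = children or []  # empty for leaf nodes

MAX_KEYS = 2  # order 3 => a page holds at most 2 keys

def insert(root, key):
    """Insert key; return the (possibly new) root after any splits."""
    split = _insert(root, key)
    if split:  # the root itself overflowed: grow the tree by one level
        mid, right = split
        return Node([mid], [root, right])
    return root

def _insert(node, key):
    if not node.children:                     # leaf: place key in order
        node.keys.append(key)
        node.keys.sort()
    else:                                     # internal node: descend
        i = sum(k < key for k in node.keys)
        split = _insert(node.children[i], key)
        if split:                             # child split: absorb middle key
            mid, right = split
            node.keys.insert(i, mid)
            node.children.insert(i + 1, right)
    if len(node.keys) > MAX_KEYS:             # overflow: split this page
        mid = node.keys[1]                    # middle key is promoted upward
        right = Node(node.keys[2:], node.children[2:])
        node.keys, node.children = node.keys[:1], node.children[:2]
        return mid, right
    return None

root = Node()
for k in [10, 20, 30, 40, 50]:
    root = insert(root, k)
print(root.keys)  # promoted middle keys accumulate in the root: [20, 40]
```

Inserting 10, 20, 30 forces the first split (20 is promoted into a new root), exactly the ripple-up behavior the simulator animates.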

Strengths & Weaknesses

Why use B-Trees?

  • Fast Reads: Data is structured for quick lookup. Range scans are efficient (especially in B+ Trees, whose leaf nodes are linked).
  • Predictable Latency: No background compaction processes causing CPU spikes.
  • Transaction Support: Easier to implement ACID isolation levels due to single-copy semantics.

Challenges

  • Random Writes: To update a tiny row, the entire 4KB page must be written. This leads to "Write Amplification".
  • Fragmentation: Page splitting often results in pages that are only partially full, wasting disk space.
  • Concurrency: Locking a page for update can block other writers, though modern techniques (latching) mitigate this.

LSM-Trees: The Write-Throughput Kings

Log-Structured Merge-Trees took center stage with the rise of "Big Data" and high-ingestion workloads. They are the backbone of modern NoSQL and NewSQL systems like Apache Cassandra, ScyllaDB, RocksDB (used by CockroachDB, YugabyteDB), and Google BigTable.

How It Works

LSM-Trees fundamentally differ from B-Trees by treating the disk as an append-only log. They never modify existing files; they only write new ones.

  1. The Memtable (RAM): When a write comes in, it initially goes into an in-memory buffer called the Memtable. This is usually a sorted structure (like a Red-Black tree or Skip List). Because it's in RAM, inserts are incredibly fast.
  2. The Flush (Sequential I/O): When the Memtable fills up (e.g., reaches 64MB), it is flushed to disk as a Sorted String Table (SSTable). Crucially, this write is sequential. Mechanical HDDs and even modern SSDs prefer sequential writes over random writes, giving LSM-trees massive write throughput.
  3. SSTables (Immutable Disk): Once written, an SSTable is never changed. If you update a key, you simply write the new value to a new Memtable. If you delete a key, you write a special "Tombstone" marker.
  4. Compaction: Over time, you accumulate many SSTables. A background process runs Compaction, which merges these files, discarding overwritten values and tombstoned records to reclaim space.
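The four steps above can be condensed into a toy write path. This is a minimal sketch, not any real engine's code: the memtable limit of 4 matches the simulator, a plain dict stands in for a sorted structure, and a full-merge compaction stands in for leveled compaction.

```python
# Toy LSM write path: memtable, flush to immutable SSTables,
# tombstone deletes, and a full merge compaction.
TOMBSTONE = object()          # special marker written on delete
MEMTABLE_LIMIT = 4            # flush threshold (the simulator's limit)

class LSMTree:
    def __init__(self):
        self.memtable = {}    # in-memory buffer (dict here for brevity)
        self.sstables = []    # newest last; each is an immutable sorted list

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self.flush()

    def delete(self, key):
        self.put(key, TOMBSTONE)     # deletes are just writes of a marker

    def flush(self):
        # Sequential write: dump the memtable as one sorted, immutable run.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Check newest data first: memtable, then SSTables newest-to-oldest.
        if key in self.memtable:
            v = self.memtable[key]
            return None if v is TOMBSTONE else v
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return None if v is TOMBSTONE else v
        return None

    def compact(self):
        # Merge all runs, keeping only the newest version of each key
        # and dropping tombstoned records to reclaim space.
        merged = {}
        for table in self.sstables:   # oldest first, so newer overwrites
            merged.update(table)
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.sstables = [sorted(live.items())]
```

Note that `delete` never touches the old SSTable containing the key; the tombstone only masks it until `compact` physically discards both copies.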

Strengths & Weaknesses

Why use LSM-Trees?

  • Extreme Write Speed: Buffer writes in RAM + Sequential I/O = millions of writes per second.
  • Compression: SSTables are immutable, allowing for aggressive block compression, often saving 50% or more disk space compared to B-Trees.
  • Hardware Friendly: Reduces write amplification on flash storage (SSDs) and minimizes seek times on HDDs.

Challenges

  • Read Amplification: To read a key, the system might check the Memtable, then the L0 SSTables, then L1... potentially touching several files for a single lookup. Bloom Filters are used to mitigate this.
  • Compaction Stalls: Heavy background compaction can consume disk bandwidth, slowing down user queries temporarily.
  • Space Amplification: Old data lingers on disk until compaction runs.
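The Bloom filter mentioned above deserves a sketch: LSM engines keep one per SSTable so a read can skip files that definitely do not contain a key. This is a minimal illustration with made-up sizing (1024 bits, 3 hash positions derived from one SHA-256 digest), not any engine's actual implementation.

```python
# Minimal Bloom filter: a bit array plus k hash positions per key.
# "No" answers are exact; "yes" answers may be false positives.
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int doubles as an arbitrary-size bit array

    def _positions(self, key):
        # Derive k bit positions from one digest (illustrative choice).
        digest = hashlib.sha256(key.encode()).digest()
        for i in range(self.num_hashes):
            chunk = int.from_bytes(digest[i * 4:(i + 1) * 4], "big")
            yield chunk % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False => definitely absent: the SSTable can be skipped entirely.
        # True  => maybe present: the file must actually be read.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))
```

Because a present key always sets its bits, the filter never produces false negatives; the occasional false positive merely costs one unnecessary file read.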

Head-to-Head Comparison

| Feature | B-Tree (InnoDB, WiredTiger B-Tree) | LSM-Tree (RocksDB, LevelDB) |
| --- | --- | --- |
| Write Pattern | Random I/O (update pages in place) | Sequential I/O (append only) |
| Write Performance | Good (limited by seek/IOPS) | Excellent (limited by bandwidth) |
| Read Performance | Excellent & stable | Variable (Bloom filters help significantly) |
| Space Efficiency | Lower (fragmentation) | Higher (better compression) |
| Best For... | General purpose, read-heavy, transactional (OLTP) | Write-heavy, time-series, logs, high-throughput |

Concept Deep Dive: Write Amplification

One of the most critical metrics in storage engine design is Write Amplification (WA). It is defined as the ratio of data written to disk versus data written by the user.

WA = (Bytes written to storage medium) / (Bytes written by user)

B-Tree WA

If you update a 100-byte row in a B-Tree, the engine must write out the full 4KB (4096 bytes) page (plus WAL entry).
WA ≈ 4096 / 100 ≈ 41x.
This wears out SSDs faster and consumes IOPS.

LSM-Tree WA

LSM writes the data compactly to the WAL and SSTable. However, Compaction reads and re-writes the same data multiple times as it moves down levels.
Even so, for random write workloads, LSM trees typically have lower WA than B-Trees, often in the range of 5x - 15x depending on the workload.
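The two estimates can be put side by side in a few lines of arithmetic. The level count for the LSM case is an illustrative assumption (one rewrite per compaction level, a common simplification), not a measured figure from any system.

```python
# Back-of-the-envelope write amplification: a 100-byte user write
# through a 4 KB-page B-Tree vs. an LSM-tree with a 6-level hierarchy.
PAGE_SIZE = 4096
ROW_SIZE = 100

# B-Tree: the whole page is rewritten for one small row (WAL ignored).
btree_wa = PAGE_SIZE / ROW_SIZE           # ≈ 41x

# LSM: WAL write + memtable flush + one rewrite per compaction level.
LEVELS = 6                                # assumed depth, for illustration
lsm_wa = (ROW_SIZE + ROW_SIZE + LEVELS * ROW_SIZE) / ROW_SIZE   # = 8x

print(f"B-Tree WA ≈ {btree_wa:.0f}x, LSM WA ≈ {lsm_wa:.0f}x")
```

Under these assumptions the LSM figure lands at 8x, inside the 5x-15x range quoted above, while the B-Tree pays its full page cost on every small update.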

References & Further Reading