The Engineering of B-Trees & Database Indexes

A B-Tree (Bayer-Tree) is a self-balancing tree data structure that maintains sorted data and performs searches, sequential access, insertions, and deletions in O(log n) time. Unlike Binary Search Trees (where nodes have at most two children), B-Trees are designed to have massive fanout (thousands of children per node), making them the mathematical foundation of nearly all relational database indexing engines.

1. The Anatomy of a B-Tree

B-Trees are governed by an Order (t), which dictates the minimum and maximum capacity of every node in the tree.

Invariants of a B-Tree of Order t

Key Bounds: Every node (except the root) must contain between t - 1 and 2t - 1 keys.
Child Bounds: Every internal node containing k keys must have exactly k + 1 children.
Perfect Balance: All leaf nodes MUST exist at the exact same depth. The tree grows from the root upwards, not the leaves downwards.
Sorted Order: Keys within a node are stored in ascending order. A child node contains keys that strictly fall between the bounding keys of its parent pointer.

Node Splitting (Tree Balancing)

When inserting a key into a node that is already full (contains 2t - 1 keys), the B-Tree performs a split operation before traversing down to it. The full node is split into two nodes, each taking t - 1 keys. The median key is promoted upwards into the parent node. If the root node splits, the median key forms a new root, which is how the B-Tree uniquely increases its height.

2. Why Databases use B-Trees (Not BSTs)

An academic Binary Search Tree works beautifully in RAM. But database indexes (often terabytes in size) must be persisted to spinning hard drives or SSDs. The constraint here is Disk I/O.

The Binary Tree Problem

A balanced BST of 1,000,000 records has a depth of ~20. To find one record, you must traverse 20 nodes. Since these nodes are scattered randomly across the disk, you incur 20 separate, expensive disk seeks (jumping the disk head to a new location).

The B-Tree Solution

Operating systems read disks in fixed Page Sizes (usually 4KB or 8KB). A B-Tree physically sizes a single Node to be exactly the size of one OS Page. A B-Tree with a fanout of 100 holding 1,000,000 records has a depth of only 3. You find the record in exactly 3 disk seeks.

3. B-Tree vs. B+ Tree

If you create an index in MySQL (InnoDB) or PostgreSQL, you aren't actually getting a standard B-Tree. You are getting a highly optimized variant known as the B+ Tree.

Standard B-Tree

Stores the actual row data (or a pointer to it) inside every single node (root, internal, and leaf).
Takes up more space per node, meaning less keys fit in a single 8KB disk page. Drops fanout and increases tree depth.
Range queries (e.g., WHERE id > 100) are terribly inefficient, requiring complex traversing up and down the tree branches.

The B+ Tree (MySQL, Postgres)

Stores row data strictly in the Leaf Nodes. Internal nodes only store routing keys.
Because internal nodes don't hold bulky row data, they can pack exponentially more routing keys per 8KB page. Fanout explodes, keeping the tree incredibly shallow (depth 3 can hold billions of rows).
The Magic: All leaf nodes are connected via a Linked List. For range queries (WHERE id > 100), it finds 100 in O(log n) time, then executes an O(1) sequential linked-list scan across the disk.

B-Tree Indexing Visualizer

Configuration

Insert Key

Search Key

Keys (0)

Event Log

How to Use

Related Tools

LSM-Tree vs B-Tree Simulator

LRU Cache

Database Indexing

SQL Formatter