B-Tree Indexing Visualizer
Runs in browserVisualize database index operations
Configuration
Insert Key
Search Key
Keys (0)
Event Log
How to Use
Build and explore a B-Tree.
You will see:
- Node Splitting (when full)
- Tree Balancing operations
- Search path highlighting
Empty B-Tree
Insert keys to build the tree or click "Demo" for a quick visualization
The Engineering of B-Trees & Database Indexes
A B-Tree (Bayer-Tree) is a self-balancing tree data structure that
maintains sorted data and performs searches, sequential access, insertions, and deletions
in O(log n) time. Unlike Binary Search Trees (where nodes have at most two
children), B-Trees are designed to have massive fanout (thousands of children
per node), making them the mathematical foundation of nearly all relational database indexing
engines.
1. The Anatomy of a B-Tree
B-Trees are governed by an Order (t), which dictates the minimum and maximum capacity of every node in the tree.
Invariants of a B-Tree of Order t
- Key Bounds: Every node
(except the root) must contain between
t - 1and2t - 1keys. - Child Bounds: Every
internal node containing
kkeys must have exactlyk + 1children. - Perfect Balance: All leaf nodes MUST exist at the exact same depth. The tree grows from the root upwards, not the leaves downwards.
- Sorted Order: Keys within a node are stored in ascending order. A child node contains keys that strictly fall between the bounding keys of its parent pointer.
Node Splitting (Tree Balancing)
When inserting a key into a node that is already full (contains 2t - 1
keys), the B-Tree performs a split operation before traversing down to it. The
full node is split into two nodes, each taking t - 1 keys. The median key
is promoted upwards into the parent node. If the root node splits, the median key forms
a new root, which is how the B-Tree uniquely increases its height.
2. Why Databases use B-Trees (Not BSTs)
An academic Binary Search Tree works beautifully in RAM. But database indexes (often terabytes in size) must be persisted to spinning hard drives or SSDs. The constraint here is Disk I/O.
The Binary Tree Problem
A balanced BST of 1,000,000 records has a depth of ~20. To find one record, you must traverse 20 nodes. Since these nodes are scattered randomly across the disk, you incur 20 separate, expensive disk seeks (jumping the disk head to a new location).
The B-Tree Solution
Operating systems read disks in fixed Page Sizes (usually 4KB or 8KB). A B-Tree physically sizes a single Node to be exactly the size of one OS Page. A B-Tree with a fanout of 100 holding 1,000,000 records has a depth of only 3. You find the record in exactly 3 disk seeks.
3. B-Tree vs. B+ Tree
If you create an index in MySQL (InnoDB) or PostgreSQL, you aren't actually getting a standard B-Tree. You are getting a highly optimized variant known as the B+ Tree.
Standard B-Tree
- Stores the actual row data (or a pointer to it) inside every single node (root, internal, and leaf).
- Takes up more space per node, meaning less keys fit in a single 8KB disk page. Drops fanout and increases tree depth.
- Range queries (e.g.,
WHERE id > 100) are terribly inefficient, requiring complex traversing up and down the tree branches.
The B+ Tree (MySQL, Postgres)
- Stores row data strictly in the Leaf Nodes. Internal nodes only store routing keys.
- Because internal nodes don't hold bulky row data, they can pack exponentially more routing keys per 8KB page. Fanout explodes, keeping the tree incredibly shallow (depth 3 can hold billions of rows).
- The Magic: All leaf nodes are connected via a Linked List. For
range queries (
WHERE id > 100), it finds 100 inO(log n)time, then executes anO(1)sequential linked-list scan across the disk.
Further Reading
- Use The Index, Luke! - A definitive guide to database performance for developers.
- PostgreSQL: B-Tree Implementation - Deep dive into how Postgres engineers its physical B-Tree structure.