Write-Ahead Logging (WAL): The Engineering of Durability
Write-Ahead Logging is the fundamental mechanism that allows modern databases to maintain absolute data integrity without making every transaction agonizingly slow. It is the compromise that solved the "Durability Paradox."
Part 1: The Durability Paradox
In database engineering, we face a constant conflict between speed and safety. To ensure Durability (the 'D' in ACID), data must be written to non-volatile storage (disk) before a transaction is confirmed to the user.
Random I/O (The Slow Path)
Updating the actual data pages (B-Trees) requires jumping around the disk to find specific blocks. Even on fast NVMe SSDs, this is expensive and inefficient for high-concurrency workloads.
Sequential I/O (The WAL Path)
Instead of updating the data files, we simply append a record of the change to the end of a log file. Appending is blazingly fast because the disk head (or controller) never needs to search.
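The append-only path can be sketched in a few lines of Python. The file name and the length-prefixed record format here are illustrative assumptions, not any real engine's on-disk layout:

```python
import os

def wal_append(log, record: bytes) -> None:
    # The OS only ever extends the file, so the device sees a purely
    # sequential write stream: no seeking to scattered data pages.
    log.write(len(record).to_bytes(4, "big"))  # length prefix
    log.write(record)

with open("wal.log", "ab") as log:
    wal_append(log, b"UPDATE accounts SET balance=90 WHERE id=7")
    log.flush()
    os.fsync(log.fileno())  # force the record onto the physical medium
```

Note that the change is described logically; the B-Tree pages it affects are not touched at this point.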
Part 2: The Golden Rule - "Write Ahead"
The name "Write-Ahead" is literal. The rule states: No data page should be modified on disk until the log record describing that change has been safely flushed.
Log Sequence Numbers (LSN)
In PostgreSQL and other engines, every log entry has a unique, monotonically increasing 64-bit identifier called an LSN. Data pages in memory also store the LSN of the last change made to them. The database refuses to write a page to disk if its LSN is greater than the last LSN flushed to the WAL.
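The enforcement of this rule can be sketched as follows; the class and attribute names (flushed_lsn, flush_page) are invented for illustration, not any engine's real API:

```python
class Page:
    def __init__(self, page_id: int):
        self.page_id = page_id
        self.lsn = 0  # LSN of the newest WAL record that modified this page

class BufferManager:
    def __init__(self):
        self.flushed_lsn = 0  # highest LSN durably written to the WAL

    def flush_wal_up_to(self, lsn: int) -> None:
        # Stand-in for writing and fsyncing the log through `lsn`.
        self.flushed_lsn = max(self.flushed_lsn, lsn)

    def flush_page(self, page: Page) -> None:
        # The golden rule: the log record describing the page's latest
        # change must be durable BEFORE the page itself is written.
        if page.lsn > self.flushed_lsn:
            self.flush_wal_up_to(page.lsn)
        # ...now it is safe to write the page to the data file

bm = BufferManager()
p = Page(42)
p.lsn = 100          # page modified by WAL record 100
bm.flushed_lsn = 99  # but the WAL is durable only through LSN 99
bm.flush_page(p)     # forces the log out before the page
```

After the call, the WAL is durable through LSN 100, so writing the page can no longer violate the rule.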
Part 3: The fsync() Bottleneck
Even with sequential logging, there is one unavoidable slowing factor: fsync().
When a transaction commits, the database must wait for the Operating System to confirm
the data is physically on the storage medium, not just in the kernel's write cache.
To combat this, modern databases use Group Commit. Instead of one
fsync per transaction, the database waits a few milliseconds to bundle
dozens of commits together and flushes them in a single high-speed disk operation.
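A toy sketch of the idea, assuming a 2 ms batching window (the class name and window length are illustrative, not any engine's defaults):

```python
import os
import time

class GroupCommitLog:
    def __init__(self, path: str, window_s: float = 0.002):
        self.f = open(path, "ab")
        self.window_s = window_s
        self.pending = []

    def commit(self, record: bytes) -> None:
        # Transactions queue their commit records instead of each
        # paying for a private fsync.
        self.pending.append(record)

    def flush_group(self) -> int:
        # Wait briefly so more commits can join the batch, then make
        # the whole batch durable with ONE sequential write + ONE fsync.
        time.sleep(self.window_s)
        self.f.write(b"".join(self.pending))
        self.f.flush()
        os.fsync(self.f.fileno())
        count = len(self.pending)
        self.pending.clear()
        return count

log = GroupCommitLog("group.log")
for i in range(3):
    log.commit(f"COMMIT txn-{i}\n".encode())
n = log.flush_group()  # one fsync covers all three commits
```

In a real engine the batching is done by a background writer or by the first committer fsyncing on behalf of the others, but the payoff is the same: the fsync cost is amortized across the batch.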
Part 4: Checkpointing - Turning Logs into Data
If we never updated the actual data files, the WAL would eventually consume the entire disk. Furthermore, recovery after a crash would take hours as the database replayed years of logs.
Checkpointing is the process of periodically syncing the "dirty" pages in memory (which reflect the latest changes) down to the main database files on disk. Once a checkpoint completes, all previous WAL logs are no longer strictly needed for crash recovery and can be recycled or archived for backups.
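Reduced to in-memory dicts, the checkpoint cycle looks roughly like this; the record layout ({"lsn": ..., "page": ..., "value": ...}) is invented for illustration:

```python
def checkpoint(dirty_pages: dict, data_files: dict, wal: list) -> list:
    """Sync dirty pages to the data files, then recycle old WAL records."""
    if not wal:
        return wal
    checkpoint_lsn = wal[-1]["lsn"]  # everything up to here will be on disk
    # 1. Push every dirty in-memory page down to the main data files.
    data_files.update(dirty_pages)
    dirty_pages.clear()
    # 2. Older WAL records are no longer needed for crash recovery,
    #    since the data files now contain those changes.
    return [rec for rec in wal if rec["lsn"] > checkpoint_lsn]

dirty = {"page7": "balance=90"}
data = {"page7": "balance=100"}
wal = [{"lsn": 1, "page": "page7", "value": "balance=90"}]
wal = checkpoint(dirty, data, wal)
# the data files now hold the latest change; the old WAL can be recycled
```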
Why not checkpoint constantly?
Frequent checkpoints keep recovery times low but generate heavy "background noise" writes that compete with active user transactions for disk bandwidth. Finding the right checkpoint interval is a key tuning task for any Database Administrator.
Part 5: Crash Recovery - The "Redo" Cycle
When a database restarts after an unexpected shutdown, it is in an "inconsistent" state. The main data files are old, and the RAM state is gone. It enters recovery mode:
- Analysis: Locate the last successful checkpoint in the logs.
- Redo: Starting from that checkpoint, read the WAL and "replay" every committed transaction. This brings the data back to its exact state the instant before the crash.
- Undo: Identify transactions that were in-progress during the crash but never committed, and roll back any partial changes they might have made.
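The redo/undo cycle above can be sketched over a toy log. The record layout is an invented illustration: updates carry before-images ("old") and after-images ("new") so they can be undone as well as replayed:

```python
def recover(data_files: dict, wal: list, checkpoint_lsn: int) -> dict:
    # Analysis: which transactions have a commit record in the log?
    committed = {r["txid"] for r in wal if r["type"] == "commit"}
    # Redo: replay committed changes logged after the last checkpoint.
    for r in wal:
        if (r["type"] == "update" and r["txid"] in committed
                and r["lsn"] > checkpoint_lsn):
            data_files[r["page"]] = r["new"]
    # Undo: roll back changes from transactions that never committed,
    # newest first, using the before-images in the log.
    for r in reversed(wal):
        if r["type"] == "update" and r["txid"] not in committed:
            data_files[r["page"]] = r["old"]
    return data_files

data = {"A": 1, "B": 2}
wal = [
    {"type": "update", "lsn": 1, "txid": "T1", "page": "A", "old": 1, "new": 10},
    {"type": "update", "lsn": 2, "txid": "T2", "page": "B", "old": 2, "new": 20},
    {"type": "commit", "txid": "T1"},
]
recover(data, wal, checkpoint_lsn=0)
# T1's change survives; T2's partial change is rolled back
```

Production recovery schemes (ARIES, for example) are considerably more involved, but the committed-changes-survive, uncommitted-changes-vanish outcome is the same.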
Summary: The Engine of Trust
The Write-Ahead Log is what makes a database feel like a reliable mathematical truth rather than just a fragile piece of software. By separating the intent to change data from the physical reorganization of data, engineers built systems that are both high-performance and able to survive crashes without losing a single committed transaction.