Applied Design: Real-time Chat

WebSockets, presence, message ordering, and offline sync at scale.

Module 15: Applied Design — Real-time Chat
Track 3: Global Scale (6+ YoE)
Chat is deceptively complex. Users expect messages to appear instantly (under 100ms), be delivered exactly once, preserve order, and work even when offline. This module walks through the design of a WhatsApp/Slack-scale chat system, applying every concept from the handbook.
Step 1: Requirements

Functional Requirements

  • 1:1 real-time messaging
  • Group chats (up to 500 members)
  • Sent/delivered/read receipts
  • Online/offline presence indicators
  • Image/file sharing
  • Offline message sync (deliver when user reconnects)
  • Message history (scrollback)

Non-Functional Requirements

  • Latency: <100ms message delivery (within same region)
  • Scale: 100M concurrent WebSocket connections
  • Ordering: Messages within a conversation must be ordered
  • Durability: No message loss, even during server failures
  • Availability: 99.99%

Step 2: Estimation
// Connection capacity
100M concurrent WebSocket connections
1 server handles ~50K connections → 2,000 chat servers needed
// Message throughput
500M DAU × 40 messages/day = 20B messages/day
→ ~230K messages/sec
// Storage (1 year)
20B messages/day × 365 = 7.3T messages/year
7.3T × 100 bytes avg = 730 TB/year
// → Need distributed storage (Cassandra or similar)
// Bandwidth
230K msgs/sec × 100 bytes = 23 MB/sec (text only)
// Modest — media files are the bandwidth hog
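The estimation above can be double-checked with a quick script. All figures are the module's assumptions (100M connections, 50K per server, 500M DAU, 40 messages/day, 100 bytes/message), not measured values.

```python
# Back-of-envelope check of the numbers above.
CONCURRENT_CONNECTIONS = 100_000_000
CONNS_PER_SERVER = 50_000
DAU = 500_000_000
MSGS_PER_USER_PER_DAY = 40
AVG_MSG_BYTES = 100
SECONDS_PER_DAY = 86_400

chat_servers = CONCURRENT_CONNECTIONS // CONNS_PER_SERVER            # 2,000
msgs_per_day = DAU * MSGS_PER_USER_PER_DAY                           # 20B
msgs_per_sec = msgs_per_day // SECONDS_PER_DAY                       # ~231K
storage_per_year_tb = msgs_per_day * 365 * AVG_MSG_BYTES / 1e12      # ~730 TB
bandwidth_mb_per_sec = msgs_per_sec * AVG_MSG_BYTES / 1e6            # ~23 MB/s
```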

Step 3: High-Level Design

Chat Servers (WebSocket)

Maintain persistent WebSocket connections with clients. Each server holds ~50K connections. Stateful — must know which user is connected to which server for message routing.

Session Registry

Maps user_id → chat_server_id. When Alice sends a message to Bob, we look up which chat server Bob is connected to. Stored in Redis for O(1) lookups. Updated on connect/disconnect.
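A minimal sketch of the registry's interface. A plain dict stands in for Redis here; in production this would be a Redis key per user (e.g. `SET user:<id> <server>` with a TTL refreshed on heartbeat, so crashed servers' entries expire). All names are illustrative.

```python
class SessionRegistry:
    """Maps user_id -> chat_server_id. Dict stands in for Redis."""

    def __init__(self):
        self._sessions = {}

    def connect(self, user_id, server_id):
        # Called when a WebSocket is established.
        self._sessions[user_id] = server_id

    def disconnect(self, user_id):
        # Called on WebSocket close or TTL expiry.
        self._sessions.pop(user_id, None)

    def locate(self, user_id):
        # Returns the server holding the user's connection, or None if offline.
        return self._sessions.get(user_id)

registry = SessionRegistry()
registry.connect("bob", "chat-server-B")
registry.locate("bob")      # "chat-server-B" -> route message there
registry.disconnect("bob")
registry.locate("bob")      # None -> route to the offline queue instead
```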

Message Queue (Kafka)

Decouples message ingestion from delivery. Messages are persisted in Kafka, then consumed by the recipient's chat server. If the recipient is offline, messages wait in the queue.

Message Storage (Cassandra)

Persistent storage for message history. Partitioned by chat_id and clustered by a time-ordered message_id within each partition. Enables efficient "load last 50 messages" and scrollback queries.

1:1 Message Flow

1. Alice → WebSocket → Chat Server A: send(to: Bob, text: "Hi!")
2. Chat Server A → assign message_id (snowflake/ULID) + timestamp
3. Chat Server A → write message to Cassandra (durable store)
4. Chat Server A → ACK to Alice ("sent" ✓)
5. Chat Server A → lookup Session Registry: Bob → Chat Server B
6. Chat Server A → push message to Chat Server B (via Kafka or direct RPC)
7. Chat Server B → push message over Bob's WebSocket
8. Bob's client → ACK → "delivered" ✓✓
9. Bob reads the message → "read" receipt sent back
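The sender-side steps above can be condensed into one function. Storage, the registry, and server-to-server push are stubbed with in-memory structures; every name here is illustrative, and a sequential counter stands in for a snowflake/ULID generator.

```python
import itertools
import time

_ids = itertools.count(1)                   # stand-in for snowflake/ULID
message_store = []                          # stands in for Cassandra
session_registry = {"bob": "chat-server-B"}
server_inbox = {"chat-server-B": []}        # stands in for Kafka / direct RPC
offline_queue = {}                          # per-user pending messages

def send_message(sender, recipient, text):
    # Step 2: assign message_id + timestamp.
    msg = {"id": next(_ids), "ts": time.time(),
           "from": sender, "to": recipient, "text": text}
    message_store.append(msg)               # Step 3: durable write
    ack = {"id": msg["id"], "status": "sent"}   # Step 4: ACK to sender
    server = session_registry.get(recipient)    # Step 5: locate recipient
    if server is not None:
        server_inbox[server].append(msg)    # Step 6: push to recipient's server
    else:
        offline_queue.setdefault(recipient, []).append(msg)
    return ack

ack = send_message("alice", "bob", "Hi!")   # {'id': 1, 'status': 'sent'}
```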

Offline Delivery

If the Session Registry lookup at Step 5 shows that Bob has no active connection, the message waits in Kafka / a per-user offline queue. When Bob reconnects:

1. Bob connects → WebSocket established with Chat Server C
2. Chat Server C → update Session Registry: Bob → Server C
3. Chat Server C → fetch undelivered messages from offline queue
4. Chat Server C → push all pending messages to Bob's client
5. Bob's client → ACK each message → "delivered" ✓✓
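A reconnect-and-drain sketch for the steps above, continuing the same illustrative in-memory structures (a dict registry, a per-user pending list). In production the client would ACK each message individually and the server would delete from the queue only on ACK.

```python
session_registry = {}
offline_queue = {"bob": [{"id": 7, "text": "Hi!"}, {"id": 8, "text": "there"}]}

def on_reconnect(user_id, server_id, push_to_client):
    session_registry[user_id] = server_id       # Step 2: update registry
    pending = offline_queue.pop(user_id, [])    # Step 3: fetch undelivered
    delivered_ids = []
    for msg in pending:                         # Step 4: push in stored order
        push_to_client(msg)
        delivered_ids.append(msg["id"])         # Step 5: mark delivered on ACK
    return delivered_ids

received = []
delivered = on_reconnect("bob", "chat-server-C", received.append)  # [7, 8]
```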

Step 4: Deep Dives

Message Ordering

Messages must appear in order within a conversation. But in a distributed system, two servers can process messages from Alice and Bob simultaneously. Solutions:

  • Snowflake IDs: Monotonically increasing, timestamp-based IDs. Messages are sorted by ID. Clock skew (up to a few ms) is acceptable for chat.
  • Lamport timestamps: Each message includes a logical clock. On receiving a message with timestamp T, the receiver sets its clock to max(local_clock, T) + 1. Guarantees causal ordering.
  • Per-conversation sequence numbers: Each conversation has a single sequencer (database row with AUTO_INCREMENT or Redis INCR). Guarantees strict ordering but is a bottleneck for very active chats.
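The Lamport-timestamp bullet above can be made concrete in a few lines. Each process ticks on local events and merges on receive; sorting by (timestamp, sender_id) then yields a total order consistent with causality.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event (e.g. sending a message).
        self.time += 1
        return self.time

    def receive(self, remote_time):
        # Merge rule: max(local_clock, T) + 1.
        self.time = max(self.time, remote_time) + 1
        return self.time

alice, bob = LamportClock(), LamportClock()
t1 = alice.tick()       # 1 -- Alice sends
t2 = bob.receive(t1)    # 2 -- Bob's clock jumps past Alice's timestamp
t3 = bob.tick()         # 3 -- Bob's reply is ordered after Alice's message
```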

Presence System

Showing who's "online" seems simple but is expensive at scale. Naive approach: every client sends a heartbeat every 5 seconds → 100M users × 1 heartbeat/5s = 20M writes/sec — crushing Redis.

  • Optimization 1: Only track presence for users whose contacts are online. If none of Alice's contacts are online, don't bother tracking Alice's heartbeat.
  • Optimization 2: Bucket heartbeats. Instead of "last seen at 12:34:56.789," round to "last seen in the last 30 seconds." This reduces update frequency.
  • Optimization 3: Pub/sub for presence changes. Instead of polling, subscribe to a presence channel. Only push events when status actually changes (online → offline or vice versa).

Group Chat Scaling

Group messages are harder than 1:1 because each message must be delivered to N members:

  • Small groups (<50): Fan-out on write. When Alice sends a message, write it to every member's delivery queue. Like the social feed push model.
  • Large groups (50-500): Write message once, tag with group_id. Members poll or subscribe to the group's message stream (Kafka topic per group or multiplexed).
  • Channels (1000+): Treat like a broadcast. Message is stored once, delivered via pub/sub. No per-member queues.
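A tiny dispatcher makes the three tiers above explicit. The thresholds are the module's; the strategy names and the fan-out helper are illustrative.

```python
def delivery_strategy(member_count):
    if member_count < 50:
        return "fanout_on_write"    # copy into every member's queue
    elif member_count <= 500:
        return "shared_stream"      # store once; members consume the stream
    else:
        return "broadcast_pubsub"   # channel semantics, no per-member queues

def fan_out_on_write(msg, members, queues):
    # Small-group path: one write per member, like the social-feed push model.
    for m in members:
        queues.setdefault(m, []).append(msg)

delivery_strategy(8)      # "fanout_on_write"
delivery_strategy(200)    # "shared_stream"
delivery_strategy(5000)   # "broadcast_pubsub"
```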

End-to-End Encryption

For security-sensitive chat (WhatsApp, Signal), the server should never see plaintext messages:

  • Each user generates an asymmetric key pair (public + private)
  • Users exchange public keys via the server (key directory)
  • Messages are encrypted client-side with the recipient's public key
  • The Signal Protocol uses a ratchet mechanism: every message uses a unique key derived from the previous key, providing forward secrecy. Compromising one key doesn't compromise past messages.
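A toy symmetric ratchet illustrates the forward-secrecy idea (this is a simplification, not the real Signal Protocol, which also mixes in Diffie-Hellman ratchet steps). Each message key is derived from the chain key via HMAC, then the chain key is advanced and the old one discarded, so leaking the current chain key reveals nothing about earlier messages.

```python
import hashlib
import hmac

def ratchet(chain_key: bytes):
    # Derive a one-time message key, then advance the chain.
    message_key = hmac.new(chain_key, b"msg", hashlib.sha256).digest()
    next_chain = hmac.new(chain_key, b"chain", hashlib.sha256).digest()
    return message_key, next_chain   # caller must forget the old chain_key

# Initial chain key would come from a key exchange; this stand-in is illustrative.
ck = hashlib.sha256(b"shared-secret-from-key-exchange").digest()
k1, ck = ratchet(ck)   # key for message 1
k2, ck = ratchet(ck)   # key for message 2; k1 cannot be recovered from ck
```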

Lessons from the Trenches

Case Study: WhatsApp — 2B Users, 50 Engineers

At the time of its $19B acquisition by Facebook (2014), WhatsApp had 450M users served by just 50 engineers. Their secret: Erlang's concurrency model. Each Erlang process handles one client connection, and Erlang's lightweight processes allow millions per node. They used FreeBSD, custom BEAM VM tuning, and Mnesia for session storage. By 2023, WhatsApp was serving 2B users with end-to-end encryption on all messages via the Signal Protocol.

Takeaway: Technology choice matters enormously. Erlang's actor model is ideally suited for massively concurrent, always-connected systems. The right runtime can reduce your server count by 10x.

Case Study: Discord — 200M MAU on Elixir + Rust

Discord handles 200M monthly active users with Elixir (BEAM VM, like Erlang) for their real-time gateway and Rust for performance-critical paths (voice, video). Each Discord "guild" is an Elixir process. The Gateway maintains WebSocket connections, routes events, and manages presence. Messages are stored in Cassandra (later ScyllaDB). Their architecture handles millions of concurrent connections per cluster.

Takeaway: The BEAM VM (Erlang/Elixir) continues to dominate real-time messaging infrastructure. Rust fills the gaps where raw performance matters.



🎉 Handbook Complete

Congratulations!

You've completed the entire System Design Handbook — from binary limits to global-scale applied designs. Revisit any module, use the simulators, and keep building.
