Applied Design: Real-time Chat
WebSockets, presence, message ordering, and offline sync at scale.
Functional Requirements
- 1:1 real-time messaging
- Group chats (up to 500 members)
- Sent/delivered/read receipts
- Online/offline presence indicators
- Image/file sharing
- Offline message sync (deliver when user reconnects)
- Message history (scrollback)
Non-Functional Requirements
- Latency: <100ms message delivery (within same region)
- Scale: 100M concurrent WebSocket connections
- Ordering: Messages within a conversation must be ordered
- Durability: No message loss, even during server failures
- Availability: 99.99%
Chat Servers (WebSocket)
Maintain persistent WebSocket connections with clients. Each server holds ~50K connections, so 100M concurrent users require on the order of 2,000 chat servers. This tier is stateful: the system must track which user is connected to which server so messages can be routed.
Session Registry
Maps user_id → chat_server_id. When Alice sends a message to Bob, we look up which chat server Bob is connected to. Stored in Redis for O(1) lookups. Updated on connect/disconnect.
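A minimal sketch of the registry, using an in-memory dict as a stand-in for Redis (in production these would be SET/GET/DEL or hash operations against a Redis cluster). The `SessionRegistry` class and its method names are illustrative, not part of any real API:

```python
class SessionRegistry:
    """In-memory stand-in for the Redis-backed user_id -> chat_server_id map."""

    def __init__(self):
        self._sessions = {}  # user_id -> chat_server_id

    def connect(self, user_id, server_id):
        # Called when a client's WebSocket handshake completes
        self._sessions[user_id] = server_id

    def disconnect(self, user_id):
        # Called on socket close or missed heartbeats
        self._sessions.pop(user_id, None)

    def lookup(self, user_id):
        # O(1): the server holding the user's WebSocket, or None if offline
        return self._sessions.get(user_id)


registry = SessionRegistry()
registry.connect("alice", "chat-server-7")
registry.connect("bob", "chat-server-12")
print(registry.lookup("bob"))   # chat-server-12
registry.disconnect("bob")
print(registry.lookup("bob"))   # None -> route to the offline queue instead
```

A `None` result is the signal to fall back to offline delivery rather than direct push.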
Message Queue (Kafka)
Decouples message ingestion from delivery. Messages are persisted in Kafka, then consumed by the recipient's chat server. If the recipient is offline, messages wait in the queue.
Message Storage (Cassandra)
Persistent storage for message history. Partitioned by (chat_id), sorted by message timestamp. Enables efficient "load last 50 messages" and scrollback queries.
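A toy in-memory model of this layout, assuming a table keyed roughly like `PRIMARY KEY ((chat_id), message_ts)`: one partition per conversation, rows kept sorted by timestamp. The `MessageStore` class is illustrative and is not Cassandra's API:

```python
from collections import defaultdict
import bisect

class MessageStore:
    """Sketch of a partitioned, clustered message table."""

    def __init__(self):
        # chat_id -> list of (message_ts, sender, text), kept sorted ascending
        self._partitions = defaultdict(list)

    def append(self, chat_id, ts, sender, text):
        # Insert in timestamp order within the conversation's partition
        bisect.insort(self._partitions[chat_id], (ts, sender, text))

    def last_n(self, chat_id, n=50):
        # "Load last 50 messages": read only the tail of a single partition
        return self._partitions[chat_id][-n:]


store = MessageStore()
store.append("chat:alice:bob", 1001, "alice", "hey")
store.append("chat:alice:bob", 1002, "bob", "hi!")
print(store.last_n("chat:alice:bob", 50))
```

Because a conversation lives in one partition, scrollback is a sequential read of adjacent rows rather than a scatter-gather across nodes.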
1:1 Message Flow
1. Alice sends the message over her WebSocket connection to her chat server.
2. The chat server persists the message to Kafka (and, via a consumer, to Cassandra for history).
3. The server looks up Bob's chat server in the session registry.
4. The message is routed to Bob's chat server.
5. Bob's chat server pushes the message to Bob over his WebSocket connection; delivery and read receipts flow back along the same path.
Offline Delivery
If Bob is offline at Step 5, the message sits in Kafka / a per-user offline queue. When Bob reconnects:
1. Bob's client reports the ID of the last message it received.
2. The server reads everything newer from the offline queue / message storage.
3. Messages are delivered in order over the new connection, and the queued entries are acknowledged and cleared.
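The reconnect replay reduces to a filter over durable history, assuming monotonically increasing message IDs. The function and field names here are illustrative:

```python
def sync_on_reconnect(history, last_seen_id):
    """Replay everything newer than the client's last acknowledged message.

    history: list of (msg_id, text) tuples sorted by msg_id
    (IDs are assumed monotonic, e.g. Snowflake IDs or sequence numbers).
    """
    return [m for m in history if m[0] > last_seen_id]


history = [(1, "hey"), (2, "you there?"), (3, "ping")]
print(sync_on_reconnect(history, last_seen_id=1))  # messages 2 and 3 replayed
```

The client acknowledging the highest replayed ID lets the server trim the offline queue safely.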
Message Ordering
Messages must appear in order within a conversation. But in a distributed system, two servers can process messages from Alice and Bob simultaneously. Solutions:
- Snowflake IDs: Monotonically increasing, timestamp-based IDs. Messages are sorted by ID. Clock skew (up to a few ms) is acceptable for chat.
- Lamport timestamps: Each message includes a logical clock. On receiving a message with timestamp T, the receiver sets its clock to max(local_clock, T) + 1. Guarantees causal ordering.
- Per-conversation sequence numbers: Each conversation has a single sequencer (database row with AUTO_INCREMENT or Redis INCR). Guarantees strict ordering but is a bottleneck for very active chats.
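The Lamport-timestamp option above can be sketched in a few lines; the `LamportClock` class is illustrative:

```python
class LamportClock:
    """Logical clock: causal ordering without synchronized wall clocks."""

    def __init__(self):
        self.time = 0

    def send(self):
        # Tick before attaching the timestamp to an outgoing message
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # Jump ahead of anything causally before the received message
        self.time = max(self.time, msg_time) + 1
        return self.time


alice, bob = LamportClock(), LamportClock()
t1 = alice.send()       # Alice's message carries timestamp 1
bob.receive(t1)         # Bob's clock jumps to 2
t2 = bob.send()         # Bob's reply carries timestamp 3
alice.receive(t2)       # Alice's clock jumps to 4
```

Because Bob's reply is stamped 3 > 1, it always sorts after the message it answers, regardless of either machine's wall clock.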
Presence System
Showing who's "online" seems simple but is expensive at scale. Naive approach: every client sends a heartbeat every 5 seconds → 100M users × 1 heartbeat/5s = 20M writes/sec — crushing Redis.
- Optimization 1: Only track presence for users whose contacts are online. If none of Alice's contacts are online, don't bother tracking Alice's heartbeat.
- Optimization 2: Bucket heartbeats. Instead of "last seen at 12:34:56.789," round to "last seen in the last 30 seconds." This reduces update frequency.
- Optimization 3: Pub/sub for presence changes. Instead of polling, subscribe to a presence channel. Only push events when status actually changes (online → offline or vice versa).
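Optimization 2 can be sketched as follows. The 30-second bucket and the `PresenceTracker` class are illustrative, with a counter standing in for Redis write traffic:

```python
HEARTBEAT_BUCKET = 30  # seconds: "seen in the last 30s", not millisecond precision

class PresenceTracker:
    def __init__(self, bucket=HEARTBEAT_BUCKET):
        self.bucket = bucket
        self._last_bucket = {}  # user_id -> bucketed last-seen time
        self.writes = 0         # stand-in for writes hitting the presence store

    def heartbeat(self, user_id, now):
        b = int(now // self.bucket)
        if self._last_bucket.get(user_id) != b:
            # Only touch the store when the bucket rolls over
            self._last_bucket[user_id] = b
            self.writes += 1

    def is_online(self, user_id, now):
        b = self._last_bucket.get(user_id)
        return b is not None and int(now // self.bucket) - b <= 1


p = PresenceTracker()
for t in range(0, 30, 5):   # 6 heartbeats inside one 30-second bucket...
    p.heartbeat("alice", t)
print(p.writes)             # ...collapse into a single write
```

With 5-second heartbeats and 30-second buckets, write traffic drops by roughly 6x before the pub/sub optimization is even applied.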
Group Chat Scaling
Group messages are harder than 1:1 because each message must be delivered to N members:
- Small groups (<50): Fan-out on write. When Alice sends a message, write it to every member's delivery queue. Like the social feed push model.
- Large groups (50-500): Write message once, tag with group_id. Members poll or subscribe to the group's message stream (Kafka topic per group or multiplexed).
- Channels (1000+): Treat like a broadcast. Message is stored once, delivered via pub/sub. No per-member queues.
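Choosing among the three tiers is a size check at send time. This sketch uses the cutoffs from the list above; the function names and exact thresholds are illustrative and would be tuned per workload:

```python
FANOUT_WRITE_MAX = 50   # below this: copy into every member's queue
CHANNEL_MIN = 1000      # at or above this: broadcast via pub/sub

def delivery_strategy(member_count):
    if member_count < FANOUT_WRITE_MAX:
        return "fanout-on-write"   # per-member delivery queues
    if member_count < CHANNEL_MIN:
        return "fanout-on-read"    # write once, members read the group stream
    return "broadcast-pubsub"      # store once, no per-member queues

def fan_out(message, members, queues):
    """Small-group path: one write per member, like the social feed push model."""
    for m in members:
        queues.setdefault(m, []).append(message)


queues = {}
fan_out({"from": "alice", "text": "hi team"}, ["bob", "carol"], queues)
print(delivery_strategy(12), delivery_strategy(300), delivery_strategy(5000))
```

The cost asymmetry is the point: fan-out-on-write trades N writes per message for cheap reads, which stops paying off as N grows.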
End-to-End Encryption
For security-sensitive chat (WhatsApp, Signal), the server should never see plaintext messages:
- Each user generates an asymmetric key pair (public + private)
- Users exchange public keys via the server (key directory)
- Messages are encrypted client-side so only the recipient's private key can decrypt them (in practice the public keys bootstrap a shared symmetric session key, and the bulk encryption is symmetric)
- The Signal Protocol uses a ratchet mechanism: every message uses a unique key derived from the previous key, providing forward secrecy. Compromising one key doesn't compromise past messages.
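A toy symmetric-ratchet step using a plain hash chain. This is not the real Signal KDF (which is HMAC-based and paired with a Diffie-Hellman ratchet), but it shows why forward secrecy emerges: the hash is one-way, so learning key N reveals nothing about key N-1:

```python
import hashlib

def ratchet(chain_key):
    """One simplified ratchet step: derive this message's key and the next
    chain key from the current chain key. Distinct constants keep the two
    derivations independent."""
    message_key = hashlib.sha256(chain_key + b"\x01").digest()
    next_chain_key = hashlib.sha256(chain_key + b"\x02").digest()
    return message_key, next_chain_key


chain = b"shared-secret-from-key-exchange"  # placeholder root key
keys = []
for _ in range(3):
    mk, chain = ratchet(chain)  # every message advances the chain
    keys.append(mk)

assert len(set(keys)) == 3      # every message gets a unique key
```

Compromising the current `chain` lets an attacker read future messages until the next Diffie-Hellman ratchet step, but never past ones, since inverting SHA-256 to recover earlier chain keys is infeasible.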
Case Study: WhatsApp — 2B Users, 50 Engineers
At the time of its $19B acquisition by Facebook (2014), WhatsApp had 450M users served by just 50 engineers. Their secret: Erlang's concurrency model. Each Erlang process handles one WebSocket connection, and Erlang's lightweight processes allow millions per node. They used FreeBSD, custom BEAM VM tuning, and Mnesia for session storage. By 2023, WhatsApp serves 2B users with end-to-end encryption on all messages via the Signal Protocol.
Takeaway: Technology choice matters enormously. Erlang's actor model is ideally suited for massively concurrent, always-connected systems. The right runtime can reduce your server count by 10x.
Case Study: Discord — 200M MAU on Elixir + Rust
Discord handles 200M monthly active users with Elixir (BEAM VM, like Erlang) for their real-time gateway and Rust for performance-critical paths (voice, video). Each Discord "guild" is an Elixir process. The Gateway maintains WebSocket connections, routes events, and manages presence. Messages are stored in Cassandra (later ScyllaDB). Their architecture handles millions of concurrent connections per cluster.
Takeaway: The BEAM VM (Erlang/Elixir) continues to dominate real-time messaging infrastructure. Rust fills the gaps where raw performance matters.
Further Reading
- Signal Protocol Documentation — how end-to-end encryption works with the Double Ratchet.
- How Discord Stores Billions of Messages — Cassandra data model and migration to ScyllaDB.
- Building Mobile-First Infrastructure for Messenger — Facebook Engineering.
- System Design Interview Vol. 1 by Alex Xu (ByteByteGo, 2020) — Chapter 12, "Design a Chat System."
- WhatsApp's Erlang Architecture — how Erlang powers billions of connections.
🎉 Handbook Complete
Congratulations!
You've completed the entire System Design Handbook — from binary limits to global-scale applied designs. Revisit any module, use the simulators, and keep building.