The Engineering of Realtime Web: Breaking HTTP's Request-Response Limitations
The web was fundamentally architected as a document-fetching system: the client asks, the server answers, and the exchange is over; the connection is closed (or returned to a pool) so the server can free its resources. But modern applications like trading dashboards, collaborative text editors, and live multiplayer games require the exact opposite: stateful, long-lived pipes through which the server can push data unprompted. Engineering this on top of a stateless request-response protocol like HTTP took three generations of clever hacks and structural rewrites.
Generation 1: Long Polling (The Clever Hack)
Standard polling is brutal on server resources. If an app fires an AJAX request every second and the user stays on the page for an hour, that is 3,600 round trips, almost all of them spent learning there are "no new messages".
Long Polling solved the empty-response problem but introduced a severe concurrency problem: the client asks for data, and if there is none, the server holds the request open in a "pending" state, typically for up to 60 seconds, answering only when data arrives or the timeout expires.
In older thread-per-connection architectures (blocking Java servlet containers, Apache's prefork/worker models), every held request consumed a dedicated OS thread and roughly 2MB of stack. With 10,000 users idly waiting for a chat message, the server burned around 20GB of RAM doing essentially nothing. This structural failure directly catalyzed the rise of asynchronous, event-driven servers like Node.js, which can hold tens of thousands of idle connections on a single thread.
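The event-driven alternative can be sketched in a few lines. The handler below (framework-agnostic; the names `long_poll` and `demo` are illustrative, not from any real library) holds a "request" open as a cheap coroutine rather than an OS thread, returning early if a message arrives or empty on timeout:

```python
import asyncio

async def long_poll(queue: asyncio.Queue, timeout: float = 60.0) -> dict:
    """Hold the 'request' open until a message arrives or the timeout expires."""
    try:
        message = await asyncio.wait_for(queue.get(), timeout)
        return {"status": 200, "body": message}
    except asyncio.TimeoutError:
        # Empty response: the client is expected to immediately re-poll.
        return {"status": 204, "body": None}

async def demo() -> tuple:
    inbox = asyncio.Queue()
    # No message pending: this poll times out (short timeout for the demo).
    empty = await long_poll(inbox, timeout=0.05)
    # A message arrives while the next poll is hanging.
    waiter = asyncio.create_task(long_poll(inbox, timeout=5.0))
    await asyncio.sleep(0.01)
    await inbox.put("new chat message")
    filled = await waiter
    return empty, filled
```

Ten thousand of these pending coroutines cost kilobytes each, not megabytes, which is the whole argument for the event loop.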
Generation 2: Server-Sent Events (SSE) (The HTTP/2 Champion)
What if the server just never sent the HTTP "End of Message" signal? This is the core principle of Server-Sent Events (SSE).
The client opens an HTTP connection with the header Accept: text/event-stream. The server responds with HTTP 200 OK and Content-Type: text/event-stream, and on HTTP/1.1 sets Transfer-Encoding: chunked, in effect saying "I will keep trickling data down this pipe indefinitely."
SSE is historically underrated because it is strictly unidirectional (server to client only). With the widespread adoption of HTTP/2, however, it has seen a major resurgence: because HTTP/2 multiplexes many streams over a single TCP connection, an open SSE stream no longer monopolizes one of the roughly six connections per origin that browsers allow under HTTP/1.1. For applications like live sports scores or AI token streaming (ChatGPT's typing effect), SSE is often the better fit than WebSockets because the browser's EventSource API handles auto-reconnection and event replay (via the Last-Event-ID header) out of the box.
Generation 3: WebSockets (The TCP Hijack)
When latency is measured in milliseconds and the client must also stream data to the server (e.g., a player's cursor coordinates 60 times a second in a multiplayer game), the overhead of sending full HTTP headers with every tiny message becomes prohibitive.
The WebSocket protocol solves this with a brilliant bait-and-switch. The connection starts as a perfectly normal HTTP GET request, which lets it traverse proxies and restrictive corporate firewalls, but it carries the headers Connection: Upgrade and Upgrade: websocket, plus a random Sec-WebSocket-Key.
If the server supports it, it responds with an HTTP 101 Switching Protocols status code, echoing a hash of the client's key in Sec-WebSocket-Accept. The moment this happens, HTTP is abandoned entirely: the underlying raw TCP socket is taken over by the WebSocket protocol.
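That Sec-WebSocket-Accept value is fully specified by RFC 6455: the server appends a fixed magic GUID to the client's key, SHA-1 hashes the result, and Base64-encodes the digest. This proves to the client that the other end actually speaks WebSocket rather than being a confused HTTP cache:

```python
import base64
import hashlib

WS_MAGIC_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed constant from RFC 6455

def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header for a 101 Switching Protocols reply."""
    digest = hashlib.sha1((sec_websocket_key + WS_MAGIC_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

Fed the sample key from the RFC ("dGhlIHNhbXBsZSBub25jZQ=="), it produces the spec's expected answer, "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=".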
From that point on, data flows in both directions as hyper-efficient binary or text "frames". Per-message overhead drops from the several hundred bytes of a typical HTTP header block to just 2 to 14 bytes of framing, enabling the sustained, low-latency, bi-directional throughput that underpins the modern interactive web.
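The 2-to-14-byte range falls straight out of the RFC 6455 frame layout: 2 mandatory bytes, an optional 2- or 8-byte extended length field for larger payloads, and a 4-byte masking key that client-to-server frames must include:

```python
def frame_header_size(payload_len: int, masked: bool) -> int:
    """Bytes of framing overhead for one WebSocket data frame (per RFC 6455)."""
    size = 2                   # FIN/opcode byte + mask-bit/7-bit-length byte
    if payload_len > 0xFFFF:
        size += 8              # 64-bit extended payload length
    elif payload_len > 125:
        size += 2              # 16-bit extended payload length
    if masked:
        size += 4              # 32-bit masking key (required client -> server)
    return size
```

A tiny unmasked server push costs 2 bytes of overhead; a large masked client upload costs the full 14, and everything else sits in between.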
The Hidden Cost: Load Balancing Stateful Connections
Whether you use WebSockets or Long Polling, you are creating stateful connections, and that completely breaks traditional round-robin load balancing.
If a client establishes a WebSocket connection to Server A, but an HTTP POST (triggering a chat message) lands on Server B, Server B cannot push the message down the socket, because Server A holds the actual TCP connection.
To engineer this at scale, stateful systems require a Pub/Sub backplane (most commonly Redis): Server B publishes the chat message to Redis; Server A, subscribed to the same channel, sees that one of its connected clients needs the message and finally pushes it down the WebSocket. It works, but it dramatically increases architectural complexity.
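The flow is easier to see in code. Below is a minimal sketch with an in-memory `Broker` standing in for Redis (in production the `publish`/`subscribe` calls would be Redis PUBLISH/SUBSCRIBE; the class and method names here are illustrative):

```python
from collections import defaultdict

class Broker:
    """In-memory stand-in for the Redis pub/sub backplane."""
    def __init__(self):
        self.subscribers = defaultdict(list)   # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self.subscribers[channel]:
            callback(message)

class AppServer:
    """One node in the cluster; it owns WebSocket connections for its clients only."""
    def __init__(self, name, broker):
        self.name, self.broker = name, broker
        self.local_sockets = {}                # user_id -> fake socket (list of frames)

    def connect(self, user_id):
        self.local_sockets[user_id] = []
        # Subscribe so a message published by ANY node reaches this user's socket.
        self.broker.subscribe(f"user:{user_id}",
                              lambda msg, uid=user_id: self.local_sockets[uid].append(msg))

    def handle_post(self, target_user, message):
        # This node may not own target_user's socket, so route via the backplane.
        self.broker.publish(f"user:{target_user}", message)

broker = Broker()
server_a, server_b = AppServer("A", broker), AppServer("B", broker)
server_a.connect("alice")             # alice's WebSocket lives on Server A
server_b.handle_post("alice", "hi")   # ...but the HTTP POST lands on Server B
```

After the last line, "hi" has crossed the backplane and sits in alice's socket buffer on Server A; neither server needed to know which node owned the connection.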