Realtime Communication Protocols

How does the server push data to the client? This walkthrough traces long polling step by step, then compares the three main strategies used in modern web apps.

Step 1

Client Request

Asking for updates

Request ↓

Description

Client sends a standard HTTP GET request to the server, asking for new messages.

Why use it?

The client wants to know if there is any new data available but can't "listen" passively.

Technical Spec

Standard AJAX/Fetch request. The connection is opened.

Example Payload

GET /poll/messages HTTP/1.1

Step 2

Server Holds Connection

The "Long" Poll

Request ↓

Description

The server does NOT respond immediately if there is no data. It deliberately hangs the request.

Why use it?

This simulates a "push". By waiting, the server can reply the moment data arrives.

Technical Spec

Thread sleeps or the request is parked; timeout set to ~30-60s.

Example Payload

Server status: Pending... (Waiting for Event)

Step 3

Event or Timeout

Trigger condition

Request ↓

Description

Either new data arrives (e.g. chat message) OR the timeout is reached.

Why use it?

Requests can't stay open forever (browser/proxy limits).

Technical Spec

Event Loop triggers response construction.

Example Payload

New Message inserted into DB -> Trigger fires.
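The two exit conditions can be sketched as a race between the data event and a timer; the empty-payload timeout response and the names below are illustrative:

```javascript
// Resolves with whichever happens first: the data event or the timeout.
// `eventPromise` would come from the parked request's event listener.
function eventOrTimeout(eventPromise, timeoutMs) {
  const timeout = new Promise((resolve) =>
    setTimeout(() => resolve({ messages: [], timedOut: true }), timeoutMs)
  );
  return Promise.race([eventPromise, timeout]);
}
```

On timeout the server replies with an empty body, and the client simply reconnects (Step 5).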

Step 4

Server Responds

Closing the cycle

Response ↑

Description

Server writes the response body and closes the connection.

Why use it?

HTTP/1.1 is request-response. To send data, we must complete the response. This "closes" the transaction.

Technical Spec

HTTP 200 OK. Connection: close (or keep-alive, but logical request ends).

Example Payload

{ "messages": [{ "id": 1, "text": "Hi" }], "timestamp": 1678900000 }

Step 5

Client Reconnects

Looping

Request ↓

Description

As soon as the client receives the response, it immediately initiates a NEW request.

Why use it?

To minimize the time window where we aren't listening for updates.

Technical Spec

client.on("finish", () => startPolling())

Example Payload

latency = (Time between Response End and Request Start)
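The client loop can be sketched as follows; `doPoll` stands in for the fetch call from Step 1, and `shouldStop` is a hypothetical teardown hook:

```javascript
// Fire the next poll the instant the previous one resolves,
// keeping the "not listening" window as small as possible.
async function longPollLoop(doPoll, onMessages, shouldStop) {
  while (!shouldStop()) {
    try {
      const { messages } = await doPoll();
      if (messages.length) onMessages(messages);
    } catch (err) {
      // Back off briefly on network errors so we don't hot-spin.
      await new Promise((r) => setTimeout(r, 1000));
    }
  }
}
```

Note the loop re-polls immediately even after an empty (timed-out) response; only errors trigger a backoff.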

Key Takeaways

Long Polling

The legacy approach. Easy to implement but resource-heavy due to frequent connection re-establishment.

SSE

Best for one-way updates (Server → Client). Built-in reconnection and simple textual format.

WebSockets

True bi-directional standard. Lowest latency but requires a more complex stateful server.

The Engineering of Realtime Web: Breaking HTTP's Request-Response Limitations

The web was fundamentally architected as a document-fetching system: the client asks, the server answers, and the transaction is closed immediately so the server stays stateless and frees its resources. But modern applications like trading dashboards, collaborative text editors, and live multiplayer games require the exact opposite: stateful, long-lived pipes where the server can push data unprompted. Engineering this on top of a stateless protocol like HTTP required three generations of clever hacks and structural rewrites.


Generation 1: Long Polling (The Clever Hack)

Standard polling is brutal on server resources. If an app makes an AJAX request every second and the user stays on the page for an hour, that is 3,600 HTTP round trips, almost all of them wasted just to learn "no new messages".

Long Polling solved the empty-response problem, but introduced a severe concurrency problem. In Long Polling, the client asks for data, and if there is none, the Server literally hangs the HTTP request in a "Pending" state for up to 60 seconds.

In older architectures (like blocking Java or Apache servers), every hung request consumed one dedicated OS thread and roughly 1-2MB of stack memory. If you had 10,000 users sitting idle waiting for a chat message, your server could burn up to 20GB of RAM doing absolutely nothing. This structural failure helped drive the rise of asynchronous, event-driven servers like Node.js (which can hold tens of thousands of idle connections on a single thread).

Generation 2: Server-Sent Events (SSE) (The HTTP/2 Champion)

What if the server just never sent the HTTP "End of Message" signal? This is the core principle of Server-Sent Events (SSE).

The client opens an HTTP connection with the header Accept: text/event-stream. The server responds with HTTP 200 OK and Content-Type: text/event-stream, and (in HTTP/1.1) sets Transfer-Encoding: chunked, meaning "I will keep trickling data down this pipe indefinitely."

SSE is historically underrated because it is strictly unidirectional (Server to Client only). However, with the widespread adoption of HTTP/2, SSE has experienced a resurgence. Because HTTP/2 multiplexes many streams over a single TCP connection, an open SSE stream no longer monopolizes one of the browser's roughly six-per-host HTTP/1.1 connections. For applications like live sports scores or AI streaming (ChatGPT's typing effect), SSE is often superior to WebSockets because the browser's EventSource API handles reconnection out of the box, resuming from the last received event ID.
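The wire format is plain text: `data:`, `event:`, and `id:` fields, with a blank line terminating each event. A minimal parser for it (illustrative only; in the browser the built-in EventSource does this for you) might look like:

```javascript
// Parses a text/event-stream chunk into an array of events.
// Handles the data/event/id fields; multiple data: lines join with "\n".
function parseSSE(chunk) {
  const events = [];
  for (const block of chunk.split("\n\n")) {
    const ev = { data: [] };
    for (const line of block.split("\n")) {
      if (line.startsWith("data:")) ev.data.push(line.slice(5).trimStart());
      else if (line.startsWith("event:")) ev.event = line.slice(6).trim();
      else if (line.startsWith("id:")) ev.id = line.slice(3).trim();
    }
    if (ev.data.length) events.push({ ...ev, data: ev.data.join("\n") });
  }
  return events;
}
```

A stream like `event: score\ndata: 1-0\n\n` parses to a single event with `event === "score"` and `data === "1-0"`; the `id` field is what the browser echoes back as Last-Event-ID on reconnect.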

Generation 3: WebSockets (The TCP Hijack)

When latency is measured in milliseconds, and the client must barrage the server with data (e.g., streaming a player's cursor coordinates 60 times a second in a multiplayer game), the overhead of sending full HTTP headers with every single tiny message becomes prohibitive.

The WebSocket protocol solves this with a clever bait-and-switch. It starts as a perfectly normal HTTP GET request, which lets it pass through restrictive corporate firewalls. The request carries the headers Connection: Upgrade, Upgrade: websocket, and a random Sec-WebSocket-Key.

If the server supports it, it responds with an HTTP 101 Switching Protocols status code. The moment this happens, HTTP is abandoned entirely: the underlying raw TCP socket is taken over by the WebSocket framing protocol.

From that millisecond onward, data is sent back and forth in hyper-efficient binary or text "Frames". The framing overhead per message drops from roughly 800 bytes (standard HTTP headers) down to an astonishing 2 to 14 bytes. This allows for massive, sustained, low-latency bi-directional throughput that fundamentally enables the modern interactive web.

The Hidden Cost: Load Balancing Stateful Connections

Whether you use WebSockets or Long Polling, you are creating Stateful connections. This completely breaks traditional Round-Robin load balancing.

If a client establishes a WebSocket connection to Server A, but an HTTP POST (triggering a chat message) lands on Server B, Server B cannot push the message down the socket because Server A, not Server B, holds the open TCP connection.

To engineer at scale, stateful systems require a Pub/Sub Backplane (most commonly Redis). Server B must publish the chat message to Redis, Server A subscribes to Redis, realizes its connected client needs the message, and finally pushes it down the WebSocket. This dramatically increases architectural complexity.

Glossary & Concepts

WebSocket

Protocol providing full-duplex communication channels over a single TCP connection. Starts as HTTP, then switches to WebSocket framing over the same TCP connection.

Handshake

The initial HTTP request ("Upgrade: websocket") and response (101 Switching Protocols) that establishes the WS connection.

Full-Duplex

Communication where both parties can send and receive data simultaneously. (vs Half-Duplex like HTTP request/response).

TCP

Transmission Control Protocol. The underlying transport layer that ensures packets arrive in order and without errors.

Frame

The unit of data in WebSockets. Data is split into frames (Text, Binary, Ping, Pong, Close) to avoid blocking the channel.

Keep-Alive

Periodic "Ping" frames sent to prevent intermediate proxies/load balancers from closing an idle connection due to timeout.