The Engineering of WebSockets: Hijacking the HTTP Protocol
For the first 15 years of the web, HTTP was a strictly unidirectional protocol. The browser requested data, and the server responded; the server could never initiate a conversation. Engineers built massive, hacky workarounds like Long Polling and Hidden iFrames simply to achieve real-time updates. WebSockets solved this by completely abandoning the HTTP request-response cycle while cleverly riding over HTTP's own ports (80 and 443) to do it.
Part 1: The HTTP Upgrade Dance
WebSockets do not occupy a dedicated port the way FTP (21) or SSH (22) do. They operate over the standard HTTP (80) and HTTPS (443) ports. This was a genius design decision: WebSocket traffic looks like ordinary web traffic, so it passes through strict corporate firewalls that would block an unfamiliar port.
The connection starts as a perfectly normal HTTP GET request with two magical headers, Connection: Upgrade and Upgrade: websocket, plus a random nonce in Sec-WebSocket-Key that the server must answer to prove it speaks the protocol.
If the server supports WebSockets, it replies with HTTP 101 Switching Protocols, echoing back a Sec-WebSocket-Accept header derived from the client's key. From that moment on, both the client and server sever all ties to the HTTP protocol. They rip away the HTTP parser and hand the raw, underlying TCP socket directly to the WebSocket implementation.
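The accept value is not negotiated; RFC 6455 fixes it as the Base64-encoded SHA-1 of the client's key concatenated with a magic GUID. A minimal sketch in Python, using the example key from the RFC itself:

```python
import base64
import hashlib

# Fixed GUID from RFC 6455; every conforming server appends this
# to the client's Sec-WebSocket-Key before hashing.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value the server echoes back."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# The sample handshake key from RFC 6455:
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The point of this dance is not security; it simply proves the server is a real WebSocket endpoint and not a confused HTTP cache blindly echoing headers.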
Part 2: Framing vs Streaming
Unlike raw TCP, which is a continuous, unstructured stream of bytes, WebSockets are a Message-Based Protocol.
Every discrete chunk of data is wrapped in a "Frame". The framing overhead gives the server the exact length of the incoming message before it reads the payload, so it can allocate precisely the right buffer instead of guessing where one message ends and the next begins. Crucially, the overhead is microscopic. An HTTP request header is typically 800+ bytes once cookies and User-Agent strings pile up. A WebSocket frame header is 2 to 14 bytes.
If you send 10 messages a second via HTTP polling, you waste roughly 8,000 bytes/sec just on redundant headers. With WebSockets, you waste about 20 bytes/sec. This efficiency is why multiplayer browser games and high-frequency trading dashboards overwhelmingly favor WebSockets.
Part 3: The C10K Problem & Stateful Scaling
Scaling a standard REST API is trivial: you put 50 servers behind a Load Balancer, and any server can handle any request because HTTP is stateless. Scaling WebSockets is an engineering nightmare.
Because the TCP connection is kept permanently open, the Load Balancer cannot round-robin individual messages. A user is physically tethered to "Server A" for the duration of their session. If Server A needs to reboot, the connection is violently broken.
Furthermore, every open connection consumes RAM and a file descriptor on the operating system. In the early 2000s, servers struggled to hold 10,000 concurrent connections (the C10K Problem). Modern asynchronous event loops (Node.js, Go, Rust's Tokio, Java's Netty) use epoll/kqueue to handle millions of idle WebSockets on a single server, waiting for data without blocking threads.
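The readiness model behind epoll/kqueue can be sketched with Python's selectors module, which wraps whichever mechanism the OS provides. This toy example uses a socketpair as a stand-in for two client connections:

```python
import selectors
import socket

# One selector watches many sockets; the loop wakes only when data arrives,
# so idle connections cost no CPU and no blocked thread.
sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on BSD/macOS
a, b = socket.socketpair()          # stand-in for a client/server connection
a.setblocking(False)
b.setblocking(False)
sel.register(a, selectors.EVENT_READ)

b.send(b"ping")                     # the "client" sends some bytes

for key, _events in sel.select(timeout=1):
    msg = key.fileobj.recv(1024)    # only sockets reported ready are touched
    print(msg)                      # b'ping'

sel.unregister(a)
a.close()
b.close()
```

A real WebSocket server registers thousands of sockets with the same selector; the ones sitting idle cost nothing but their file descriptor and buffer memory.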
Part 4: Cross-Site WebSocket Hijacking (CSWSH)
Because the initial WebSocket handshake is an HTTP GET request, the browser automatically attaches the domain's cookies to it. This creates a massive security vulnerability directly analogous to CSRF.
A malicious website can silently execute new WebSocket('wss://yourbank.com/api') in the background. Your bank's server will see your valid auth cookie, upgrade the connection, and allow the attacker to send money transfers via WebSocket frames.
Unlike cross-origin HTTP requests, WebSocket connections are not restricted by the browser's same-origin policy, and no CORS preflight is ever sent. To prevent this attack, your WebSocket server MUST explicitly validate the Origin header during the HTTP Upgrade phase and reject any handshake originating from unauthorized domains. (SameSite cookie attributes mitigate the problem in modern browsers, but they complement the Origin check rather than replace it.)
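The check itself is small. A minimal sketch, assuming a hypothetical allowlist (ALLOWED_ORIGINS and the handshake_allowed helper are illustrative names, not a real framework API):

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; in production this comes from configuration.
ALLOWED_ORIGINS = {"https://yourbank.com", "https://app.yourbank.com"}

def handshake_allowed(headers: dict) -> bool:
    """Reject the Upgrade unless the browser-supplied Origin is trusted.

    Note: Origin only defends against cross-site *browser* requests.
    Non-browser clients can forge the header, so this check complements
    authentication; it does not replace it.
    """
    origin = headers.get("Origin")
    if origin is None:
        return False                        # no Origin: reject by default
    parts = urlsplit(origin)
    # Compare scheme + host exactly; substring checks are bypassable
    # (e.g. https://yourbank.com.evil.example would pass a "contains" test).
    return f"{parts.scheme}://{parts.netloc}" in ALLOWED_ORIGINS

print(handshake_allowed({"Origin": "https://yourbank.com"}))   # True
print(handshake_allowed({"Origin": "https://evil.example"}))   # False
```

Run this check before replying 101 Switching Protocols; once the connection is upgraded, every frame the attacker sends already rides on the victim's cookie.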