How Load Balancing Works

Scale your application by distributing traffic across multiple servers. Learn how algorithms decide where each request goes.

[Interactive diagram: Clients → Load Balancer (round robin) → Server 1 (20% load), Server 2 (40% load), Server 3 (30% load)]

Traffic Ingress

Hitting the VIP

Clients send requests to a single Virtual IP (VIP) address managed by the Load Balancer.

Technical Detail

DNS resolves example.com to the Load Balancer's Public IP.

Key Takeaways

Scalability

Horizontal scaling (adding more cheap servers) is usually better than vertical scaling (one huge server).

Availability

If one server crashes, the Load Balancer detects it through health checks and redirects traffic to the healthy ones.

Efficiency

Algorithms like Least Connections ensure no server sits idle while others toil.

The Engineering of Load Balancing: Distributing the Internet

As a web application grows from hundreds of users to millions, a single server (no matter how powerful) will inevitably fail under the sheer volume of concurrent connections. The solution is horizontal scaling—adding more servers. However, this creates a routing problem: How do clients know which server to talk to? The answer is the Load Balancer, the invisible traffic cop of the internet.


Part 1: Layer 4 vs Layer 7 Load Balancing

Load balancers operate at different layers of the OSI model, trading off between raw speed and routing intelligence.

  • Layer 4 (Transport Layer): Operates on raw IP addresses and TCP/UDP ports. An L4 balancer (like AWS Network Load Balancer or HAProxy in TCP mode) does not look at the contents of the message. It simply takes the incoming TCP packets, performs Network Address Translation (NAT) to change the destination IP to a backend server, and forwards them. This is blisteringly fast and can handle millions of requests per second with minimal CPU overhead, but it is entirely blind to HTTP headers and URL paths.
  • Layer 7 (Application Layer): Operates on HTTP/HTTPS. An L7 balancer (like NGINX or AWS Application Load Balancer) fully terminates the TCP connection and, for HTTPS, terminates TLS using its own certificate. It reads the actual HTTP request (e.g., GET /api/users vs GET /images/logo.png). Because it understands the application data, an L7 balancer can route API requests to a Node.js microservice cluster while routing image requests to a fleet of static-content servers. This requires significantly more CPU power but enables intelligent, path-based routing.
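The contrast can be sketched in a few lines of Python. This is a simplified illustration, not a real balancer: the IP addresses, pool names, and the dict standing in for a packet header are all hypothetical.

```python
# L4 view: the balancer sees only addresses and ports. NAT is just
# rewriting the destination IP before forwarding; the payload is opaque.
BACKENDS = ["10.0.0.1", "10.0.0.2"]  # hypothetical backend IPs

def l4_route(packet: dict, chosen_backend: str) -> dict:
    """Rewrite the destination of a (simplified) packet header."""
    return {**packet, "dst_ip": chosen_backend}

# L7 view: the connection is already terminated and the HTTP request
# parsed, so the balancer can route on the URL path.
API_POOL = ["10.0.1.10", "10.0.1.11"]  # e.g. Node.js microservices
STATIC_POOL = ["10.0.2.10"]            # e.g. static-content servers

def l7_route(method: str, path: str) -> list:
    """Pick a backend pool based on the request path."""
    return API_POOL if path.startswith("/api/") else STATIC_POOL
```

Note that `l4_route` never inspects a path at all; that information simply is not available at Layer 4.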

Part 2: The Routing Algorithms

When a request arrives, the Load Balancer must execute an algorithm to select exactly one backend server from the available pool.

  1. Round Robin: The simplest approach. It cycles through the list of servers sequentially (1, 2, 3, 1, 2, 3). It assumes all servers have equal capacity and all requests take roughly the same amount of time. In the real world this is rarely true, so Round Robin often leads to uneven load distribution.
  2. Least Connections: The load balancer tracks how many active TCP connections are open to each backend server. When a new request arrives, it is routed to the server with the fewest active connections. This is far better for workloads where some requests take 5 milliseconds and others take 5 seconds.
  3. IP Hash / Sticky Sessions: The balancer hashes the client's IP address (e.g., Hash(192.168.1.50) % 3 with three servers), which guarantees that a given user is always routed to the same backend server. This is critical for legacy applications that store session state, such as shopping cart data, in local server RAM rather than in a shared store like Redis.
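The three algorithms above can each be expressed as a small selection function. This is a minimal sketch with a hypothetical three-server pool; real balancers also handle weights, connection bookkeeping, and pool changes.

```python
import hashlib
from itertools import cycle

SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical pool

# 1. Round Robin: hand out servers in a fixed repeating order.
_rr = cycle(SERVERS)

def round_robin() -> str:
    return next(_rr)

# 2. Least Connections: track open connections per server and pick
# the least-busy one (the balancer updates these counts on every
# connection open/close).
active_connections = {s: 0 for s in SERVERS}

def least_connections() -> str:
    return min(active_connections, key=active_connections.get)

# 3. IP Hash: hash the client IP so the same client always lands on
# the same server.
def ip_hash(client_ip: str) -> str:
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]
```

Note the trade-off visible in `ip_hash`: adding or removing a server changes `len(SERVERS)` and remaps most clients, which is why production systems often use consistent hashing instead.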

Part 3: Health Checks and Self-Healing

A Load Balancer is useless if it blindly routes traffic to a server that has crashed or frozen. To prevent this, it performs active Health Checks.

Every few seconds, the Load Balancer sends a request to a specific endpoint on every backend server (e.g., GET /health). If a server fails to respond with an HTTP 200 OK within the timeout period (e.g., 2 seconds) for a consecutive number of tries (the Unhealthy Threshold), the Load Balancer ejects that server from the routing pool.

Depending on configuration, existing connections to the failed server are either dropped or drained, and all new traffic is redistributed among the surviving, healthy servers. When the failed server recovers and begins passing health checks again, it is seamlessly reintegrated into the pool. This mechanism is the foundation of high-availability infrastructure.
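The threshold logic described above is a small state machine per server. Here is a minimal sketch of that bookkeeping (the threshold values and server names are hypothetical; the actual HTTP probe is omitted):

```python
UNHEALTHY_THRESHOLD = 3  # consecutive failures before ejection
HEALTHY_THRESHOLD = 2    # consecutive successes before reintegration

class HealthTracker:
    """Tracks consecutive probe results and the set of healthy servers."""

    def __init__(self, servers):
        self.failures = {s: 0 for s in servers}
        self.successes = {s: 0 for s in servers}
        self.healthy = set(servers)  # all servers start in the pool

    def record(self, server: str, ok: bool) -> None:
        """Record one health-check result and update pool membership."""
        if ok:
            self.failures[server] = 0
            self.successes[server] += 1
            if self.successes[server] >= HEALTHY_THRESHOLD:
                self.healthy.add(server)  # reintegrate recovered server
        else:
            self.successes[server] = 0
            self.failures[server] += 1
            if self.failures[server] >= UNHEALTHY_THRESHOLD:
                self.healthy.discard(server)  # eject from routing pool
```

Requiring several consecutive results in both directions prevents "flapping": a single dropped probe does not eject a server, and a single lucky response does not reinstate one.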

Part 4: SSL Termination and Offloading

Decrypting HTTPS traffic requires heavy CPU cryptographic computation (RSA/ECDSA math). If you have 50 backend web servers, forcing each server to decrypt its own traffic burns massive amounts of CPU that should be used for application logic.

Instead, modern architectures use SSL Offloading. The TLS certificate is installed on the Load Balancer itself. The LB accepts encrypted HTTPS traffic from the public internet, decrypts it (often with hardware acceleration), and then forwards plain, unencrypted HTTP traffic to the backend servers over a trusted private network, such as a VPC. This frees backend CPU for application logic.

Conclusion: The Single Point of Contact

Load Balancing abstracts away the physical reality of the datacenter. To the user, Google appears to be a single, infinitely powerful computer sitting at google.com. In reality, the Load Balancer acts as the ultimate facade: intercepting the request, evaluating server health, executing routing algorithms, terminating encryption, and silently distributing the load across thousands of individual machines in real time.

Network Load Balancing (Layer 4) vs. Application Load Balancing (Layer 7)

NLB Layer 4 (Transport)

Routes traffic based solely on network and transport layer data: IP address and TCP/UDP ports. It doesn't look at the packet contents.

  • Pros: Extremely fast (millions of req/sec), low CPU usage, good for non-HTTP traffic (e.g., game servers, databases).
  • Cons: "Dumb" routing. Cannot route `/images` to Server A and `/api` to Server B.
Tools: AWS Network Load Balancer, HAProxy (TCP), LVS

ALB Layer 7 (Application)

Understands HTTP/HTTPS. It terminates the connection, reads the headers, cookies, and URL, makes a decision, and opens a new connection to the backend.

  • Pros: Intelligent routing (path-based, header-based), can terminate SSL, can inject HTTP headers.
  • Cons: Slower than L4, higher CPU overhead for decryption/parsing.
Tools: AWS Application Load Balancer, NGINX, Envoy

Glossary & Concepts

🌐 VIP (Virtual IP)

An IP address exposed to clients that doesn't correspond to a specific physical server, but rather to the load balancer itself. The LB then routes traffic from the VIP to the backend servers.

🏥 Health Check

A periodic test performed by the load balancer to ensure backend servers are alive and able to serve traffic. Typical methods include opening a TCP connection or making an HTTP GET request to a `/health` endpoint.

⚖️ Round Robin

A simple distribution algorithm that routes requests to servers in a cyclical order (Server 1, then Server 2, then Server 3, back to Server 1).

🧲 Sticky Sessions (Session Affinity)

A mechanism to route all requests from a single client session to the same backend server, often implemented via IP hashing or cookies. Often needed by older stateful applications.