Network Protocols & DNS

The connectivity layer of distributed systems: DNS lookups, transport decisions, and HTTP request flow.

Module 2: Network Protocols & DNS
Track 1: Foundations (0–2 YoE)
Before caching, sharding, and orchestration, every system depends on one simple promise: users can reach your service. That promise is powered by DNS, transport protocols, and HTTP semantics. A deep understanding of these layers separates engineers who debug outages in minutes from those who stare at dashboards for hours.
The Request Journey

When a user types https://toolkit.whysonil.dev into their browser and presses Enter, an intricate sequence of network operations executes before even a single byte of your application code runs. Understanding this sequence is foundational to system design because every layer introduces latency, failure modes, and optimization opportunities.

The full lifecycle of a cold request — one where nothing is cached and no persistent connection exists — involves five distinct phases. Each phase operates at a different layer of the network stack and has its own set of tradeoffs.

1. DNS Resolution: domain → IP address
2. TCP Handshake: SYN → SYN-ACK → ACK
3. TLS Handshake: encrypt the channel
4. HTTP Request: send GET / POST
5. Response: data arrives, render

On a cold request from a user in Mumbai to a server in Virginia, the cumulative overhead of steps 1–3 alone can exceed 300ms before a single application byte is exchanged. This is why CDNs, edge termination, connection pooling, and keep-alive headers exist — they eliminate or amortize these costs. In a system design interview, understanding this budget is what lets you reason about where latency is hiding.


DNS Resolution Deep Dive

DNS (Domain Name System) is a globally distributed, hierarchical database that maps human-readable domain names to machine-readable IP addresses. It is arguably the most critical piece of internet infrastructure — if DNS fails, nothing works, regardless of how many nines your application has.

The DNS Hierarchy

DNS resolution follows a strict hierarchy. When your browser needs to resolve toolkit.whysonil.dev, the query cascades through multiple tiers if no cache holds the answer:

Root Nameservers (.)

There are 13 logical root server clusters (A through M), operated by organizations like ICANN, Verisign, and the US Army. They don't know the IP of your domain — they know which TLD servers to ask next. Root servers are anycast-distributed across 1,500+ physical nodes worldwide.

TLD Nameservers (.dev)

Each Top-Level Domain (.com, .dev, .io) has its own set of authoritative servers. The .dev TLD is managed by Google Registry. TLD servers respond with the NS records for your domain's authoritative nameservers.

Authoritative Nameservers (whysonil.dev)

These are the servers you configure in your domain registrar (e.g., Cloudflare, Route 53, Google Cloud DNS). They hold the actual DNS records — A, AAAA, CNAME, MX, TXT, SRV — and return the final IP address to the resolver.

DNS Record Types

Understanding record types is essential for production operations. Each type serves a specific purpose in the resolution and routing pipeline:

| Record | Purpose | Example |
|--------|---------|---------|
| A | Maps domain to IPv4 address | toolkit.whysonil.dev → 104.21.32.1 |
| AAAA | Maps domain to IPv6 address | toolkit.whysonil.dev → 2606:4700:3031::6815:2001 |
| CNAME | Alias to another domain (requires extra lookup) | www → toolkit.whysonil.dev |
| MX | Mail server routing with priority | 10 mail.google.com |
| TXT | Arbitrary text (SPF, DKIM, domain verification) | v=spf1 include:_spf.google.com |
| SRV | Service discovery (port + host) | _sip._tcp 5060 sip.example.com |
| NS | Delegates zone to authoritative nameservers | ns1.cloudflare.com |

CNAME chains are expensive. Each CNAME requires an additional resolution round. A chain like www → cdn.example.com → d1234.cloudfront.net adds multiple DNS lookups. At scale, prefer A/AAAA records at the apex and limit CNAME depth to one hop.

Recursive vs. Iterative Queries

There are two query modes in DNS resolution. In a recursive query, the client asks a resolver (like 8.8.8.8) to do all the work: the resolver walks the hierarchy (root → TLD → authoritative) and returns the final answer. In an iterative query, the server returns a referral ("go ask this other server"), and the client must follow the chain itself. Browsers always use recursive resolvers; resolvers themselves use iterative queries against the hierarchy.
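The iterative walk a resolver performs can be sketched over a mocked hierarchy. Everything below (the zone dictionaries, server names, and the returned IP) is illustrative; a real resolver speaks the DNS wire protocol to actual root, TLD, and authoritative servers:

```python
# Each "server" maps a queried name either to a final answer ("A") or a
# referral ("NS") naming the next tier to ask. All data here is mocked.
ROOT = {"dev.": ("NS", "tld-dev")}
SERVERS = {
    "tld-dev": {"whysonil.dev.": ("NS", "auth-whysonil")},
    "auth-whysonil": {"toolkit.whysonil.dev.": ("A", "104.21.32.1")},
}

def resolve_iteratively(name: str) -> str:
    """Follow referrals from the root until an A record is returned."""
    zone = ROOT
    while True:
        # Find the suffix of `name` this server knows something about.
        for suffix, (rtype, value) in zone.items():
            if name.endswith(suffix):
                if rtype == "A":
                    return value          # final answer from authoritative
                zone = SERVERS[value]     # referral: descend one tier
                break
        else:
            raise LookupError(f"no referral for {name}")

print(resolve_iteratively("toolkit.whysonil.dev."))  # 104.21.32.1
```

Each pass through the loop is one referral hop: root → TLD → authoritative, exactly the cascade described above.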

Try It: DNS Lookup Tool

Query live DNS records for any domain. See A, AAAA, CNAME, MX, TXT, and NS records in real-time.


DNS Caching & TTL Strategies

DNS is, at its core, a distributed cache tree. Every layer in the hierarchy caches responses according to the TTL (Time To Live) value set by the authoritative nameserver. Understanding where a response was cached explains both latency and propagation behavior during deployments, migrations, and failovers.

The Four Cache Layers

  • Browser cache: Chrome, Firefox, and Safari cache DNS responses for up to 60 seconds regardless of TTL. This is the fastest layer, but it lives in browser memory and is volatile — closing the browser clears it.
  • OS resolver cache: The operating system (systemd-resolved on Linux, mDNSResponder on macOS) maintains a shared cache for all applications. On Linux, you can inspect it with resolvectl statistics.
  • Recursive resolver cache: This is the heavy hitter. ISP resolvers or public resolvers like Google (8.8.8.8) and Cloudflare (1.1.1.1) cache responses for the full TTL duration. A TTL of 300 means the resolver won't query the authoritative server again for 5 minutes, even if you change the record.
  • Authoritative nameserver: The source of truth. When all caches expire, the query reaches here. This is where you configure your records and TTLs.
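Every one of these layers implements the same TTL discipline: serve the cached answer until it expires, then fall through to the next layer. A minimal sketch, with an injected clock so expiry can be observed without waiting:

```python
import time

class DnsCache:
    """TTL-honoring cache, the behavior each layer above implements."""

    def __init__(self, clock=time.monotonic):
        self._store = {}   # name -> (ip, expiry timestamp)
        self._clock = clock

    def put(self, name, ip, ttl):
        # Cache the answer until `ttl` seconds from now.
        self._store[name] = (ip, self._clock() + ttl)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None              # miss: must query the next layer
        ip, expiry = entry
        if self._clock() >= expiry:
            del self._store[name]    # TTL expired: evict and re-query
            return None
        return ip

# A fake clock lets us see a TTL=300 record expire instantly:
now = [0.0]
cache = DnsCache(clock=lambda: now[0])
cache.put("toolkit.whysonil.dev", "104.21.32.1", ttl=300)
print(cache.get("toolkit.whysonil.dev"))  # 104.21.32.1 (hit)
now[0] = 301.0
print(cache.get("toolkit.whysonil.dev"))  # None (expired, re-query upstream)
```

This is also exactly why a TTL of 300 means a record change is invisible to a resolver for up to 5 minutes.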

TTL Tradeoffs in Production

Low TTL (30–60s)

  • Fast failover during incidents
  • Quick propagation for DNS changes
  • Higher query volume to authoritative servers
  • Slightly higher latency (more cache misses)

Best for: Active migrations, blue-green deployments

High TTL (3600s+)

  • Lower resolver load and DNS cost
  • Better latency (more cache hits)
  • Slow failover — stale IPs served for hours
  • DNS changes take time to propagate

Best for: Stable production with rare changes

Production pattern: Before a migration, lower your TTL to 60s at least 24 hours in advance, giving every resolver time to expire the old high-TTL records. After migration, verify traffic has shifted, then raise the TTL back to 3600s. This "TTL lowering dance" is standard practice for zero-downtime DNS migrations.

Negative Caching

DNS also caches failures. If a domain doesn't exist (NXDOMAIN), resolvers cache that negative response for the duration of the SOA record's minimum field. This means if you create a new subdomain and someone queries it before records propagate, the "doesn't exist" response might be cached for minutes or hours. This is a common gotcha during rapid subdomain provisioning.


TCP: The Reliable Transport

TCP (Transmission Control Protocol) is the backbone of the internet. Every HTTP request, every database query, every SSH session rides on top of TCP. It guarantees that bytes arrive in order, without corruption, and without loss — at the cost of connection setup overhead and congestion-sensitive throughput.

The 3-Way Handshake

Before any data can flow, TCP requires a three-message handshake to establish a connection. This is not optional — it's how both sides agree on initial sequence numbers, window sizes, and TCP options like Selective Acknowledgment (SACK).

Client → Server: SYN (seq=1000)
Server → Client: SYN-ACK (seq=5000, ack=1001)
Client → Server: ACK (ack=5001)
// Connection established. Data can now flow.

This handshake costs one full round trip (1 RTT). For a client in Singapore connecting to a server in Frankfurt, that's approximately 160ms before any application data is sent. This is why connection reuse (Connection: keep-alive) and connection pooling are so critical in microservice architectures — you amortize this cost across many requests.
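The handshake's sequence and acknowledgment arithmetic is mechanical: each side picks an initial sequence number (ISN), and the peer acknowledges ISN + 1, because the SYN flag itself consumes one sequence number. A sketch using the example ISNs from the trace:

```python
# Model the three handshake segments as plain dicts. Real stacks choose
# randomized ISNs; 1000 and 5000 are just the example values.

def handshake(client_isn: int, server_isn: int):
    syn     = {"flags": "SYN",     "seq": client_isn}
    # Server acknowledges the client's SYN (seq + 1) and sends its own ISN.
    syn_ack = {"flags": "SYN-ACK", "seq": server_isn, "ack": syn["seq"] + 1}
    # Client acknowledges the server's SYN the same way.
    ack     = {"flags": "ACK",     "ack": syn_ack["seq"] + 1}
    return [syn, syn_ack, ack]

for segment in handshake(client_isn=1000, server_isn=5000):
    print(segment)
# The ack numbers (1001 and 5001) match the trace above.
```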

Flow Control: The Sliding Window

TCP uses a sliding window mechanism to prevent a fast sender from overwhelming a slow receiver. The receiver advertises its rwnd (receive window) — the amount of data it can buffer. The sender must never have more unacknowledged data in flight than this window allows. This is receiver-side flow control and protects against buffer overflow.

Congestion Control

While flow control protects the receiver, congestion control protects the network. TCP doesn't know the capacity of the network path, so it probes for bandwidth using algorithms that have evolved over decades:

Slow Start

Start with a small congestion window (typically 10 segments) and double it every RTT. This exponential growth quickly finds available bandwidth but can overshoot on lossy links.

AIMD (Additive Increase, Multiplicative Decrease)

After slow start, TCP enters congestion avoidance: increase the window by 1 segment per RTT (additive), but on packet loss, halve the window (multiplicative). This "sawtooth" pattern is the classic Reno behavior.

CUBIC

The default algorithm on Linux since 2006. Uses a cubic function to probe bandwidth more aggressively than Reno, especially on high-bandwidth, high-latency (long fat) links. This is what your servers are running right now.
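A toy simulation makes the slow-start and AIMD shapes visible. This is a simplified Reno-style model (losses are scheduled by hand, and recovery is reduced to cwnd = ssthresh), not a faithful TCP implementation:

```python
# Congestion window evolution in segments per RTT, under a hand-written
# loss schedule. Real stacks react to ACK feedback, not a fixed script.

def simulate_cwnd(rtts, ssthresh=64, initial=10, loss_rtts=()):
    cwnd, history = initial, []
    for rtt in range(rtts):
        history.append(cwnd)
        if rtt in loss_rtts:
            ssthresh = max(cwnd // 2, 2)   # multiplicative decrease
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd *= 2                      # slow start: double per RTT
        else:
            cwnd += 1                      # congestion avoidance: +1 per RTT
    return history

print(simulate_cwnd(rtts=12, loss_rtts={6}))
# Exponential ramp, linear climb, then the halving "sawtooth" after loss.
```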

Connection Teardown

TCP connections close with a four-way handshake (FIN → ACK → FIN → ACK). The initiator enters a TIME_WAIT state for 2×MSL (typically 60s) to handle delayed packets. Under high connection churn (e.g., a load balancer proxying thousands of short-lived requests), TIME_WAIT accumulation can exhaust ephemeral ports. Solutions include SO_REUSEADDR, tcp_tw_reuse, and — most importantly — connection pooling.

Try It: TCP Simulators

Visualize the 3-way handshake step by step, and experiment with congestion control algorithms (Reno, CUBIC) in real time.


UDP: The Fast Path

UDP (User Datagram Protocol) is TCP's simpler sibling. It provides no connection setup, no ordering guarantees, no retransmission, and no congestion control. Each datagram is a standalone, fire-and-forget message. This makes it unsuitable for most web traffic but ideal for specific use cases where speed matters more than reliability.

Where UDP Wins

  • DNS queries: Most DNS lookups use UDP because the query and response each fit in a single packet. There's no need for TCP's connection overhead for a 50-byte request. However, DNS falls back to TCP for responses larger than 512 bytes (or 4096 with EDNS).
  • Real-time media: Video conferencing (Zoom, WebRTC), online gaming, and live streaming use UDP because a dropped frame is better than a delayed one. Retransmitting a lost video frame 200ms later is useless — the moment has passed.
  • QUIC / HTTP/3: QUIC (originally developed at Google, standardized by the IETF as RFC 9000) builds reliability on top of UDP, bypassing the kernel's TCP stack entirely. This eliminates TCP-level head-of-line blocking and enables 0-RTT connection resumption. HTTP/3 runs over QUIC and is already used by Google, Facebook, and Cloudflare.
| Dimension | TCP | UDP |
|-----------|-----|-----|
| Delivery | Reliable, ordered stream | Best-effort datagrams |
| Connection setup | 3-way handshake (1 RTT) | None (0 RTT) |
| Head-of-line blocking | Yes — one lost packet stalls the stream | No — independent datagrams |
| Common use | HTTP, databases, SSH, email | DNS, video, gaming, QUIC |
| Congestion control | Built-in (Reno, CUBIC, BBR) | None (app must implement) |
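The "no connection setup" row is easy to demonstrate with Python's stdlib socket module on loopback: sendto() works immediately, with no handshake and no connection state on either side:

```python
import socket

# Two UDP sockets on 127.0.0.1. Contrast with TCP, where the sender
# would need connect() (the 3-way handshake) before any data could flow.

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))     # port 0: let the OS pick a free port
receiver.settimeout(2)              # avoid hanging if the datagram is lost
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"ping", addr)        # fire-and-forget: no connect(), no ACK

data, peer = receiver.recvfrom(1024)
print(data)                         # b'ping'

sender.close()
receiver.close()
```

On loopback this datagram essentially always arrives; across a real network, nothing in UDP guarantees it would.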

TLS & The HTTPS Handshake

After the TCP connection is established, HTTPS requires a second handshake to negotiate encryption. TLS (Transport Layer Security) creates a secure, encrypted channel between client and server. Without it, any router, ISP, or Wi-Fi snooper between the client and server can read the traffic in plaintext.

TLS 1.3: The Modern Standard

TLS 1.3 (finalized in 2018) dramatically simplified the handshake compared to TLS 1.2. The key improvements:

  • 1-RTT handshake: The client sends its key share in the first message, allowing encryption to begin after just one round trip (vs. 2 RTTs in TLS 1.2).
  • 0-RTT resumption: For repeat visits, TLS 1.3 can resume a previous session and send encrypted data immediately (0 RTTs). This is a game-changer for latency-sensitive applications, though it has replay attack risks.
  • Removed insecure ciphers: TLS 1.3 eliminates RSA key exchange, RC4, SHA-1, and CBC mode entirely. Only AEAD ciphers (AES-GCM, ChaCha20-Poly1305) are allowed.
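In Python's stdlib ssl module, restricting a client to TLS 1.3 is a single attribute on the context. A minimal sketch (no network I/O here; wrap_socket would perform the actual handshake):

```python
import ssl

# create_default_context() already enables certificate validation and
# hostname checking; we additionally refuse anything below TLS 1.3.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_3

print(ctx.minimum_version == ssl.TLSVersion.TLSv1_3)  # True
print(ctx.check_hostname)                             # True

# Wrapping a connected TCP socket with
#   ctx.wrap_socket(sock, server_hostname="toolkit.whysonil.dev")
# would then run the 1-RTT TLS 1.3 handshake, sending the hostname as SNI.
```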

The Certificate Chain

When the server responds during the TLS handshake, it sends its certificate chain: the server's own certificate, any intermediate certificates, and implicitly, the root Certificate Authority (CA). The client validates this chain against its local trust store. If any link is broken — expired certificate, wrong domain, missing intermediate — the connection fails with a certificate error.

Common outage cause: Forgetting to renew certificates. Let's Encrypt certificates are valid for only 90 days. Automated renewal via certbot or cloud-managed certificates (ACM, GCP Managed SSL) is non-negotiable in production.

SNI and mTLS

Server Name Indication (SNI) allows a single IP address to serve multiple HTTPS domains. The client includes the requested hostname in the TLS ClientHello message, letting the server select the correct certificate. Without SNI, you'd need one IP per domain — impractical at cloud scale.

Mutual TLS (mTLS) extends the handshake so the server also validates a certificate from the client. This is the standard for service-to-service authentication in zero-trust networks and service meshes like Istio and Linkerd.

Explore: How HTTPS Works

Step through the TLS handshake visually. See how certificates are validated and how encryption is negotiated.


HTTP Semantics at Scale

HTTP (HyperText Transfer Protocol) is the application-layer protocol that defines how clients and servers communicate. While the transport layer (TCP/QUIC) handles reliable byte delivery, HTTP defines the meaning of those bytes: methods, headers, status codes, and content negotiation.

Methods That Matter

In system design, the choice of HTTP method carries semantic meaning that affects caching, idempotency, and retry safety:

  • GET — Read-only, cacheable, safe to retry. Load balancers can cache GET responses. Must be idempotent.
  • POST — Creates resources. Not idempotent by default (calling it twice may create two resources). Never cached.
  • PUT — Full replacement of a resource. Idempotent: calling it twice with the same payload produces the same result.
  • PATCH — Partial update. Not inherently idempotent. Often used with JSON Patch or JSON Merge Patch.
  • DELETE — Removes a resource. Idempotent: deleting the same resource twice should return 200 or 404, not an error.
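A retry layer can encode these semantics directly. The function below is an illustrative policy, including the common escape hatch of an idempotency key for POST (the pattern payment APIs use to make non-idempotent requests safely retryable):

```python
# Methods that RFC 9110 defines as idempotent: retrying them cannot
# change the outcome, so a retry layer may resend them blindly.
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS"}

def safe_to_retry(method: str, has_idempotency_key: bool = False) -> bool:
    # POST/PATCH are only retry-safe if the server deduplicates requests
    # via a client-supplied idempotency key.
    return method.upper() in IDEMPOTENT_METHODS or has_idempotency_key

print(safe_to_retry("GET"))                             # True
print(safe_to_retry("POST"))                            # False
print(safe_to_retry("POST", has_idempotency_key=True))  # True
```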

Status Codes You Must Know

| Code | Meaning | System Design Significance |
|------|---------|----------------------------|
| 200 | OK | Standard success. Cacheable for GET. |
| 201 | Created | Resource created. Return Location header. |
| 301/302 | Redirect | 301 is permanent (cached), 302 is temporary. |
| 429 | Too Many Requests | Rate limiting. Include Retry-After header. |
| 502 | Bad Gateway | Upstream server unreachable. LB can't connect to backend. |
| 503 | Service Unavailable | Server overloaded. Trigger circuit breakers. |
| 504 | Gateway Timeout | Upstream too slow. Tune LB timeout settings. |
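A client can turn these codes into concrete actions. The mapping below is an illustrative policy sketch, not a standard; the one firm rule it encodes is honoring Retry-After on 429/503:

```python
# Map a response status to a client-side action. The action names are
# hypothetical labels for this sketch.

def next_action(status: int, headers: dict) -> str:
    if 200 <= status < 300:
        return "success"
    if status in (429, 503):
        delay = headers.get("Retry-After", "1")
        return f"backoff:{delay}s"        # wait as instructed, then retry
    if status in (502, 504):
        return "retry-other-backend"      # upstream problem: try another node
    if 500 <= status < 600:
        return "count-toward-circuit-breaker"
    return "fail"                         # other 4xx: client bug, don't retry

print(next_action(429, {"Retry-After": "30"}))  # backoff:30s
print(next_action(502, {}))                     # retry-other-backend
print(next_action(404, {}))                     # fail
```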

Headers That Control Performance

A few critical headers shape how the entire network stack behaves:

  • Connection: keep-alive — Reuse TCP connections across multiple requests. Default in HTTP/1.1. Without it, every request pays the 3-way handshake + TLS cost again.
  • Cache-Control — Tells browsers and CDNs whether to cache a response and for how long. max-age=3600 caches for 1 hour. no-store prevents all caching.
  • Content-Encoding: gzip/br — Compress response bodies. Brotli (br) achieves 15–25% better compression than gzip for text.
  • Retry-After — Tells clients when to retry after a 429 or 503. Critical for graceful degradation under load.

HTTP/2 and HTTP/3

HTTP/2 (2015) introduced multiplexing: multiple requests share a single TCP connection via "streams." This eliminates HTTP/1.1's head-of-line blocking at the application layer. It also introduced header compression (HPACK) and server push. However, it still suffers from TCP-level head-of-line blocking — if one TCP packet is lost, all streams stall.

HTTP/3 (2022) solves this by replacing TCP with QUIC (built on UDP). Each QUIC stream is independently loss-recovered, so a lost packet in one stream doesn't affect others. HTTP/3 also enables 0-RTT connection resumption. Major CDNs (Cloudflare, Akamai, Fastly) and browsers already support it.


The Latency Budget

At scale, latency is additive. A cold request accumulates delay at each layer. Understanding this budget is what separates "why is this slow?" guesswork from precise diagnosis.

// Cold request from Mumbai → Virginia
DNS lookup .............. ~50ms (cached) to ~200ms (cold)
TCP handshake ........... ~160ms (1 RTT)
TLS handshake ........... ~160ms (1 RTT for TLS 1.3)
HTTP request + queue .... ~10ms
App processing .......... ~50ms
Upstream calls (DB, etc). ~100ms
Total ≈ 530–680ms for first byte
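Tallying the rows above programmatically makes the headline point explicit: the first three rows are pure transport overhead that connection reuse and edge termination can remove. Figures are the rough ones from the budget:

```python
# (best-case, worst-case) latency per phase, in milliseconds.
budget = {
    "dns_lookup":    (50, 200),   # cached vs cold
    "tcp_handshake": (160, 160),  # 1 RTT
    "tls_handshake": (160, 160),  # 1 RTT (TLS 1.3)
    "http_request":  (10, 10),
    "app":           (50, 50),
    "upstream":      (100, 100),
}

best  = sum(low for low, _ in budget.values())
worst = sum(high for _, high in budget.values())
# Worst-case cost of the three phases keep-alive and edge termination remove:
transport = sum(budget[k][1] for k in ("dns_lookup", "tcp_handshake", "tls_handshake"))

print(f"first byte: {best}-{worst}ms")       # first byte: 530-680ms
print(f"transport overhead: {transport}ms")  # transport overhead: 520ms
```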

How to Trim This Budget

  • Connection reuse: Keep-alive eliminates TCP + TLS handshake on subsequent requests. This alone saves ~320ms per request on cross-continental links.
  • Edge termination: Terminate TLS and serve cached content at CDN PoPs close to the user. The user connects to a nearby edge (20ms RTT instead of 160ms).
  • DNS prefetching: Browsers support <link rel="dns-prefetch"> to resolve domains before the user clicks a link.
  • HTTP/2 multiplexing: Send multiple requests over one connection, eliminating per-request connection overhead.
  • Preconnect hints: <link rel="preconnect"> triggers DNS + TCP + TLS ahead of time for critical third-party domains (fonts, analytics).

Production Debugging Toolkit

When something breaks in production, you need to quickly isolate which layer is failing. Here is the systematic debugging sequence every engineer should know:

1. Can I resolve the domain?

dig toolkit.whysonil.dev +short
nslookup toolkit.whysonil.dev 8.8.8.8

Check A/AAAA records, TTL values, and whether the resolver returns the expected IP. If resolution fails, the problem is DNS — check your registrar and nameserver configuration.

2. Can I connect to the port?

curl -v --connect-timeout 5 https://toolkit.whysonil.dev
nc -zv 104.21.32.1 443

If DNS resolves but TCP connect fails, check security groups, NACLs, firewall rules, and whether the service is actually listening on the expected port.

3. Is TLS working?

openssl s_client -connect toolkit.whysonil.dev:443 -servername toolkit.whysonil.dev

Verify certificate expiry, chain completeness, and SNI configuration. A missing intermediate certificate is one of the most common production issues.

4. Is routing correct?

curl -H "Host: toolkit.whysonil.dev" http://LOAD_BALANCER_IP/
traceroute toolkit.whysonil.dev

Confirm the load balancer target group, path-based routing rules, and that traffic reaches the correct backend service.


Lessons from the Trenches

Case Study: The GitHub DNS Outage (2020)

In 2020, GitHub experienced a significant outage caused by a DNS configuration change that propagated unevenly across global resolvers. Some ISP resolvers served stale records for hours because they hadn't expired their cached entries. Users in certain regions could access GitHub normally while others were completely locked out.

Takeaway: Always lower TTLs before planned DNS changes. Test from multiple geographic vantage points (not just your office network). Use DNS monitoring services like DNSCheck or Catchpoint.

Case Study: Cloudflare's 1.1.1.1 and Negative Caching

When Cloudflare launched their 1.1.1.1 DNS resolver, they discovered that many domains had misconfigured SOA records with extremely long minimum TTL values (hours or days). This meant NXDOMAIN (domain not found) responses were cached far longer than intended. New subdomains created by customers appeared to "not exist" for hours, even though the authoritative server had the correct records.

Takeaway: Audit your SOA record's minimum TTL field. Keep it aligned with your operational needs (300s is a reasonable default). Understand that negative caching affects subdomain provisioning speed.

Case Study: TIME_WAIT Exhaustion at Scale

A fintech company running 50,000 short-lived HTTP requests per second from their application servers to a payment gateway noticed intermittent connection failures. Investigation revealed 40,000+ sockets stuck in TIME_WAIT state, exhausting the ephemeral port range (32768–60999 on Linux = ~28,000 ports). Each short-lived TCP connection consumed a port for 60 seconds after close.

Takeaway: Use HTTP connection pooling (keep-alive) for high-throughput inter-service communication. If connection reuse isn't possible, tune net.ipv4.tcp_tw_reuse=1 and expand the ephemeral port range.
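The arithmetic behind this failure is worth internalizing. With the default ephemeral range and a 60-second TIME_WAIT, the sustainable rate of short-lived connections to a single destination is surprisingly low:

```python
# Each closed connection holds its ephemeral port for the TIME_WAIT
# duration, so sustainable rate = ports / hold time (per source IP,
# destination IP, destination port tuple).

ephemeral_ports = 60999 - 32768 + 1   # Linux default range: 28,232 ports
time_wait_secs = 60

sustainable_rps = ephemeral_ports / time_wait_secs
print(f"{ephemeral_ports} ports / {time_wait_secs}s = {sustainable_rps:.1f} conn/s")
# Roughly 470 new connections per second, nowhere near the 50,000 req/s
# in the case study. Connection pooling sidesteps the limit entirely.
```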

Hands-on Learning

Reinforce these concepts with interactive simulators and visual deep-dives.

What's Next?

Caching Strategies

Once request paths are clear, we optimize speed. Next module covers cache-aside, write-through, eviction policies, and consistency patterns.
