The 4-Step Design Framework
A repeatable mental model for tackling any system design problem.
The biggest failure mode: designing the wrong system. Before drawing any boxes, spend 5 minutes clarifying exactly what you're building and for whom.
Functional Requirements
What does the system do? These are the user-facing features. Be specific:
- ❌ "Design a URL shortener" → too vague
- ✅ "Users can create short URLs. Short URLs redirect to the original. Users can see click analytics. URLs expire after 30 days by default. Custom aliases are supported."
Ask clarifying questions: Can users delete URLs? Is authentication required? Are there rate limits? What's the max URL length?
Non-Functional Requirements (NFRs)
How should it behave? These are the quality attributes:
Availability
Is the target 99.9% or 99.99%? 99.9% allows about 8.76 hours of downtime per year; 99.99% allows about 52.6 minutes. That gap makes a big difference in architecture.
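The downtime budgets follow from simple arithmetic. A minimal sketch, assuming a 365-day year; the function name is illustrative:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Allowed downtime per year, in minutes, for an availability target."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


for target in (99.9, 99.99):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.1f} min/year ({minutes / 60:.2f} h)")
# 99.9% -> 525.6 min/year (8.76 h)
# 99.99% -> 52.6 min/year (0.88 h)
```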
Latency
P99 latency target? URL redirect should be <50ms. Analytics dashboard can tolerate 2-3 seconds.
Consistency
Strong or eventual? URL creation must be consistent (no duplicates). Analytics can be eventually consistent.
Scale
How many users? Requests/sec? Data volume? This feeds directly into Step 2.
Estimations ground your design in reality. They tell you whether you need 1 server or 1,000, and whether your data fits on a single machine's SSD or requires distributed storage.
Key Numbers Every Engineer Should Know
| Operation | Latency |
|---|---|
| L1 cache reference | 0.5 ns |
| L2 cache reference | 7 ns |
| Main memory reference | 100 ns |
| SSD random read | 150 μs |
| HDD random read | 10 ms |
| Same-datacenter round trip | 0.5 ms |
| Cross-continent round trip | 150 ms |
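One way to use these numbers is to compare them against each other and against a latency budget. The sketch below copies the table values into a dictionary; the helper names are illustrative, and the 50 ms budget is the redirect target mentioned earlier.

```python
# Latencies from the table above, converted to nanoseconds.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "L2 cache reference": 7,
    "Main memory reference": 100,
    "SSD random read": 150_000,
    "HDD random read": 10_000_000,
    "Same-datacenter round trip": 500_000,
    "Cross-continent round trip": 150_000_000,
}


def ratio(slow: str, fast: str) -> float:
    """How many times slower one operation is than another."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


def ops_within_budget(budget_ms: float, op: str) -> int:
    """How many sequential operations of this kind fit in a latency budget."""
    return int(budget_ms * 1_000_000 // LATENCY_NS[op])


print(ratio("SSD random read", "Main memory reference"))       # 1500.0
print(ops_within_budget(50, "Same-datacenter round trip"))     # 100
print(ops_within_budget(50, "Cross-continent round trip"))     # 0
```

The last line is the practical takeaway: a 50 ms redirect cannot include even one cross-continent round trip, so the data has to be served from near the user.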
Estimation Example: URL Shortener
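As an illustration of the arithmetic only, here is a sketch for the URL shortener. Every input is an assumption made up for this example, not a figure from the original: 100 million new URLs per month, a 100:1 read-to-write ratio, and 500 bytes per stored record. The one-month retention reflects the 30-day default expiry from the requirements.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000


def estimate(new_urls_per_month: int, read_write_ratio: int,
             bytes_per_record: int, retention_months: int) -> dict:
    """Back-of-the-envelope traffic and storage estimate."""
    writes_per_sec = new_urls_per_month / SECONDS_PER_MONTH
    reads_per_sec = writes_per_sec * read_write_ratio
    live_records = new_urls_per_month * retention_months
    storage_gb = live_records * bytes_per_record / 1e9
    return {
        "writes_per_sec": writes_per_sec,
        "reads_per_sec": reads_per_sec,
        "storage_gb": storage_gb,
    }


# All four inputs below are assumptions chosen for illustration.
result = estimate(
    new_urls_per_month=100_000_000,
    read_write_ratio=100,
    bytes_per_record=500,
    retention_months=1,
)
for name, value in result.items():
    print(f"{name}: {value:,.1f}")
```

Under these assumptions the system handles roughly 39 writes and 3,900 reads per second and stores about 50 GB of live data, which would fit on a single machine. Different inputs can change that conclusion entirely, which is the point of doing the estimate.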
Now draw the architecture. Start with the happy path for the most critical use case, then expand.
The Universal Building Blocks
- Clients: Web, mobile, API consumers
- Load Balancer / API Gateway: Entry point, TLS termination, rate limiting, routing
- Application Servers: Stateless services behind LB
- Cache: Redis/Memcached for hot reads
- Database: PostgreSQL (OLTP), Cassandra (write-heavy), etc.
- Message Queue: Kafka/SQS for async processing
- Object Storage: S3 for files, images, videos
- CDN: Cloudflare/CloudFront for static assets and edge caching
API Design First
Before drawing boxes, define the API contract. For a URL shortener: POST /api/shorten {"url": "...", "ttl": 3600} → {"short_url": "..."} and GET /{short_code} → 302 Redirect. The API reveals the data flow, which dictates the architecture.
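To make the contract concrete, here is a minimal in-memory sketch of the two operations. It is not a production design: the `Shortener` class, the `https://short.example` base URL, the counter-plus-base62 code scheme, and the 404 response for unknown or expired codes are all assumptions made for illustration.

```python
import itertools
import string
import time

ALPHABET = string.digits + string.ascii_letters  # 62 characters


def base62(n: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


class Shortener:
    """In-memory stand-in for POST /api/shorten and GET /{short_code}."""

    def __init__(self, base_url="https://short.example", clock=time.time):
        self._base_url = base_url
        self._clock = clock              # injectable so expiry is testable
        self._ids = itertools.count(1)   # hypothetical ID source
        self._store = {}                 # short_code -> (url, expires_at)

    def shorten(self, url: str, ttl: int = 3600) -> dict:
        code = base62(next(self._ids))
        self._store[code] = (url, self._clock() + ttl)
        return {"short_url": f"{self._base_url}/{code}"}

    def resolve(self, short_code: str) -> tuple:
        """Return (status, headers) as an HTTP handler would."""
        entry = self._store.get(short_code)
        if entry is None or entry[1] <= self._clock():
            return 404, {}
        return 302, {"Location": entry[0]}


service = Shortener()
print(service.shorten("https://example.com/some/long/path", ttl=3600))
print(service.resolve("1"))
```

Even this toy version surfaces the data flow: a write path that allocates a code and stores a mapping, and a read path that is a single key lookup. That asymmetry is what later motivates a cache in front of the database.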
Pick 2-3 components and go deep. This is where you demonstrate technical depth. Common deep-dive areas:
Data Model & Schema
Table schemas, partition keys, indexes. How do you handle queries efficiently? What denormalization is needed?
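For the URL shortener, a first-pass schema might look like the sketch below. It uses Python's built-in `sqlite3` module purely so the example runs anywhere; the table name, column names, and helper functions are assumptions, and a real deployment would use whichever database the design calls for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE urls (
    short_code   TEXT PRIMARY KEY,     -- lookup key for redirects
    original_url TEXT NOT NULL,
    created_at   INTEGER NOT NULL,     -- Unix seconds
    expires_at   INTEGER NOT NULL      -- Unix seconds
);
-- Supports a periodic sweep that deletes expired rows.
CREATE INDEX idx_urls_expires_at ON urls (expires_at);
""")


def create_url(conn, short_code, original_url, now, ttl_seconds):
    """Insert a mapping; return False if the short code is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO urls VALUES (?, ?, ?, ?)",
                (short_code, original_url, now, now + ttl_seconds),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def lookup(conn, short_code, now):
    """Return the original URL, or None if missing or expired."""
    row = conn.execute(
        "SELECT original_url FROM urls WHERE short_code = ? AND expires_at > ?",
        (short_code, now),
    ).fetchone()
    return row[0] if row else None


thirty_days = 30 * 24 * 3600
print(create_url(conn, "abc", "https://example.com", 0, thirty_days))  # True
print(create_url(conn, "abc", "https://other.example", 0, thirty_days))  # False
```

Making `short_code` the primary key lets the database enforce the "no duplicates" requirement, and it is also the natural partition key if the table is later sharded.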
Scaling Bottlenecks
What breaks at 10x traffic? Database sharding strategy? Hot partition handling? Cache stampede prevention?
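As one example, cache stampede prevention can be sketched with a per-key lock so that concurrent misses on the same key trigger a single backend load. This is a single-process illustration with no TTL or eviction; the `SingleFlightCache` class and its `loader` callback are hypothetical names, and a distributed cache would need a different coordination mechanism.

```python
import threading


class SingleFlightCache:
    """Cache in which concurrent misses on one key trigger one load."""

    def __init__(self, loader):
        self._loader = loader          # function: key -> value (slow backend)
        self._values = {}
        self._key_locks = {}
        self._guard = threading.Lock() # protects the _key_locks dictionary

    def get(self, key):
        # Fast path: value already cached.
        if key in self._values:
            return self._values[key]

        # Fetch or create the lock dedicated to this key.
        with self._guard:
            lock = self._key_locks.setdefault(key, threading.Lock())

        with lock:
            # Re-check: another thread may have loaded it while we waited.
            if key not in self._values:
                self._values[key] = self._loader(key)
            return self._values[key]
```

With this in place, a burst of requests for one hot key results in one call to the loader while the other threads wait for its result, instead of every request hitting the database at once.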
Failure Modes
What happens if the database is down? If a datacenter fails? What's the blast radius? How do you detect and recover?
Consistency & Concurrency
Race conditions? Distributed transactions? How do you ensure exactly-once processing? What consistency model?
- Over-engineering: Designing for 1B users when you have 1,000. Start simple, add complexity only when justified by estimation.
- Buzzword architecture: Adding Kafka, Redis, Elasticsearch to every design without justification. Each component adds operational cost.
- Ignoring failure: A design without failure modes is a design that will fail catastrophically. What happens when X dies?
- Monologue mode: In interviews, driving the design without checking in. In real life, designing without stakeholder alignment.
- Jumping to solutions: Drawing boxes before understanding requirements. The most common mistake.
- System Design Interview by Alex Xu — Step-by-step walkthroughs of 15+ real-world designs. (ByteByteGo, 2020)
- Designing Data-Intensive Applications by Martin Kleppmann — The engineering reference behind every design decision. (O'Reilly, 2017)
- The System Design Primer — GitHub — Open-source collection of scalability topics and design examples.
- High Scalability — Case studies of how real companies scale their architectures.
Apply It
Design: File Storage
Put the framework into practice. Design an S3-like distributed file storage system from scratch.