Applied Design: File Storage

Build an S3-like distributed file storage system from scratch.

Module 13: Applied Design — File Storage
Track 3: Global Scale (6+ YoE)
Let's apply the 4-Step Framework to a real-world problem: designing a distributed file storage system (like AWS S3, Google Cloud Storage, or Azure Blob Storage). This is one of the most common system design interview questions, and it touches nearly every concept from previous modules.
Step 1: Requirements

Functional Requirements

  • Upload files (1 byte to 5 TB)
  • Download files by key (bucket + object key)
  • List objects in a bucket with prefix filtering
  • Delete objects
  • Generate pre-signed URLs for temporary access
  • Support versioning (optional)

Non-Functional Requirements

  • Durability: 99.999999999% (11 nines) — like S3
  • Availability: 99.99%
  • Consistency: Read-after-write for new objects, eventual for deletes/overwrites
  • Throughput: 10,000 PUT requests/sec, 50,000 GET requests/sec

Step 2: Estimation
// Storage
500M objects, avg size 1 MB → 500 TB raw data
With 3x replication → 1.5 PB storage
// Metadata
500M objects × 1 KB metadata = 500 GB
// → Fits in a single large PostgreSQL instance (or sharded)
// Bandwidth
50,000 reads/sec × 1 MB avg = 50 GB/sec read bandwidth
// → Need distributed storage across many disk servers
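
The same arithmetic can be checked with a few lines of Python. This is only a back-of-envelope sketch using decimal units (1 MB = 10^6 bytes); the per-server NIC speed of 10 Gbit/s is an assumption added for illustration and does not come from the requirements above.

```python
# Back-of-envelope estimates, decimal units (1 MB = 10**6 bytes).
KB, MB, GB, TB, PB = 10**3, 10**6, 10**9, 10**12, 10**15

num_objects = 500_000_000
avg_object_size = 1 * MB
replication_factor = 3
metadata_per_object = 1 * KB
peak_reads_per_sec = 50_000

raw_bytes = num_objects * avg_object_size
replicated_bytes = raw_bytes * replication_factor
metadata_bytes = num_objects * metadata_per_object
read_bandwidth = peak_reads_per_sec * avg_object_size  # bytes per second

# Assumption (not from the requirements): each server has a 10 Gbit/s NIC.
nic_bytes_per_sec = 10 * 10**9 // 8
min_servers = -(-read_bandwidth // nic_bytes_per_sec)  # ceiling division

print(f"raw data:        {raw_bytes / TB:.0f} TB")
print(f"with 3x copies:  {replicated_bytes / PB:.1f} PB")
print(f"metadata:        {metadata_bytes / GB:.0f} GB")
print(f"read bandwidth:  {read_bandwidth / GB:.0f} GB/sec")
print(f"servers needed for read bandwidth alone: {min_servers}")
```

Under that NIC assumption, read bandwidth alone calls for dozens of servers, which is why a single-machine design is ruled out early.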

Step 3: High-Level Design

The system has three core components:

API Gateway

Handles authentication, rate limiting, and request routing. Routes PUT/GET/DELETE requests to the appropriate service, validates bucket permissions, and generates pre-signed URLs.

Metadata Service

Stores object metadata: bucket, key, size, content-type, creation time, and the physical locations of the object's data chunks. Backed by a sharded database.

Data Service

Stores the actual binary data on disk servers. Handles replication, erasure coding, and data integrity (checksums). Data is chunked (e.g., 64 MB chunks) and distributed.
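
To make the split between metadata and data concrete, here is a minimal sketch of how an object might be cut into fixed-size chunks and described by a metadata record. The class names (`ObjectMetadata`, `ChunkRef`), the field layout, and the node IDs are all hypothetical; a real system would stream chunks rather than hold the whole object in memory.

```python
import hashlib
import time
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the example chunk size from the text


@dataclass
class ChunkRef:
    """Where one chunk lives and how to verify it (hypothetical layout)."""
    index: int
    size: int
    sha256: str
    nodes: list[str]  # IDs of the data nodes holding this chunk


@dataclass
class ObjectMetadata:
    bucket: str
    key: str
    size: int
    content_type: str
    created_at: float = field(default_factory=time.time)
    chunks: list[ChunkRef] = field(default_factory=list)


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield consecutive slices of at most chunk_size bytes."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


def build_metadata(bucket, key, data, content_type, nodes, chunk_size=CHUNK_SIZE):
    """Chunk the payload and record size, checksum, and placement per chunk."""
    meta = ObjectMetadata(bucket, key, len(data), content_type)
    for index, chunk in enumerate(split_into_chunks(data, chunk_size)):
        digest = hashlib.sha256(chunk).hexdigest()
        meta.chunks.append(ChunkRef(index, len(chunk), digest, list(nodes)))
    return meta
```

For example, `build_metadata("photos", "cat.jpg", payload, "image/jpeg", ["node-1", "node-2", "node-3"])` returns a record whose `chunks` list tells a reader which nodes to contact and which checksum to expect for each piece.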

Upload Flow

1. Client → API Gateway: PUT /bucket/photo.jpg (with auth)
2. Gateway validates auth, checks bucket exists and permissions
3. Gateway → Metadata Service: allocate object, get placement decision
4. Metadata Service selects 3 data nodes (based on rack/zone diversity)
5. Client streams data → Data Node 1 (primary)
6. Data Node 1 forwards the data to Data Node 2, which forwards it to Data Node 3 (chain replication)
7. All 3 nodes confirm → Metadata Service marks object as COMMITTED
8. Gateway → Client: 200 OK (object is durable)
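
The flow above can be sketched as an in-memory simulation. Everything here is illustrative: `DataNode`, `pick_nodes`, and the `PENDING` state are hypothetical names, placement simply takes the first node found in each distinct zone, and network failures are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    name: str
    zone: str
    store: dict = field(default_factory=dict)

    def write_chain(self, object_id: str, data: bytes, rest: list["DataNode"]) -> int:
        """Store locally, then forward down the chain.

        Returns how many nodes (this one included) stored the data.
        """
        self.store[object_id] = data
        if not rest:
            return 1
        return 1 + rest[0].write_chain(object_id, data, rest[1:])


def pick_nodes(nodes, replicas=3):
    """Steps 3-4: choose one node per zone until enough replicas are placed."""
    chosen, zones = [], set()
    for node in nodes:
        if node.zone not in zones:
            chosen.append(node)
            zones.add(node.zone)
        if len(chosen) == replicas:
            return chosen
    raise RuntimeError("not enough distinct zones for placement")


def upload(metadata: dict, nodes, bucket: str, key: str, data: bytes) -> int:
    object_id = f"{bucket}/{key}"
    chain = pick_nodes(nodes)
    # Record the allocation before any data is written (assumed PENDING state).
    metadata[object_id] = {"state": "PENDING", "nodes": [n.name for n in chain]}
    # Steps 5-6: write to the head of the chain, which forwards to the rest.
    acks = chain[0].write_chain(object_id, data, chain[1:])
    # Step 7: commit only once every replica has confirmed.
    if acks != len(chain):
        raise RuntimeError("replication incomplete")
    metadata[object_id]["state"] = "COMMITTED"
    return 200  # Step 8
```

The point of the ordering is that the object only becomes visible as COMMITTED after all replicas hold the data, so a successful response implies the write is already durable.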

Step 4: Deep Dives

Durability: 11 Nines

S3 is designed for 99.999999999% durability: if you store 10 million objects, you can expect to lose one object every 10,000 years on average. How?

  • Replication: 3 copies across 3 availability zones. Two zones can fail completely and your data survives.
  • Erasure Coding: Instead of 3 full copies (3x storage overhead), split data into k data chunks + m parity chunks. Any k-of-(k+m) chunks can reconstruct the original. Example: Reed-Solomon 6+3 uses 1.5x storage but tolerates 3 simultaneous disk failures.
  • Checksums: Every chunk has a SHA-256 checksum. On read, the checksum is verified. If corruption is detected, the chunk is re-replicated from a healthy copy.
  • Background repair: A continuous process (anti-entropy) scans for under-replicated or corrupt chunks and repairs them automatically.
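
A short sketch can make three of these points concrete: the "one object per 10,000 years" arithmetic, the storage overhead of replication versus erasure coding, and checksum verification with repair on read. It assumes the 11 nines figure is an annual per-object survival probability, and the function names are hypothetical. It does not implement Reed-Solomon itself, only the overhead calculation.

```python
import hashlib
from fractions import Fraction


def years_between_losses(num_objects: int, nines: int) -> Fraction:
    """Expected years per lost object, assuming each object independently
    has a 10**-nines chance of being lost in a given year."""
    annual_loss_probability = Fraction(1, 10 ** nines)
    expected_losses_per_year = num_objects * annual_loss_probability
    return 1 / expected_losses_per_year


def storage_overhead(k: int, m: int) -> float:
    """Bytes stored per byte of user data for k data + m redundancy pieces."""
    return (k + m) / k


def read_with_repair(replicas: dict[str, bytes], expected_sha256: str) -> bytes:
    """Return a verified copy and overwrite any corrupt replicas with it."""
    def is_healthy(data: bytes) -> bool:
        return hashlib.sha256(data).hexdigest() == expected_sha256

    healthy = next((d for d in replicas.values() if is_healthy(d)), None)
    if healthy is None:
        raise IOError("no healthy replica available")
    for node, data in list(replicas.items()):
        if not is_healthy(data):
            replicas[node] = healthy  # repair from the healthy copy
    return healthy


print(years_between_losses(10_000_000, 11))  # 10 million objects, 11 nines
print(storage_overhead(1, 2))                # 3 full copies
print(storage_overhead(6, 3))                # Reed-Solomon 6+3
```

Treating 3x replication as "1 data piece + 2 redundant pieces" puts both schemes on the same scale, which is what makes the 3x versus 1.5x comparison meaningful.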

Large File Uploads

A 5 TB file can't practically be uploaded in a single HTTP request: the transfer would take hours, and any network hiccup would force a restart from zero. Multipart upload splits the file into parts that are uploaded and retried independently:

1. POST /uploads → { upload_id: "abc123" }
2. PUT /uploads/abc123?partNumber=1 → upload 64MB chunk
3. PUT /uploads/abc123?partNumber=2 → upload 64MB chunk
4. ... (upload chunks in parallel for speed)
5. POST /uploads/abc123/complete → assemble all chunks
// Each chunk is individually checksummed and replicated
// Failed chunks can be retried without re-uploading others
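
The protocol above can be sketched with an in-memory stand-in for the server. `MultipartStore` and `upload_file` are hypothetical names, parts are uploaded sequentially here for simplicity (they are independent, so a real client could send them in parallel), and the tiny part size in the usage note exists only to keep the example small.

```python
import hashlib
import uuid


class MultipartStore:
    """In-memory stand-in for the server side of multipart upload."""

    def __init__(self):
        self.uploads = {}  # upload_id -> {part_number: bytes}
        self.objects = {}  # key -> assembled bytes

    def initiate(self) -> str:
        upload_id = uuid.uuid4().hex
        self.uploads[upload_id] = {}
        return upload_id

    def upload_part(self, upload_id: str, part_number: int, data: bytes, sha256: str):
        # Each part is checksummed on its own, so a bad part can be retried
        # without touching the parts that already arrived intact.
        if hashlib.sha256(data).hexdigest() != sha256:
            raise ValueError(f"checksum mismatch on part {part_number}")
        self.uploads[upload_id][part_number] = data  # a retry overwrites

    def complete(self, upload_id: str, key: str, expected_parts: int) -> int:
        parts = self.uploads[upload_id]
        missing = [n for n in range(1, expected_parts + 1) if n not in parts]
        if missing:
            raise ValueError(f"missing parts: {missing}")
        self.objects[key] = b"".join(parts[n] for n in range(1, expected_parts + 1))
        del self.uploads[upload_id]
        return len(self.objects[key])


def upload_file(store: MultipartStore, key: str, data: bytes, part_size: int) -> int:
    """Client side: split, upload each part with its checksum, then complete."""
    upload_id = store.initiate()
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    for number, part in enumerate(parts, start=1):
        store.upload_part(upload_id, number, part, hashlib.sha256(part).hexdigest())
    return store.complete(upload_id, key, len(parts))
```

Calling `upload_file(MultipartStore(), "video.mp4", payload, part_size=1000)` returns the assembled object size; with the 64 MB parts from the text, `part_size` would be `64 * 1024 * 1024`.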

Pre-Signed URLs

Allow temporary, authenticated access without sharing credentials:

// Server generates a time-limited, signed URL
signature = sign(bucket + key + expiry + permissions, SECRET_KEY)
url = object_url + "?X-Expires=" + expiry + "&X-Signature=" + signature
// → https://storage.example.com/bucket/photo.jpg
// ?X-Expires=1709251200&X-Signature=a3f8b2...
// Anyone with this URL can download for 1 hour
// After expiry, the signature is invalid
