Applied Design: File Storage
Build an S3-like distributed file storage system from scratch.
Functional Requirements
- Upload files (1 byte to 5 TB)
- Download files by key (bucket + object key)
- List objects in a bucket with prefix filtering
- Delete objects
- Generate pre-signed URLs for temporary access
- Support versioning (optional)
Non-Functional Requirements
- Durability: 99.999999999% (11 nines) — like S3
- Availability: 99.99%
- Consistency: Read-after-write for new objects, eventual for deletes/overwrites
- Throughput: 10,000 PUT requests/sec, 50,000 GET requests/sec
The system has three core components:
API Gateway
Handles authentication, rate limiting, request routing. Routes PUT/GET/DELETE to the appropriate service. Validates bucket permissions. Generates pre-signed URLs.
Metadata Service
Stores object metadata: bucket, key, size, content-type, creation time, and the physical locations of the object's data chunks. Backed by a sharded database.
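A minimal sketch of the metadata layer, using hypothetical names: each object's record carries the fields above, and records are routed to shards by hashing the full bucket/key path so a lookup touches exactly one shard.

```python
import hashlib
import time
from dataclasses import dataclass, field

NUM_SHARDS = 64  # hypothetical shard count

@dataclass
class ObjectMeta:
    bucket: str
    key: str
    size: int
    content_type: str
    created_at: float = field(default_factory=time.time)
    # each chunk maps to the data servers holding its replicas
    chunk_locations: list = field(default_factory=list)

def shard_for(bucket: str, key: str) -> int:
    """Route a lookup to exactly one shard by hashing the full object path."""
    digest = hashlib.md5(f"{bucket}/{key}".encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

Hashing the whole path keeps hot buckets from pinning a single shard, but it also means prefix listing must fan out across shards — a real design might shard by bucket and range-partition keys within it instead.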
Data Service
Stores the actual binary data on disk servers. Handles replication, erasure coding, and data integrity (checksums). Data is chunked (e.g., 64 MB chunks) and distributed.
Upload Flow
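The flow can be sketched in code. This is a simplified in-memory model (the dict-backed stores and function names are assumptions, not a real API): chunks are written to the data service first, and the metadata is committed only once every chunk is durably stored — which is what gives read-after-write consistency for new objects.

```python
import hashlib

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, matching the Data Service description

data_store = {}      # chunk_id -> bytes (stand-in for the disk servers)
metadata_store = {}  # (bucket, key) -> ordered list of chunk ids

def put_object(bucket: str, key: str, data: bytes) -> None:
    chunk_ids = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        chunk_id = hashlib.sha256(chunk).hexdigest()  # checksum doubles as chunk id
        data_store[chunk_id] = chunk                  # step 1: persist the data
        chunk_ids.append(chunk_id)
    metadata_store[(bucket, key)] = chunk_ids         # step 2: commit metadata last

def get_object(bucket: str, key: str) -> bytes:
    return b"".join(data_store[c] for c in metadata_store[(bucket, key)])
```

Committing metadata last means a crash mid-upload leaves orphaned chunks (reclaimed later by garbage collection) but never a key that points at missing data.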
Durability: 11 Nines
S3 promises 99.999999999% durability — meaning if you store 10 million objects, you expect to lose one every 10,000 years. How?
- Replication: 3 copies across 3 availability zones. Two zones can fail completely and your data survives.
- Erasure Coding: Instead of 3 full copies (3x storage overhead), split data into k data chunks + m parity chunks. Any k-of-(k+m) chunks can reconstruct the original. Example: Reed-Solomon 6+3 uses 1.5x storage but tolerates 3 simultaneous disk failures.
- Checksums: Every chunk has a SHA-256 checksum. On read, the checksum is verified. If corruption is detected, the chunk is re-replicated from a healthy copy.
- Background repair: A continuous process (anti-entropy) scans for under-replicated or corrupt chunks and repairs them automatically.
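The simplest instance of erasure coding is a single XOR parity chunk (k data chunks plus m = 1 parity, RAID-4 style). Production systems use Reed-Solomon to tolerate m > 1 failures, but the reconstruction idea is the same. A sketch under that simplification, with SHA-256 checksums attached to each chunk as described above:

```python
import hashlib

def xor_chunks(chunks):
    """Byte-wise XOR of equal-length chunks."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

def encode(data: bytes, k: int = 4):
    """Split data into k equal chunks plus one XOR parity chunk (m = 1)."""
    size = -(-len(data) // k)  # ceiling division
    chunks = [data[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(k)]
    chunks.append(xor_chunks(chunks))  # parity chunk
    checksums = [hashlib.sha256(c).hexdigest() for c in chunks]
    return chunks, checksums

def reconstruct(chunks, lost: int) -> bytes:
    """Any single lost chunk (data or parity) is the XOR of the survivors."""
    return xor_chunks([c for i, c in enumerate(chunks) if i != lost])
```

With k = 4 this is 1.25x storage overhead but tolerates only one failure; Reed-Solomon generalizes the same k-of-(k+m) property to multiple parity chunks.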
Large File Uploads
A 5 TB file can't be uploaded in a single HTTP request: the transfer would take hours, and any network hiccup would force a restart from zero. The fix is multipart upload: the client initiates an upload, sends the file as independent parts (S3 allows 5 MB to 5 GB per part, up to 10,000 parts), retries only the parts that fail, and finally issues a complete call that stitches the parts into a single object.
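A toy model of the multipart protocol, with in-memory stores and hypothetical function names standing in for the real API:

```python
import hashlib
import uuid

uploads = {}  # upload_id -> {part_number: bytes}
objects = {}  # object key -> bytes

def initiate_upload() -> str:
    """Start a multipart upload and hand the client an upload id."""
    upload_id = uuid.uuid4().hex
    uploads[upload_id] = {}
    return upload_id

def upload_part(upload_id: str, part_number: int, body: bytes) -> str:
    """Store one part; each part is an independent request, retried on its own."""
    uploads[upload_id][part_number] = body
    return hashlib.md5(body).hexdigest()  # ETag the client keeps for later

def complete_upload(upload_id: str, key: str) -> None:
    """Stitch parts in part-number order; only now does the object exist."""
    parts = uploads.pop(upload_id)
    objects[key] = b"".join(parts[n] for n in sorted(parts))
```

Parts can arrive out of order and in parallel; ordering is recovered at complete time from the part numbers, and the object becomes visible atomically when the complete call commits.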
Pre-Signed URLs
A pre-signed URL grants temporary, authenticated access without sharing credentials: the server signs the request parameters (HTTP method, object key, expiry time) with its secret key and embeds the signature in the URL. Anyone holding the URL can access the object until it expires, and because the signature covers the expiry, the link can't be extended or tampered with.
Explore Related Topics
Dig into the building blocks: consistent hashing, replication, erasure coding.
- Building and Operating a Pretty Big Storage System — Andy Warfield (S3)
- GFS: The Google File System (SOSP 2003) — The paper that inspired HDFS and modern blob storage.
- Backblaze Vaults — Reed-Solomon Erasure Coding — How Backblaze uses erasure coding for durable storage.
- Designing Data-Intensive Applications — Chapter 5 (Replication) and Chapter 6 (Partitioning). (O'Reilly, 2017)