How Memory Allocation Works

Stack vs Heap, virtual memory, page faults, and why malloc is slower than you think.

Memory Layout

[Diagram: process memory layout — the stack (LIFO) holding fixed-size stack frames such as main() and its locals, and the heap (dynamic) holding malloc'd blocks, live objects, and freed gaps; the two regions grow toward each other.]

Stack Allocation

Fast, automatic memory

What Happens

Local variables and function call frames are allocated on the stack. Memory is automatically reclaimed when the function returns.

Why

Stack allocation is extremely fast (just bump a pointer) and requires no manual management.

Technical Detail

Stack pointer (SP) moves up/down. LIFO order. Fixed size per thread (typically 1-8MB).

Example: int x = 42; // stack-allocated, auto-freed on return

Key Takeaways

Stack is Fast

Stack allocation is just a pointer bump. Use it for small, short-lived data.

Virtual Memory

Each process has isolated address space. Pages loaded on demand.

Avoid Fragmentation

Use memory pools for same-sized objects. Free in reverse allocation order when possible.
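The memory-pool advice can be sketched as a fixed-size free-list allocator in C — a minimal illustration, with sizes and names (SLOT_SIZE, pool_alloc, etc.) chosen here for the example, not taken from any particular library:

```c
#include <stddef.h>

/* A minimal fixed-size pool: carve same-sized slots out of one
   contiguous buffer and chain the free ones in a list. Allocation
   and release are O(1) pointer pops/pushes, and because every slot
   has the same size, the pool can never fragment. */
#define SLOT_SIZE  64
#define SLOT_COUNT 128

typedef struct Slot { struct Slot *next; } Slot;

static unsigned char pool_mem[SLOT_SIZE * SLOT_COUNT];
static Slot *free_list = NULL;

void pool_init(void) {
    free_list = NULL;
    for (size_t i = 0; i < SLOT_COUNT; i++) {
        Slot *s = (Slot *)(pool_mem + i * SLOT_SIZE);
        s->next = free_list;   /* push each slot onto the free list */
        free_list = s;
    }
}

void *pool_alloc(void) {
    if (free_list == NULL) return NULL;  /* pool exhausted */
    Slot *s = free_list;
    free_list = s->next;
    return s;
}

void pool_free(void *p) {
    Slot *s = (Slot *)p;     /* returning a slot is just a list push */
    s->next = free_list;
    free_list = s;
}
```

A real pool would add alignment guarantees and debug checks, but the core idea — no searching, no splitting, no fragmentation — is all here.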

The Engineering of Memory: From Silicon to Virtual Address Spaces

Every variable you declare, every object you instantiate, and every string you concatenate must eventually be mapped to a microscopic capacitor on a silicon RAM chip holding an electrical charge. Bridging the gap between high-level code and physical electrons requires one of the most complex, beautiful, and heavily optimized subsystems in modern computing: The Memory Manager.


Part 1: The Stack (Speed Through Simplicity)

The Stack is the workhorse of execution. When a thread starts, the OS allocates a fixed, contiguous block of memory (usually 1MB to 8MB) explicitly for this thread's Stack.

When you call a function, the CPU pushes a "Stack Frame" onto this memory block. This frame contains the function's local variables, saved CPU registers, and the return address. Allocating memory on the Stack is computationally trivial: typically a single CPU instruction that subtracts a value from the Stack Pointer (SP) register.

Because Stack memory is perfectly contiguous and accessed sequentially, it is incredibly cache-friendly. The CPU fetches a chunk of the Stack into its L1 cache, meaning subsequent variable reads happen in a fraction of a nanosecond. However, the Stack is rigid. If you try to allocate a 10MB image array here, you will crash the program with a Stack Overflow.
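The lifetime rules above can be sketched in C (the helper name is illustrative):

```c
#include <stddef.h>

/* Each call pushes a fresh frame; `local` is created by moving the
   stack pointer and vanishes the moment the function returns. */
int square_plus_one(int n) {
    int local = n * n;   /* stack-allocated: no malloc, no free */
    return local + 1;    /* frame (and `local`) reclaimed on return */
}
/* By contrast, a huge local like `char buf[10 * 1024 * 1024];`
   inside a function risks overflowing the fixed-size stack --
   data that large belongs on the heap. */
```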

Part 2: The Heap and The Allocator

For data whose size is unknown at compile time, or data that must outlive the function that created it, we use the Heap. Unlike the Stack's neat LIFO structure, the Heap is a massive, chaotic pool of memory shared across the entire process.

When you call malloc() in C, or new Object() in Java, you are querying the Heap Allocator (like jemalloc or tcmalloc). The allocator must scan its internal data structures (often Free Lists or B-Trees) to find a contiguous block of RAM large enough to satisfy your request.

The Cost of Malloc

Heap allocation is profoundly more expensive than Stack allocation. It requires acquiring thread locks (mutexes) to prevent race conditions, searching data structures, and navigating Fragmentation. If memory is repeatedly allocated and freed in different sizes, the Heap becomes "swiss cheese"—plenty of total free space, but chopped into tiny, unusable fragments.
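A minimal C sketch of the heap's contract — runtime-determined size, failure must be checked, and the caller owns the cleanup (the helper name is illustrative):

```c
#include <stdlib.h>
#include <string.h>

/* Heap allocation: the size is only known at runtime, and the block
   outlives this call. Under the hood the allocator may take a lock
   and walk its free lists before handing the block back. */
char *duplicate_string(const char *src) {
    size_t len = strlen(src) + 1;
    char *copy = malloc(len);      /* far costlier than a stack bump */
    if (copy == NULL) return NULL; /* malloc can fail: always check */
    memcpy(copy, src, len);
    return copy;                   /* caller must free() this block */
}
```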

Part 3: Virtual Memory (The Great Illusion)

If two programs both try to write to physical memory address 0xFFF000, they would corrupt each other. Modern Operating Systems solve this using Virtual Memory.

When your program prints a memory pointer (e.g., 0x7ffee9b), it is lying to you. That is NOT a physical location on the RAM stick. It is a fake, "Virtual" address. Every single process believes it has exclusive access to a massive, pristine, contiguous 64-bit address space.

Every time the CPU requests data, the MMU (Memory Management Unit)—a dedicated piece of silicon—intercepts the virtual address, looks up a massive index called a Page Table maintained by the OS kernel, and translates it into the true hardware Physical Address on the fly.
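The translation step can be sketched with a toy single-level page table. Real x86-64 hardware walks a four-level radix tree rather than a flat array, but the split-and-lookup idea is the same; `translate` and its tiny table are illustrative only:

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_BITS 12                  /* 4 KiB pages */
#define PAGE_SIZE (1u << PAGE_BITS)

/* Split a virtual address into a virtual page number (VPN) and a
   byte offset, look the VPN up in the page table, and glue the
   physical frame number back onto the offset. */
uint64_t translate(uint64_t vaddr, const uint64_t *page_table, size_t entries) {
    uint64_t vpn    = vaddr >> PAGE_BITS;      /* which page?        */
    uint64_t offset = vaddr & (PAGE_SIZE - 1); /* where in the page? */
    if (vpn >= entries)
        return UINT64_MAX;                     /* unmapped: would fault */
    return (page_table[vpn] << PAGE_BITS) | offset;
}
```

The MMU performs exactly this shift-and-index dance in silicon, caching recent results in the TLB so most accesses skip the table walk entirely.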

Part 4: Page Faults and Demand Paging

When you allocate 1GB of memory (malloc(1024 * 1024 * 1024)), the OS does not actually give you 1GB of physical RAM. It simply updates the Page Table and says, "Sure, you have it."

This is called Demand Paging. It is only when you actually attempt to write data to those addresses that the MMU realizes the physical backing doesn't exist yet. This triggers a hardware interrupt called a Page Fault. The CPU halts your program, traps into the OS Kernel, the Kernel frantically finds a free 4KB physical "Page" in RAM, updates the mapping, and resumes your program as if nothing happened.
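Demand paging can be observed directly on POSIX systems with an anonymous mmap: the kernel records the mapping instantly, then commits physical frames only as each page is first written (`touch_pages` is an illustrative name):

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <unistd.h>
#include <stddef.h>

/* Reserve `len` bytes of virtual address space, then touch one byte
   per page. The mmap call returns almost instantly -- no RAM is
   committed. Each first write below triggers a minor page fault,
   and only then does the kernel back that page with a real frame.
   Returns the number of pages touched, or 0 on failure. */
size_t touch_pages(size_t len) {
    long page = sysconf(_SC_PAGESIZE);   /* usually 4096 */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 0;
    size_t touched = 0;
    for (size_t off = 0; off < len; off += (size_t)page) {
        p[off] = 1;                      /* first write: page fault here */
        touched++;
    }
    munmap(p, len);
    return touched;
}
```

Watching a process's resident set size (RSS) while this loop runs shows physical memory climbing write by write, long after the allocation itself "succeeded".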

If you run out of physical RAM entirely, the OS takes an approximately least-recently-used Page from an idle program and writes it to disk (swap space) to make room. When that idle program wakes up and tries to read its memory, it hits a "Major Page Fault", and the CPU stalls for excruciating milliseconds while the OS reads the data back from the slow disk.

Conclusion: The Abstraction Hierarchy

From the programmer's perspective, memory is just an infinite, flat array of bytes. But beneath the surface, it is a ferocious battleground of optimizations: Registers, L1/L2/L3 caches, TLBs (Translation Lookaside Buffers), Page Tables, SWAP files, and Garbage Collectors. Understanding these layers is the dividing line between writing code that works, and writing code that screams.

Glossary & Concepts

Stack

LIFO memory region for function call frames and local variables. Automatic allocation/deallocation.

Heap

Dynamic memory region managed by malloc/free. Can grow/shrink. Requires manual management (or GC).

Virtual Memory

OS abstraction: each process has its own address space. Mapped to physical RAM via page tables.

Page Fault

CPU exception raised when accessing an unmapped virtual address. The OS loads the page from disk or allocates a new one.

TLB (Translation Lookaside Buffer)

CPU cache for virtual→physical address translations. TLB miss = expensive page table walk.

mmap

Map files or anonymous memory directly into address space. Used for large allocations and shared memory.