Go Routine Scheduler
[Interactive visualization (runs in browser): the GMP scheduling model. Adjust GOMAXPROCS and spawn goroutines to watch M:N scheduling (Gs on Ps), work stealing between processors, goroutine lifecycle states, the per-P local queues, and the global queue.]
The Go Scheduler: Engineering Concurrency
In languages like Java or C++, creating a thread (e.g., new Thread()) usually creates an
OS-level thread. OS threads are heavy: each reserves on the order of 1-2 MB of memory for its
stack, and the OS kernel must perform an expensive context switch (saving and restoring
registers, swapping stacks, disturbing CPU caches) to move between them. Because of this
weight, a standard server can only run a few thousand OS threads before it runs out of RAM or
spends most of its CPU time purely on context switching.
Go takes a different approach. The go statement (go myFunc()) creates a
Goroutine. Goroutines are user-space threads managed entirely by the Go
runtime, not the operating system. They start with a tiny 2 KB stack that grows and shrinks
dynamically, so you can easily spawn 1,000,000 goroutines on a standard laptop.
But how does Go execute a million goroutines on an 8-core CPU? That is the job of the
Go Scheduler.
1. The G-M-P Model: M:N Scheduling
Go uses an M:N scheduling model: it maps M goroutines onto N OS threads. To manage this
chaos efficiently, the scheduler relies on three core entities: G, M, and P.
G (Goroutine)
The executable unit. A struct holding the instruction pointer, stack pointer, and status (Runnable, Running, Waiting). A G does nothing on its own; it must be executed by an M.
M (Machine/Thread)
An actual OS-level thread. The M's only job is to execute the instructions of the G currently assigned to it. However, an M cannot execute a G without holding a P.
P (Processor)
A logical, virtual CPU. It holds the Local Run Queue (LRQ) of
runnable Gs. By default, there is exactly one P for every logical CPU visible to the
process (controlled by GOMAXPROCS).
The invariant rule: For an OS Thread (M) to execute a Goroutine (G), the M must first acquire a Processor (P). The P hands the M a G from its local queue, and the M executes it until the G blocks or finishes.
2. The Local Run Queue, Global Queue, and Work Stealing
When you type go func(), the runtime creates a new G and
places it in the Local Run Queue (LRQ) of the P that spawned it. The M attached
to that P will eventually pop the G and execute it.
Work Stealing: Preventing Idle CPUs
What if P1 has 100 goroutines in its local queue, but P2's local queue is empty? Without
intervention, P2's attached OS thread (M) would sit idle, wasting CPU cycles, while P1 is
overwhelmed.
To solve this, Go implements a Work Stealing Algorithm. If an M finishes
all the Gs in its P's local queue, it will actively try to "steal" work from other Ps. It
looks at a random P, and if that P has Gs, it steals half of them and moves them to
its own local queue. This ensures that work is continuously and evenly distributed across all
available CPU cores without relying on a centralized, highly-contended locking mechanism.
The Global Run Queue
A Local Run Queue can only hold 256 goroutines. If a P's queue is full, newly created Gs spill over into the Global Run Queue (GRQ). Additionally, to ensure goroutines in the GRQ don't starve forever while Ps steal from each other, the runtime guarantees that every 61st scheduler "tick", a P will explicitly pull a goroutine from the Global Run Queue.
3. What happens when a Goroutine Blocks?
Concurrency is useless if threads can't wait for things (like a database query returning, or a disk write completing). But if a Goroutine issues a blocking command, doesn't that block the underlying OS Thread (M)? Yes, it does. If the M is blocked, the CPU core is wasted. Go handles this brilliantly in two different ways depending on the type of block.
Scenario A: Asynchronous Network I/O (The Netpoller)
If a Goroutine makes a network request (e.g. http.Get()), the runtime
intercepts it. Instead of letting the OS thread block on the network socket, Go adds the
socket to the OS's asynchronous event notification system (epoll on Linux,
kqueue on macOS).
The Goroutine is marked as "Waiting", and the M is immediately freed to execute the next Goroutine
in the P's local queue. A background thread (the Netpoller) continuously monitors
epoll. When the network response arrives, the Netpoller wakes up the original
Goroutine and moves it back to a Runnable queue.
Scenario B: Synchronous Syscalls (The Handoff)
Some operations (like reading a file from disk, or invoking C code via cgo) cannot be done
asynchronously. The OS Thread (M) must block.
Because the M is completely frozen by the OS, the P (Processor/Virtual CPU) attached to it
is also frozen. To prevent the entire application from stalling, the Go runtime executes a
Handoff: it completely detaches the P from the frozen M. The runtime then
looks for a spare, idle M (or creates a brand new OS thread), attaches the P to it, and
continues executing the other Goroutines in the P's queue. When the original disk read
finishes, the frozen M wakes up, tries to acquire a new P to resume its Goroutine, and if
it can't, parks itself in a sleep state.
4. Asynchronous Preemption (Go 1.14+)
Before Go 1.14, the scheduler was cooperative. It relied on Goroutines
voluntarily yielding the CPU when they made function calls. If a developer wrote a tight,
infinite loop (e.g. for { i++ }) with no function calls, that
Goroutine would monopolize the P and M forever. The program would lock up.
Go 1.14 introduced Asynchronous Preemption. The runtime runs a background
thread called sysmon (the system monitor). Sysmon detects when
any Goroutine has been running continuously for more than 10 milliseconds. If it has,
sysmon fires an OS-level signal (SIGURG) directly at the busy OS thread
(M). The signal handler forcibly pauses the running Goroutine, rewrites its
instruction pointer to call into a scheduling function, and moves it back to the Global Run
Queue. The M is then free to execute other code.
References & Deep Dives
- Ardan Labs: Scheduling In Go (Part 2) - William Kennedy's definitive 3-part blog series on the scheduler. Often cited as the best plain-English explanation of Go concurrency.
- Scalable Go Scheduler Design Doc - The original 2012 Google design doc by Dmitry Vyukov that introduced the "P" to create the GMP model and solved the severe global lock contention issues of early Go.
- Morsmachine: The Go Scheduler - A classic, highly-technical breakdown of work stealing and the `sysmon` thread interactions.