Go Routine Scheduler
[Interactive visualization (runs in browser): the GMP scheduling model. Adjust GOMAXPROCS and spawn goroutines to watch M:N scheduling (Gs on Ps), work stealing between processors, goroutine lifecycle states, the per-P local queues, and the global queue.]
The Go Scheduler: Engineering Concurrency
In languages like Java or C++, creating a thread (e.g., new Thread()) usually creates an
OS-level thread. OS threads are heavy: each reserves on the order of 1-2 MB of memory for its
stack, and the OS kernel must perform an expensive context switch (saving and restoring
registers, swapping stacks, disturbing CPU caches) to move between them. Because of this
weight, a standard server can only run a few thousand OS threads before it runs out of RAM or
spends most of its CPU time purely on context switching.
Go takes a different approach. The go statement (go myFunc()) creates a
Goroutine. Goroutines are user-space threads managed entirely by the Go
runtime, not the operating system. They start with a tiny 2 KB stack that grows and shrinks
dynamically, so you can easily spawn 1,000,000 goroutines on a standard laptop.
But how does Go execute a million goroutines on an 8-core CPU? That is the job of the
Go Scheduler.
1. The G-M-P Model: M:N Scheduling
Go uses an M:N scheduling model: it maps M goroutines onto N OS threads. To manage this
chaos efficiently, the scheduler relies on three core entities: G, M, and P.
G (Goroutine)
The executable unit. A struct holding the instruction pointer, stack pointer, and status (Runnable, Running, Waiting). A G does nothing on its own; it must be executed by an M.
M (Machine/Thread)
An actual OS-level thread. The M's only job is to execute the instructions of the G currently assigned to it. However, an M cannot execute a G without holding a P.
P (Processor)
A logical, virtual CPU. It holds the Local Run Queue (LRQ) of
runnable Gs. By default, there is exactly one P for every logical CPU visible to the
process (controlled by GOMAXPROCS).
The invariant rule: For an OS Thread (M) to execute a Goroutine (G), the M must first acquire a Processor (P). The P hands the M a G from its local queue, and the M executes it until the G blocks or finishes.
2. The Local Run Queue, Global Queue, and Work Stealing
When you type go func(), the runtime creates a new G and
places it in the Local Run Queue (LRQ) of the P that spawned it. The M attached
to that P will eventually pop the G and execute it.
Work Stealing: Preventing Idle CPUs
What if P1 has 100 goroutines in its local queue, but P2's local queue is empty? Without
intervention, P2's attached OS thread (M) would sit idle, wasting CPU cycles, while P1 is
overwhelmed.
To solve this, Go implements a Work Stealing Algorithm. If an M finishes
all the Gs in its P's local queue, it will actively try to "steal" work from other Ps. It
looks at a random P, and if that P has Gs, it steals half of them and moves them to
its own local queue. This ensures that work is continuously and evenly distributed across all
available CPU cores without relying on a centralized, highly-contended locking mechanism.
The Global Run Queue
A Local Run Queue can only hold 256 goroutines. If a P's queue is full, newly created Gs spill over into the Global Run Queue (GRQ). Additionally, to ensure goroutines in the GRQ don't starve forever while Ps steal from each other, the runtime guarantees that every 61st scheduler "tick", a P will explicitly pull a goroutine from the Global Run Queue.
3. What happens when a Goroutine Blocks?
Concurrency is useless if threads can't wait for things (like a database query returning, or a disk write completing). But if a Goroutine issues a blocking command, doesn't that block the underlying OS Thread (M)? Yes, it does. If the M is blocked, the CPU core is wasted. Go handles this brilliantly in two different ways depending on the type of block.
Scenario A: Asynchronous Network I/O (The Netpoller)
If a Goroutine makes a network request (e.g. http.Get()), the runtime
intercepts it. Instead of letting the OS thread block on the network socket, Go adds the
socket to the OS's asynchronous event notification system (epoll on Linux,
kqueue on macOS).
The Goroutine is marked as "Waiting", and the M is immediately freed to execute the next Goroutine
in the P's local queue. A background thread (the Netpoller) continuously monitors
epoll. When the network response arrives, the Netpoller wakes up the original
Goroutine and moves it back to a Runnable queue.
Scenario B: Synchronous Syscalls (The Handoff)
Some operations (like reading a file from disk, or invoking C code via cgo) cannot be done
asynchronously. The OS Thread (M) must block.
Because the M is completely frozen by the OS, the P (Processor/Virtual CPU) attached to it
is also frozen. To prevent the entire application from stalling, the Go runtime executes a
Handoff: it completely detaches the P from the frozen M. The runtime then
looks for a spare, idle M (or creates a brand new OS thread), attaches the P to it, and
continues executing the other Goroutines in the P's queue. When the original disk read
finishes, the frozen M wakes up, tries to acquire a new P to resume its Goroutine, and if
it can't, parks itself in a sleep state.
4. Asynchronous Preemption (Go 1.14+)
Before Go 1.14, the scheduler was cooperative. It relied on Goroutines
voluntarily yielding the CPU when they made function calls. If a developer wrote a tight,
infinite loop (e.g. for { i++ }) with no function calls, that
Goroutine would monopolize the P and M forever. The program would lock up.
Go 1.14 introduced Asynchronous Preemption. The runtime runs a background
thread called sysmon (the system monitor). Sysmon detects when
any Goroutine has been running continuously for more than 10 milliseconds. If it has,
sysmon fires an OS-level signal (SIGURG) directly at the busy OS thread
(M). The signal handler forcibly pauses the running Goroutine, rewrites its
instruction pointer to call into a scheduling function, and moves it back to the Global Run
Queue. The M is then free to execute other code.
References & Deep Dives
- Ardan Labs: Scheduling In Go (Part 2) - William Kennedy's definitive 3-part blog series on the scheduler. Often cited as the best plain-English explanation of Go concurrency.
- Scalable Go Scheduler Design Doc - The original 2012 Google design doc by Dmitry Vyukov that introduced the "P" to create the GMP model and solved the severe global lock contention issues of early Go.
- Morsmachine: The Go Scheduler - A classic, highly-technical breakdown of work stealing and the `sysmon` thread interactions.