Introduction
Recently I’ve noticed a slight tremor among colleagues preparing for Go interviews - everyone understands that questions about Go’s most hyped feature are inevitable. Sure, you’ve all read those articles with pictures of the letters M, G, and P in circles, and you know about GOMAXPROCS (though probably not everything you should). But when the interviewer digs deeper - say, why do we need these goroutines at all, and why weren’t OS threads enough? - that’s when the incoherent mumbling begins.
I’ll try to arm you - I’m confident that the following explanation will impress the interviewer, and you’ll pass the interview brilliantly. And, following the blog’s rules - in simple terms.
The author assumes that the reader has a basic understanding of operating systems and can tell the difference between heap and stack.
Context Switching
Let’s approach this from the other side and recall what happens during an OS thread context switch, in order of increasing cost:
- First, the CPU registers need to be saved - they go into the Thread Control Block, a special structure in kernel memory that holds all information about the thread’s state. This part isn’t very costly.
- What about the CPU caches? Especially if we switched to a thread from another process - the data in them is no longer relevant, so we’ll take cache misses and wait while the caches gradually refill with the current thread’s data.
- Again, if it’s a thread from another process, we get a new virtual address space, which means a TLB (Translation Lookaside Buffer) flush and refill.
- And most importantly - when does the OS scheduler decide to preempt the current thread? It uses its own heuristics, but a typical trigger is a blocking system call into the kernel: disk or socket I/O, interprocess communication (sending and receiving messages), or synchronization on a critical section, when a thread blocks on a mutex or semaphore waiting for another thread to release a resource. The problem is that the thread’s allocated time quantum goes partly unused - only the slice up to that call is spent on useful work.
That’s why threads are called “heavyweight” - they require significant resources just for switching between them.
Our Little Friends: Goroutines
So what’s the idea behind goroutines? To use OS threads as efficiently as possible by avoiding the context switching costs described above. Let’s see how this is achieved.
The Go runtime, unlike the OS scheduler, has complete control over goroutine execution. It knows exactly where blocking calls can occur, and instead of letting the OS suspend the thread, it turns these calls into asynchronous operations under the hood (network I/O, for example, goes through the runtime’s netpoller).
When a goroutine blocks, for example waiting for I/O, the Go runtime doesn’t wait for the OS to preempt the thread. Instead, it immediately switches execution to another goroutine from the current thread’s local queue, effectively swapping in a new execution context. You can think of it as a GOTO statement: control jumps to another goroutine without any OS involvement.
Thus, the Go scheduler optimizes multitasking at the user space level, minimizing kernel interaction and depriving the OS of reasons to preempt the current thread.
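It’s easy to see this in action: park a large number of goroutines on a blocking operation and check how many OS threads the runtime actually created. A minimal sketch (the 100,000 count is arbitrary, and the threadcreate profile counts thread creation events, which only approximates the live thread count):

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
	"time"
)

func main() {
	block := make(chan struct{}) // never closed: every goroutine parks on the receive

	for i := 0; i < 100_000; i++ {
		go func() {
			<-block // blocks in user space; the runtime parks the goroutine, not the thread
		}()
	}

	time.Sleep(time.Second) // give the scheduler a moment to park everyone

	fmt.Println("goroutines:", runtime.NumGoroutine())
	fmt.Println("threads created:", pprof.Lookup("threadcreate").Count())
}
```

On a typical machine this prints a thread count in the single or low double digits against 100,000 goroutines - parked goroutines cost the OS nothing.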
The cost of switching a goroutine is quite small, because a goroutine is essentially just a small stack (it starts at 2KB and can grow dynamically, up to 1GB by default on 64-bit systems) plus a minimal saved context: three processor registers - the Program Counter (PC, points to the current instruction), the Stack Pointer (SP, points to the top of the stack), and the Base Pointer (BP, points to the base of the current stack frame). The stack itself doesn’t need to be copied anywhere - it simply stays where it is in memory. No complex kernel data structures, no address space switches, no cache flushes - everything happens in user space and within a single OS thread.
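The dynamic stack is easy to verify with a deliberately deep recursion - a fixed 2KB stack would overflow almost immediately, but the runtime keeps growing it as needed. An illustrative sketch (the depth is arbitrary):

```go
package main

import "fmt"

// A fixed 2KB stack would overflow after a few hundred frames at most;
// the Go runtime grows the goroutine's stack transparently instead.
func depth(n int) int {
	if n == 0 {
		return 0
	}
	return 1 + depth(n-1)
}

func main() {
	done := make(chan int)
	go func() { done <- depth(1_000_000) }() // a million frames on a stack that started at 2KB
	fmt.Println("recursed to depth:", <-done)
}
```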
However, goroutines can’t be switched just anywhere - Go has the concept of safe points for this. These are special places in the code where the runtime can safely suspend a goroutine and switch to another. Why is this important? Imagine interrupting a goroutine right in the middle of updating a complex data structure or at a moment when some pointers are temporarily invalid - this could lead to data corruption or hard-to-debug errors.
The Go compiler automatically inserts safe points in certain places in the code:
- During function calls - there’s a switch check in the prologue of each function
- In long loops - the compiler doesn’t actually insert a check on every iteration; instead, since Go 1.14 the runtime preempts long-running loops asynchronously via a signal, so they can’t hog the processor
- During heap memory allocation operations - as this is a potentially long operation
- During channel operations and other blocking calls
- When returning from a function
At these points, the goroutine essentially “asks” the scheduler: “Hey, I’m about to do something that might take a while, can you switch to another goroutine while I’m waiting?”
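There is also an explicit way to ask that question: runtime.Gosched() is a manual yield point you can insert yourself. A small sketch - GOMAXPROCS is pinned to 1 here only to make the interleaving easy to observe, and the exact output order is still up to the scheduler:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	runtime.GOMAXPROCS(1) // a single thread executes Go code, so the yields are easy to observe

	done := make(chan struct{})
	go func() {
		for i := 0; i < 3; i++ {
			fmt.Println("B:", i)
			runtime.Gosched() // explicit safe point: hand the thread back to the scheduler
		}
		close(done)
	}()

	for i := 0; i < 3; i++ {
		fmt.Println("A:", i)
		runtime.Gosched()
	}
	<-done // a channel receive is itself a safe point
}
```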
GOMAXPROCS
A common misconception is that GOMAXPROCS limits the number of OS threads. This is not true - it only limits the number of threads that can execute Go code simultaneously.
What happens if all GOMAXPROCS threads are blocked, say in system calls? In that case the Go runtime will create a new thread - and will keep doing so as long as there are runnable goroutines but no free threads to run them. So GOMAXPROCS doesn’t limit the maximum number of threads in the process - it only says how many may execute Go code at the same time.
This means that if you set this parameter to 1 and that single thread gets blocked, the runtime will create a new thread - and if that one gets blocked too, then another - so the total number of threads cannot be predicted in advance.
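The “work simultaneously” part is easy to demonstrate with CPU-bound goroutines: the same four busy loops take roughly four times longer when only one thread may execute Go code at a time. A rough sketch (timings are illustrative and machine-dependent):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// burn keeps one CPU busy; time.Now() is a function call,
// so the loop contains safe points for the scheduler.
func burn(d time.Duration) {
	deadline := time.Now().Add(d)
	for time.Now().Before(deadline) {
	}
}

// run starts `workers` CPU-bound goroutines and measures the wall time.
func run(workers int) time.Duration {
	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			burn(200 * time.Millisecond)
		}()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	runtime.GOMAXPROCS(1)
	fmt.Println("GOMAXPROCS=1:     ", run(4)) // ~800ms: four workers share one thread
	runtime.GOMAXPROCS(runtime.NumCPU())
	fmt.Println("GOMAXPROCS=NumCPU:", run(4)) // ~200ms with four or more cores
}
```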
Summary
So what makes goroutines so efficient:
- Minimal context switching overhead - only three registers and a small stack
- Mostly cooperative multitasking through safe points - goroutines yield the processor at well-defined places (with asynchronous preemption as a backstop since Go 1.14)
- Smart handling of blocking calls - Go runtime turns them into asynchronous operations
- Efficient use of OS threads - GOMAXPROCS threads handle thousands of goroutines
Go takes care of all the complexity of managing this mechanism - the programmer just needs to write go myFunction(), and the runtime will take care of executing the code efficiently.
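For completeness, here is what that looks like in practice - myFunction is just a placeholder name, and the sync.WaitGroup stands in for whatever synchronization real code needs, since main doesn’t wait for other goroutines on its own:

```go
package main

import (
	"fmt"
	"sync"
)

func myFunction(id int, wg *sync.WaitGroup) {
	defer wg.Done()
	fmt.Println("hello from goroutine", id)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go myFunction(i, &wg) // one keyword - the runtime handles the rest
	}
	wg.Wait() // without this, main could exit before any goroutine prints
}
```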
That’s why goroutines have become one of the most successful implementations of lightweight threads (green threads) in modern programming languages. They allow writing high-performance concurrent code while keeping it simple to understand and maintain.
What About Others?
Java 21 introduced virtual threads (Project Loom) - an analog of goroutines, but built for full compatibility with the existing Thread API, which ties them to the classical thread model and makes them somewhat less efficient.
Rust took a completely different path - there’s no built-in runtime for asynchronous execution, instead using a type system with Future and async/await. Popular runtimes like tokio or async-std need to be included separately. This approach provides more control but requires more explicit code.
You could say that Go found the golden mean - an API as simple and clear as Java’s, with an efficient, well-thought-out runtime that delivers performance close to Rust’s.