When a Goroutine Won't Die: Debugging a Real Goroutine Leak with pprof
Goroutines are one of Go's most powerful features. They're lightweight, cheap, and easy to spin up. But that power comes with responsibility: if you forget to stop them, goroutines can quietly accumulate until your service starts to choke.
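The classic leak is a goroutine blocked forever on a channel that nothing ever writes to or closes. As a minimal illustration (not the bug from this story):

package main

import (
    "fmt"
    "runtime"
    "time"
)

// leak starts a goroutine that waits on a channel which is never
// written to or closed, so that goroutine can never exit.
func leak() {
    ch := make(chan int)
    go func() {
        <-ch // blocks forever
    }()
}

func main() {
    for i := 0; i < 1000; i++ {
        leak()
    }
    time.Sleep(100 * time.Millisecond)
    fmt.Println("goroutines:", runtime.NumGoroutine()) // ~1001 instead of 1
}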
This is the story of how I discovered a real goroutine leak in production after a library update, how I tracked it down using pprof, debugged the root cause, and submitted a pull request to fix it.
The Setup
I maintain a Go microservice that listens to a queue. Every time a new message arrives, the service generates a PDF and uploads it to S3.
To generate the PDFs, we use Maroto, a nice Go library with a fluent API for building PDF layouts.
At some point, the library introduced a new feature: parallel page rendering using goroutines. The results looked amazing:
// Maroto v1: 7.78s
// Maroto v2 (concurrent): 546ms
After local tests confirmed the performance gains, we rolled the update to production.
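For reference, the kind of quick local check behind those numbers can be reproduced with a standard Go benchmark. This is only a sketch: buildDocument is a hypothetical helper standing in for our real layout code, and it assumes the v2 API where Generate() returns a document and an error.

package pdf

import "testing"

func BenchmarkGenerate(b *testing.B) {
    m := buildDocument() // hypothetical helper that configures rows, images, etc.
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        if _, err := m.Generate(); err != nil {
            b.Fatal(err)
        }
    }
}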
Then the Alarms Hit
A few days later, New Relic showed something odd. One replica started to restart frequently due to high memory usage.
On the metrics dashboard, the number of goroutines was exploding:
Over 7,000 goroutines were active, and the count was rising steadily.
It was clear: something was leaking goroutines.
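New Relic happened to surface the goroutine count for us. If your stack doesn't export it, a minimal sketch using the standard library's expvar package (metric name and port are illustrative) looks like this:

package main

import (
    "expvar"
    "log"
    "net/http"
    "runtime"
)

func main() {
    // Publish the current goroutine count at /debug/vars, next to the
    // default expvar metrics (memstats, cmdline).
    expvar.Publish("goroutines", expvar.Func(func() interface{} {
        return runtime.NumGoroutine()
    }))

    // Importing expvar registers its handler on http.DefaultServeMux.
    log.Fatal(http.ListenAndServe("localhost:8080", nil))
}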
Profiling with pprof
To diagnose the issue, I enabled net/http/pprof in the service. Here's how (a complete minimal example follows these steps):
- Import the package:
import _ "net/http/pprof"
- Start an HTTP server in your app (usually in a separate goroutine):
go func() {
    log.Println(http.ListenAndServe("localhost:6060", nil))
}()
- From there, you can collect profiles:
  - View the goroutine count in the browser: http://localhost:6060/debug/pprof/goroutine?debug=1
  - Download the full profile to a file: go tool pprof http://localhost:6060/debug/pprof/goroutine
  - Inside the interactive pprof shell, use top and list <function>
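Putting those steps together, a complete minimal setup looks like this (the port and the empty main loop are illustrative; in the real service this sits next to the queue consumer):

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
    // Serve pprof on a local-only port so it is never exposed publicly.
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    // ... start the queue consumer / application here ...
    select {}
}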
With pprof in place, the goroutine profile before generating any documents looked like this:
goroutine profile: total 5
After generating just 10 documents:
goroutine profile: total 106
100 @ github.com/f-amaral/go-async/pool.NewPool[...].func1
This clearly pointed to go-async/pool.NewPool, the worker pool used inside Maroto to process pages concurrently.
The Root Cause
Looking at the source code of the library, here's what was happening:
cfg := config.NewBuilder().
    WithConcurrentMode(10).
    Build()

mrt := maroto.New(cfg)
mrt.Generate()
Inside maroto.New, a new goroutine pool was created every time an instance was constructed:
if cfg.GenerationMode == generation.Concurrent {
    m.pool = pool.NewPool(...) // never closed!
}
So every request to our service would create a new Maroto instance, which in turn would spin up a pool of goroutines, and none of them were ever shut down.
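To see why those goroutines never die, here is a stripped-down sketch of the pattern (illustrative only, not go-async's actual implementation): each pool starts N workers that range over a jobs channel, and a worker only exits when that channel is closed.

package leakpool

// Pool is a simplified stand-in for a worker pool like go-async's.
type Pool struct {
    jobs chan func()
}

// NewPool starts `workers` goroutines that consume from p.jobs.
// Nothing here ever closes p.jobs, so the workers block forever.
func NewPool(workers int) *Pool {
    p := &Pool{jobs: make(chan func())}
    for i := 0; i < workers; i++ {
        go func() {
            for job := range p.jobs { // exits only when the channel is closed
                job()
            }
        }()
    }
    return p
}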
The Fix
The fix was simple in concept:
- Move the creation of the pool inside generateConcurrently() so it only lives for the duration of that call
- Add a defer pool.Close() to ensure cleanup
Before:
m.pool = pool.NewPool(...)
processed := m.pool.Process(...)
After:
p := pool.NewPool(...)
defer p.Close()
processed := p.Process(...)
This ensures the goroutines are terminated after each Generate() call, avoiding the leak.
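Conceptually, continuing the simplified sketch from the root-cause section, Close only has to close the jobs channel so every worker's range loop can finish (a real implementation would also wait for in-flight work):

// Close releases the workers: once the channel is closed, each
// `for job := range p.jobs` loop ends and its goroutine exits.
func (p *Pool) Close() {
    close(p.jobs)
}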
You can see the full diff in PR #499.
Unit Test to Prevent Regression
I also added a test to ensure goroutines are cleaned up after multiple calls:
t.Run("should not leak goroutines", func(t *testing.T) {
    cfg := config.NewBuilder().
        WithConcurrentMode(10).
        Build()

    sut := maroto.New(cfg)
    for i := 0; i < 30; i++ {
        sut.AddRow(10, col.New(12))
    }

    // Baseline goroutine count before any generation.
    initial := runtime.NumGoroutine()

    sut.Generate()
    sut.Generate()
    sut.Generate()

    // Give the pool's workers a moment to shut down.
    time.Sleep(100 * time.Millisecond)
    final := runtime.NumGoroutine()

    assert.Equal(t, initial, final)
})
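One caveat: runtime.NumGoroutine() is global and a fixed sleep can occasionally flake. A slightly more robust variant, assuming a recent testify, polls instead of sleeping:

// Poll until the goroutine count returns to the baseline, or fail after 2s.
assert.Eventually(t, func() bool {
    return runtime.NumGoroutine() <= initial
}, 2*time.Second, 50*time.Millisecond)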
Before & After
| Metric | Before | After |
| --- | --- | --- |
| Goroutines (10 docs) | 106 | ~5 |
| Peak in prod | 7,000+ | ~50 |
| Replica restarts | Frequent | None |
Takeaways
- Monitor goroutine count in production, especially in queue-based services
- Use pprof early and often
- Always close goroutine pools and workers
- Don't assume performance gains are risk-free; test for stability
Final Thoughts
This issue reminded me how subtle and dangerous goroutine leaks can be. Thankfully, with the right tools and awareness, they can also be easy to spot and fix.
If you're interested in learning more about how goroutine pools work and how to build your own, I wrote a full post breaking down the Worker Pool pattern in Go, including diagrams, code, pros/cons, and shutdown logic:
Golang Worker Pool Explained