🧠 When a Goroutine Won’t Die: Debugging a Real Goroutine Leak with pprof

Goroutines are one of Go’s most powerful features. They’re lightweight, cheap, and easy to spin up. But that power comes with responsibility: if you forget to stop them, goroutines can quietly accumulate until your service starts to choke.

This is the story of how I discovered a real goroutine leak in production after a library update, and how I tracked it down using pprof, debugged the root cause, and submitted a pull request to fix it.


📆 The Setup

I maintain a Go microservice that listens to a queue. Every time a new message arrives, the service generates a PDF and uploads it to S3.

To generate the PDFs, we use Maroto, a nice Go library with a fluent API for building PDF layouts.

At some point, the library introduced a new feature: parallel page rendering using goroutines. The results looked amazing:

// Maroto v1: 7.78s
// Maroto v2 (concurrent): 546ms

After local tests confirmed the performance gains, we rolled the update out to production.


🚨 Then the Alarms Hit

A few days later, New Relic showed something odd. One replica was restarting frequently due to high memory usage.

Looking at the metrics dashboard, the number of goroutines was exploding:

Over 7,000 goroutines were active, and the count was still rising steadily.

[Chart: goroutine count in New Relic]

It was clear: something was leaking goroutines.


🕵️ Profiling with pprof

To diagnose the issue, I enabled net/http/pprof in the service. Here’s how:

  1. Import the package:

     import _ "net/http/pprof"

  2. Start an HTTP server in your app (usually in a separate goroutine):

     go func() {
         log.Println(http.ListenAndServe("localhost:6060", nil))
     }()

  3. From there, you can collect profiles:

  • View the goroutine profile in the browser: http://localhost:6060/debug/pprof/goroutine?debug=1

  • Download the full profile for analysis:

    go tool pprof http://localhost:6060/debug/pprof/goroutine

  • Inside the interactive pprof shell, use:

    top
    list <function>
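
Putting those steps together, here's a minimal, self-contained sketch of a pprof-enabled worker-style service. The port comes from the example above; the infinite work loop is just a placeholder standing in for our actual queue consumer:

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
    "time"
)

func main() {
    // Serve pprof on localhost only; keep this port off the public network.
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    // Placeholder for the real work loop (in our case: consume a message,
    // generate a PDF, upload it to S3).
    for {
        time.Sleep(time.Second)
    }
}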

Before generating any documents:

goroutine profile: total 5

After generating just 10 documents:

goroutine profile: total 106
100 @ github.com/f-amaral/go-async/pool.NewPool[...].func1

This clearly pointed to go-async/pool.NewPool, the worker pool used inside Maroto to process pages concurrently.


🔎 The Root Cause

Looking at the library's source code, and at how our service was using it, here's what was happening. We configured Maroto like this:

cfg := config.NewBuilder().
    WithConcurrentMode(10).
    Build()

mrt := maroto.New(cfg)
mrt.Generate()

Inside maroto.New, a new goroutine pool was created every time a document was generated:

if cfg.GenerationMode == generation.Concurrent {
    m.pool = pool.NewPool(...) // never closed!
}

So every request to our service would create a new Maroto instance, which in turn would spin up a pool of goroutines, and none of them were ever shut down.
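
To make the failure mode concrete, here's a stripped-down sketch of the same pattern. This is not Maroto's or go-async's actual code; startPool and handleRequest are hypothetical stand-ins:

package main

import (
    "fmt"
    "runtime"
    "time"
)

// startPool is a hypothetical stand-in for the library's pool constructor:
// it launches workers that block reading from a channel nobody ever closes.
func startPool(workers int) chan<- int {
    jobs := make(chan int)
    for i := 0; i < workers; i++ {
        go func() {
            for range jobs { // never terminates: jobs is never closed
            }
        }()
    }
    return jobs
}

// handleRequest mimics our service: one new pool per incoming message.
func handleRequest() {
    _ = startPool(10)
    // ... render the document ...
}

func main() {
    for i := 0; i < 10; i++ {
        handleRequest()
    }
    time.Sleep(100 * time.Millisecond)
    // Prints roughly 101: 10 requests x 10 workers, plus main.
    fmt.Println("goroutines:", runtime.NumGoroutine())
}

Ten "requests" leave a hundred goroutines parked forever, which is exactly the shape of the pprof output above.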


⚒️ The Fix

The fix was simple in concept:

  1. Move the creation of the pool inside generateConcurrently() so it lives only during that execution
  2. Add a defer pool.Close() to ensure cleanup

Before:

m.pool = pool.NewPool(...)
processed := m.pool.Process(...)

After:

p := pool.NewPool(...)
defer p.Close()
processed := p.Process(...)

This ensures the goroutines are terminated after each Generate() call, avoiding leaks.
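
I won't reproduce go-async's internals here, but for a channel-fed worker pool, Close() typically means "close the jobs channel so every worker's range loop ends, then wait for them to return." A minimal hypothetical sketch of that shutdown pattern, which is what the defer relies on:

package pool

import "sync"

// Pool is a hypothetical, minimal channel-fed worker pool, shown only to
// illustrate why `defer p.Close()` releases the goroutines.
type Pool struct {
    jobs chan func()
    wg   sync.WaitGroup
}

// NewPool starts the workers; each one exits when the jobs channel is closed.
func NewPool(workers int) *Pool {
    p := &Pool{jobs: make(chan func())}
    p.wg.Add(workers)
    for i := 0; i < workers; i++ {
        go func() {
            defer p.wg.Done()
            for job := range p.jobs {
                job()
            }
        }()
    }
    return p
}

// Process hands a job to one of the workers (blocks until one is free).
func (p *Pool) Process(job func()) { p.jobs <- job }

// Close stops accepting work and blocks until every worker has returned.
func (p *Pool) Close() {
    close(p.jobs)
    p.wg.Wait()
}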

You can see the full diff in PR #499.


🚧 Unit Test to Prevent Regression

I also added a test to ensure goroutines are cleaned up after multiple calls:

t.Run("should not leak goroutines", func(t *testing.T) {
    cfg := config.NewBuilder().
        WithConcurrentMode(10).
        Build()

    sut := maroto.New(cfg)
    for i := 0; i < 30; i++ {
        sut.AddRow(10, col.New(12))
    }

    initial := runtime.NumGoroutine()

    sut.Generate()
    sut.Generate()
    sut.Generate()

    time.Sleep(100 * time.Millisecond)

    final := runtime.NumGoroutine()
    assert.Equal(t, initial, final)
})

📊 Before & After

Metric                  Before      After
Goroutines (10 docs)    106         ~5
Peak in prod            7,000+      ~50
Replica restarts        Frequent    None

💡 Takeaways

  • Monitor goroutine count in production, especially in queue-based services (see the sketch after this list)
  • Use pprof early and often
  • Always close goroutine pools and workers
  • Don't assume performance gains are risk-free; test for stability
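
On the first point: we rely on New Relic for this, but even the standard library alone can expose the number. A minimal sketch using expvar (the port and variable name here are arbitrary choices, not our production setup):

package main

import (
    "expvar"
    "log"
    "net/http"
    "runtime"
)

func main() {
    // expvar serves everything published here as JSON at /debug/vars.
    expvar.Publish("goroutines", expvar.Func(func() any {
        return runtime.NumGoroutine()
    }))

    log.Fatal(http.ListenAndServe("localhost:8080", nil))
}

A scrape of that endpoint, plus an alert on a steadily climbing value, would likely have flagged this leak much earlier.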

🙌 Final Thoughts

This issue reminded me how subtle and dangerous goroutine leaks can be. Thankfully, with the right tools and awareness, they can also be easy to spot and fix.

If you’re interested in learning more about how goroutine pools work and how to build your own, I wrote a full post breaking down the Worker Pool pattern in Go, including diagrams, code, pros/cons, and shutdown logic:

👉 Golang Worker Pool Explained