🧠 When a Goroutine Won’t Die: Debugging a Real Goroutine Leak with pprof

Goroutines are one of Go’s most powerful features. They’re lightweight, cheap, and easy to spin up. But that power comes with responsibility: if you forget to stop them, goroutines can quietly accumulate until your service starts to choke.

This is the story of how I discovered a real goroutine leak in production after a library update, and how I tracked it down using pprof, debugged the root cause, and submitted a pull request to fix it.


📆 The Setup

I maintain a Go microservice that listens to a queue. Every time a new message arrives, the service generates a PDF and uploads it to S3.

To generate the PDFs, we use Maroto, a nice Go library with a fluent API for building PDF layouts.

At some point, the library introduced a new feature: parallel page rendering using goroutines. The results looked amazing:

// Maroto v1: 7.78s
// Maroto v2 (concurrent): 546ms

After local tests confirmed the performance gains, we rolled the update out to production.


🚨 Then the Alarms Hit

A few days later, New Relic showed something odd. One replica was restarting frequently due to high memory usage.

Looking at the metrics dashboard, the number of goroutines was exploding:

Over 7,000 goroutines were active, and the count was still rising steadily.

[Chart: goroutine count in New Relic]

It was clear: something was leaking goroutines.


🕵️ Profiling with pprof

To diagnose the issue, I enabled net/http/pprof in the service. Here’s how:

  1. Import the package:

     import _ "net/http/pprof"

  2. Start an HTTP server in your app (usually in a separate goroutine):

     go func() {
         log.Println(http.ListenAndServe("localhost:6060", nil))
     }()

  3. From there, you can collect profiles:

  • View the goroutine profile in the browser: http://localhost:6060/debug/pprof/goroutine?debug=1

  • Download the full profile for analysis:

    go tool pprof http://localhost:6060/debug/pprof/goroutine

  • Inside the interactive pprof shell, use:

    top
    list <function>
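
Putting those steps together, here's a minimal, self-contained sketch of a pprof-enabled worker-style service. The port comes from the example above; the infinite work loop is just a placeholder standing in for our actual queue consumer:

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
    "time"
)

func main() {
    // Serve pprof on localhost only; keep this port off the public network.
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    // Placeholder for the real work loop (in our case: consume a message,
    // generate a PDF, upload it to S3).
    for {
        time.Sleep(time.Second)
    }
}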

Before generating any documents:

goroutine profile: total 5

After generating just 10 documents:

goroutine profile: total 106
100 @ github.com/f-amaral/go-async/pool.NewPool[...].func1

This clearly pointed to go-async/pool.NewPool, the worker pool used inside Maroto to process pages concurrently.


🔎 The Root Cause

Looking at the library's source code, and at how our service was using it, here's what was happening. We configured Maroto like this:

cfg := config.NewBuilder().
    WithConcurrentMode(10).
    Build()

mrt := maroto.New(cfg)
mrt.Generate()

Inside maroto.New, a new goroutine pool was created every time a document was generated:

if cfg.GenerationMode == generation.Concurrent {
    m.pool = pool.NewPool(...) // never closed!
}

So every request to our service would create a new Maroto instance, which in turn would spin up a pool of goroutines, and none of them were ever shut down.
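
To make the failure mode concrete, here's a stripped-down sketch of the same pattern. This is not Maroto's or go-async's actual code; startPool and handleRequest are hypothetical stand-ins:

package main

import (
    "fmt"
    "runtime"
    "time"
)

// startPool is a hypothetical stand-in for the library's pool constructor:
// it launches workers that block reading from a channel nobody ever closes.
func startPool(workers int) chan<- int {
    jobs := make(chan int)
    for i := 0; i < workers; i++ {
        go func() {
            for range jobs { // never terminates: jobs is never closed
            }
        }()
    }
    return jobs
}

// handleRequest mimics our service: one new pool per incoming message.
func handleRequest() {
    _ = startPool(10)
    // ... render the document ...
}

func main() {
    for i := 0; i < 10; i++ {
        handleRequest()
    }
    time.Sleep(100 * time.Millisecond)
    // Prints roughly 101: 10 requests x 10 workers, plus main.
    fmt.Println("goroutines:", runtime.NumGoroutine())
}

Ten "requests" leave a hundred goroutines parked forever, which is exactly the shape of the pprof output above.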


⚒️ The Fix

The fix was simple in concept:

  1. Move the creation of the pool inside generateConcurrently() so it lives only during that execution
  2. Add a defer pool.Close() to ensure cleanup

Before:

m.pool = pool.NewPool(...)
processed := m.pool.Process(...)

After:

p := pool.NewPool(...)
defer p.Close()
processed := p.Process(...)

This ensures the goroutines are terminated after each Generate() call, avoiding leaks.
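
I won't reproduce go-async's internals here, but for a channel-fed worker pool, Close() typically means "close the jobs channel so every worker's range loop ends, then wait for them to return." A minimal hypothetical sketch of that shutdown pattern, which is what the defer relies on:

package pool

import "sync"

// Pool is a hypothetical, minimal channel-fed worker pool, shown only to
// illustrate why `defer p.Close()` releases the goroutines.
type Pool struct {
    jobs chan func()
    wg   sync.WaitGroup
}

// NewPool starts the workers; each one exits when the jobs channel is closed.
func NewPool(workers int) *Pool {
    p := &Pool{jobs: make(chan func())}
    p.wg.Add(workers)
    for i := 0; i < workers; i++ {
        go func() {
            defer p.wg.Done()
            for job := range p.jobs {
                job()
            }
        }()
    }
    return p
}

// Process hands a job to one of the workers (blocks until one is free).
func (p *Pool) Process(job func()) { p.jobs <- job }

// Close stops accepting work and blocks until every worker has returned.
func (p *Pool) Close() {
    close(p.jobs)
    p.wg.Wait()
}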

You can see the full diff in PR #499.


🚧 Unit Test to Prevent Regression

I also added a test to ensure goroutines are cleaned up after multiple calls:

t.Run("should not leak goroutines", func(t *testing.T) {
    cfg := config.NewBuilder().
        WithConcurrentMode(10).
        Build()

    sut := maroto.New(cfg)
    for i := 0; i < 30; i++ {
        sut.AddRow(10, col.New(12))
    }

    initial := runtime.NumGoroutine()

    sut.Generate()
    sut.Generate()
    sut.Generate()

    time.Sleep(100 * time.Millisecond)

    final := runtime.NumGoroutine()
    assert.Equal(t, initial, final)
})

📊 Before & After

Metric                  Before      After
Goroutines (10 docs)    106         ~5
Peak in prod            7,000+      ~50
Replica restarts        Frequent    None

💡 Takeaways

  • Monitor goroutine count in production, especially in queue-based services (see the sketch after this list)
  • Use pprof early and often
  • Always close goroutine pools and workers
  • Don't assume performance gains are risk-free; test for stability
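
On the first point: we rely on New Relic for this, but even the standard library alone can expose the number. A minimal sketch using expvar (the port and variable name here are arbitrary choices, not our production setup):

package main

import (
    "expvar"
    "log"
    "net/http"
    "runtime"
)

func main() {
    // expvar serves everything published here as JSON at /debug/vars.
    expvar.Publish("goroutines", expvar.Func(func() any {
        return runtime.NumGoroutine()
    }))

    log.Fatal(http.ListenAndServe("localhost:8080", nil))
}

A scrape of that endpoint, plus an alert on a steadily climbing value, would likely have flagged this leak much earlier.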

🙌 Final Thoughts

This issue reminded me how subtle and dangerous goroutine leaks can be. Thankfully, with the right tools and awareness, they can also be easy to spot and fix.

If you’re interested in learning more about how goroutine pools work and how to build your own, I wrote a full post breaking down the Worker Pool pattern in Go, including diagrams, code, pros/cons, and shutdown logic:

👉 Golang Worker Pool Explained