Captain Codeman

Golang Buffer Pool Gotcha

Because you might still be using that memory ...

Introduction

We all know that Go is extremely fast and very easy to develop with but, as with any managed language, it’s easy to inadvertently generate large quantities of garbage. No, I’m not talking about poorly written code (yeah, I’m guilty of that!) but garbage as in “memory that has to be reclaimed”. Reclaiming it is the job of the garbage collector and, although the GC in Go keeps getting faster with each release, you still want to give it as little work as you can.

One common culprit is creating temporary buffers for things like rendering, encoding and compression so an easy fix is to re-use these buffers instead of creating new ones each time. There’s a sync.Pool in the standard library that can be used for this but there is a subtle “gotcha” which I’ll explain.

Go Buffer Pool Implementation

Here’s a typical implementation of a bytes.Buffer pool in Go using the standard library sync.Pool:

package engine

import (
	"bytes"
	"sync"
)

// buffer pool to reduce GC
var buffers = sync.Pool{
	// New is called when a new instance is needed
	New: func() interface{} { 
		return new(bytes.Buffer)
	},
}

// GetBuffer fetches a buffer from the pool
func GetBuffer() *bytes.Buffer {
	return buffers.Get().(*bytes.Buffer)
}

// PutBuffer returns a buffer to the pool
func PutBuffer(buf *bytes.Buffer) {
	buf.Reset()
	buffers.Put(buf)
}

Very simple and hopefully fairly common.

Usage

Here’s a simple example of getting a buffer from the pool and using defer to add it back when the function completes. In this case, simply rendering a template into a byte slice:

func render(id string) []byte {
    data := storage.Get(id)

    w := GetBuffer()
    defer PutBuffer(w)

    template.Execute(w, data)

    return w.Bytes()
}

Great. We get a boost by re-using the buffers and reduce some of the GC pressure in our app - that saves not only memory but some CPU too.

The Gotcha - We’re Still Using It

Did you notice the issue? No, not that I didn’t handle the error that the template Execute method might return. It’s more subtle than that: the function returns the result of w.Bytes().

Why is this bad and why is it a gotcha?

The problem is that w.Bytes() returns a slice that points into the buffer’s own memory and, at the same time, the deferred PutBuffer has told the pool that the buffer can be re-used. A slice in Go is only a view onto an underlying array, and Reset() empties the buffer without releasing that array, so something else might start writing to the same memory before we’ve completely finished with it.

It can lead to subtle issues like content from one item suddenly appearing half-way through the rendered content of something else. If you have code that loops through items processing and writing one after the other then it may not be noticeable but if you change the code to write all the items at the end in a batch then it will be more likely to trigger (as will more load from concurrent requests on the system).
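
To make the aliasing concrete, here is a minimal sketch that leaves the pool out entirely so the behaviour is repeatable. The aliasDemo function and its strings are made up for illustration; the Reset and write steps stand in for what PutBuffer and the next caller of GetBuffer would do.

```go
package main

import (
	"bytes"
	"fmt"
)

// aliasDemo is a hypothetical helper: it shows that the slice returned by
// Bytes() shares memory with the buffer, so re-using the buffer changes
// what the slice contains.
func aliasDemo() (before, after string) {
	buf := new(bytes.Buffer)

	buf.WriteString("first item")
	b := buf.Bytes()   // b points into buf's internal array, no copy is made
	before = string(b) // converting to string does copy, so this is a snapshot

	buf.Reset()               // what PutBuffer does before returning it to the pool
	buf.WriteString("SECOND") // what the next user of the pooled buffer does

	after = string(b) // same slice, but the memory behind it has been overwritten
	return before, after
}

func main() {
	before, after := aliasDemo()
	fmt.Printf("before re-use: %q\n", before)
	fmt.Printf("after re-use:  %q\n", after)
}
```

The slice b is never written through directly, yet its contents differ between the two snapshots. With a real pool and concurrent requests the same thing happens, only at unpredictable times.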

Make a Copy

We might be tempted to solve the problem by making a copy of the bytes from the buffer before we return:

b := make([]byte, w.Len())
copy(b, w.Bytes())

return b

While this does appear to solve the problem, it’s only really addressing the symptom and kind of defeats the purpose of creating the buffer pool in the first place - we’re back to creating temporary slices that will have to be garbage collected, in addition to the buffers and pool we added. Oh dear.

What we really need to do to solve the issue is to make sure the scope of the buffer matches our use of it. We do this by moving the buffer creation and cleanup outside of the method.

As an aside, we don’t want to pass in the buffer as a *bytes.Buffer; we should make use of interfaces instead. There is no reason to tie our function to the concrete type we happen to be writing to, only to the fact that we want to write something. So our function should just expect an io.Writer that it can use. This makes it much more useful because we may want to write to other writers in future.

Now the method looks like this:

func render(w io.Writer, id string) {
    data := storage.Get(id)
    template.Execute(w, data)
}

We still make use of the buffer pool but it’s moved outside of this method:

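for _, id := range ids {
    w := GetBuffer()

    render(w, id)

    // do something with w.Bytes() here, while we still own the buffer

    // Return the buffer explicitly. A defer inside the loop would not run
    // until the enclosing function returns, so every buffer would stay
    // checked out for the whole loop.
    PutBuffer(w)
}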

Of course this example is contrived and over-simplified but hopefully, if you ever have buffers being corrupted, it might help explain what is happening and some possible solutions.