We often write code without thinking about how performant the code will be in the end. Often it is also not worth to sacrifice cleaner code in terms of readability and future maintainability for the sake of a few percents of performance, be it less CPU cycles needed to compute something or a smaller memory footprint.

But as always even if you don’t need to save a few cycles or bytes, it is good to know what you could do to even get started in optimize for performance. This post will about how to use the tooling provided by the Go programming language.

The Scenario

Let’s do something very easy! We just want to generate a combination 6 random characters out of the alphabet (A-Z and a-z) and integers ranging from 0 to 9. The output could for example look like this bGrS7X. What this could be good for? I’ve got no idea, but it is a very easy example on how to use the extensive tooling of Go.

You can find all code for this post here.

What tools do we need?

We want multiple metrics for a given algorithm. For example, we want to know how many CPU cycles it needs to perform a specific operation. Additionally, we want to know how much memory it needs to do that. So let’s get started with benchmarks.

Benchmarks

As always Go already provides all we need in the testing package. 1 As the documentation states you just have to add the -bench flag to go test and write benchmarks following the presented examples.

So let me introduce a very simply algorithm to fulfill to earlier presented requirements of creating a random string.

func algStringCasting() string {
	// Stolen from https://blog.pratimbhosale.com/building-a-url-shortener-using-go-and-sqlite#heading-shortening-the-url
	randomPart := ""
	for i := 0; i < 6; i++ {
		randomPart += string(rune(rand.Intn(26) + 97))
	}

	return randomPart
}

A simple benchmark for this algorithm would then look like this.

func Benchmark_algStringCasting(b *testing.B) {
	for n := 0; n < b.N; n++ {
		algStringCasting()
	}
}

In this case there is no need to increase any external variable to run it with a different set of inputs, as the algorithm should be static. To get some results we then just execute go test -bench=. ./.., which will give us the following.

➜  03-generate-random-chars git:(main) ✗ go test -bench=. ./..
goos: linux
goarch: amd64
pkg: github.com/andresterba/code-for-blog/2023/03
cpu: Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz
Benchmark_algStringCasting-8    1389026        869.3 ns/op
PASS
ok      github.com/andresterba/code-for-blog/2023/03    5.753s

The first number we can see 1389026 is the number of runs of this function. The second number is the needed time in average in nanoseconds per operations. The -8 suffix shows the number of CPUs used to run the benchmark. [3]

As we want more stable results and remove statistical outliers, we always want to run benchmarks multiple times. We can simply do this by adding the -benchtime flag with the special syntax 10x to run the benchmark 10 times.

So now we got some metrics for CPU and runtime, but we also want to now how much memory the algorithm is using and how many allocations it needs. To get this we just have to add another flag to the test command, which is -benchmem. With these two additional flags we get the following output.

go test -bench=. -benchmem -benchtime=100x ./...
goos: linux
goarch: amd64
pkg: github.com/andresterba/code-for-blog/2023/03
cpu: Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz
Benchmark_algStringCasting-8         100               464.8 ns/op            56 B/op         11 allocs/op
PASS
ok      github.com/andresterba/code-for-blog/2023/03    0.004s

Each operation allocations 56 bytes of memory in 11 allocations.

Add another algorithm

Now that we have a working benchmark setup, let’s add another algorithm that solves the same problem and use the provided tooling to compare it against our first try.

Instead of generating random chars, we will now provide a lookup array and just create random indexes for that array.

var charArray [62]string = [62]string{
	"a", "b", "c", "d", "e", "f", "g",
	"h", "i", "j", "k", "l", "m", "n",
	"o", "p", "q", "r", "s", "t", "u",
	"v", "w", "x", "y", "z",
	"A", "B", "C", "D", "E", "F", "G",
	"H", "I", "J", "K", "L", "M", "N",
	"O", "P", "Q", "R", "S", "T", "U",
	"V", "W", "X", "Y", "Z",
	"0", "1", "2", "3", "4", "5", "6",
	"7", "8", "9",
}

func algArrayLookup() string {
	// Stolen from https://www.geeksforgeeks.org/program-generate-random-alphabets/
	res := ""

	for i := 0; i < 6; i++ {
		res = res + charArray[rand.Intn(62)]
	}

	return res
}

The benchmark looks exactly the same as the one for our first algorithm, we just call algArrayLookup instead of algStringCasting.

func Benchmark_algArrayLookup(b *testing.B) {
	for n := 0; n < b.N; n++ {
		algArrayLookup()
	}
}

And the results? They look like this:

go test -bench=. -benchmem -benchtime=100x ./...
goos: linux
goarch: amd64
pkg: github.com/andresterba/code-for-blog/2023/03
cpu: Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz
Benchmark_algStringCasting-8         100               874.9 ns/op            56 B/op         11 allocs/op
Benchmark_algArrayLookup-8           100               395.0 ns/op            24 B/op          5 allocs/op
PASS
ok      github.com/andresterba/code-for-blog/2023/03    0.004s

So our second algorithm takes less time per operation, bytes per operation and requires only 5 allocations instead of 11. It might be a good idea to add benchmarks like this to your code base, before you make any major changes in the implementation of a specific functionality. But how do you know if your new implementation is better/faster than the old one?

Of course, you can just keep the old benchmarks lying around and check them manually. But there is also a tool for that, called benchstat.

Testing with different inputs

So let’s come up with another example, where we don’t want to generate a static element, which is independent of some input. Let’s use a well known algorithm that scales with a provided input, for this example we are using the Euclidean algorithm.

It is used to find the greatest common divisor of two numbers. An example from the Wikipedia is 21 is the GCD of 252 and 105. Pseudocode for this looks like this. 2

function gcd(a, b)
    while b ≠ 0
        t := b
        b := a mod b
        a := t
    return a

In Go this will look like this.

func Euclid(a int, b int) int {
	for b != 0 {
		t := b
		b = a % b
		a = t
	}

	return a
}

And the output for our example 252 and 105 will be as expected:

➜ go run main.go

GCD of 252 and 105 is 21.

Then let’s write a benchmark for this with different inputs. We define a table called inputs to collect a list of inputs we want to benchmark. b.Run() accepts as first argument the name of the benchmark.

var inputs = []struct {
	a int
	b int
}{
	{
		a: 252,
		b: 105,
	},
	{
		a: 1337,
		b: 105,
	},
}

func BenchmarkEuclid(b *testing.B) {
	for _, v := range inputs {
		b.Run(fmt.Sprintf("a:%d b:%d", v.a, v.b), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				Euclid(v.a, v.b)
			}
		})
	}
}

The result looks like this.

go test -benchmem ./..

goos: linux
goarch: amd64
pkg: github.com/andresterba/code-for-blog/2023/03/euclid
cpu: Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz
BenchmarkEuclid/a:252_b:105-8         	21771715	        50.82 ns/op	       0 B/op	       0 allocs/op
BenchmarkEuclid/a:1337_b:105-8        	17221270	        58.49 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	github.com/andresterba/code-for-blog/2023/03/euclid	4.139s

Interesting is that the compiler seems to optimize the code to remove the allocation t := b. But profiling is a different topic I will maybe cover in the near future. If you want to know more about how to test with variable inputs, I can recommend this 3 blog post.

Conclusion

Writing benchmarks in Go is very easy, so there should be no reason to not do it for performance critical parts of your application. As always to Go tooling is very handy and easy to use.