
Go 1.24 switched to Swiss Tables as the underlying implementation of maps in Go [1]. Despite having used maps in Go since I started using it, I never really thought about different implementations of maps and the performance implications of using one or the other. To get an idea on the history of this core data structure in computer science, I can recommend reading the earlier linked blog post from the Go team.

In this post, I want to take a step back and implement a simple hash table in Go, so we can understand the core concepts of hash tables. This will enable us to dive a bit deeper into the topic in a future post.

What is a Hash Table?

A hash table is a data structure that allows for fast retrieval of values based on keys. From a user's perspective, it is a key-value store: it supports storing and retrieving values, checking whether a key exists, and deleting keys. To my knowledge, there is no fundamental difference between a hash table and a hash map; which term is used depends on the language. In some languages only one of these data structures is available, or one of them provides specific features like thread safety or ordered keys. For our tiny implementation there will be no difference between the two terms. The following is a simple example of a map in Go:

m := make(map[string]int)

m["my-key"] = 42
m["your-key"] = 1337

In our case the interface of our implementation will be a bit more verbose, as shown in the following example. But it also has a benefit over the built-in map in Go: it allows storing values of any type without having to define a value type when initializing the map.

myMap := NewHashMap()

myMap.Insert("key1", "value1")
myMap.Insert("key2", 1337)

value, exists := myMap.Get("key1")
if exists {
	println("key1:", value.(string))
} else {
	println("key1 does not exist")
}

We need the following core components:

  1. Buckets are the primary storage unit of our implementation.
  2. Hash Function transforms a key into an index value that we can use to access the correct bucket.
  3. Collision Resolution Mechanism makes sure that we can handle cases where two different keys hash to the same index.
  4. Dynamic Resize allows us to increase the capacity of the hash map when it is full.

1. Buckets

A bucket is the core storage unit of a hash map. There is no formal definition of what a bucket is, as it can be implemented in various ways; the most common implementations are arrays or linked lists. In our case, we will use a linked list to store the entries in each bucket. Therefore, we don't need a dedicated Bucket structure; instead, a chained list of Entry structs represents the bucket. Additionally, we will use a HashMap struct to hold the buckets and other metadata, such as the current size and the overall capacity of the hash map. The capacity is a critical part of our implementation, as it defines how many buckets we have and how many entries we can store before we need to resize. But we will talk about resizing later.

const (
	baseCapacity = 8
)

type Entry struct {
	key   string
	value any
	next  *Entry
}

type HashMap struct {
	buckets  []*Entry
	size     int
	capacity int
}

2. Hash Function

To be able to store and retrieve values in our hash map, we need a way to transform the key into an index value that we can use to access the correct bucket where the value is stored. Luckily there is a type of function that does exactly that: a hash function. A hash function takes an arbitrary input (in our case a string key) and maps it to a fixed-size output. There are different hash functions available, and depending on the use case, the choice of the hash function can have a significant impact on the performance or security of a system. But in our case we can use a simple hash function that is fast, as we will call it for essentially any operation on the map implementation.

First we can rule out all hash functions designed for cryptographic use, as we don't need any level of security. What is of interest to us is first the speed of the hash function, and secondly the distribution of the hash values. The distribution is important to ensure that we don't have too many collisions. I can recommend this excellent post on Stack Overflow about hash functions in the context of hash tables[2]. An even more detailed source is the SMHasher test suite, which compares different hash functions and their performance[3]. Based on this we can consider two hash functions for our implementation:

  1. FNV-1a
  2. MurmurHash3

I decided to go with FNV-1a as it is available in the Go standard library, so no external dependency is required. fnv.New32a returns an implementation of the hash.Hash32 interface and is pretty straightforward to use:

func hashFunction(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32()
}

Result for "my-key" is: 3018227921

If we feed the string "my-key" into the hash function, we get the value 3018227921. With this covered we now need to map this value to an index in our buckets. A simple way to do this is to use the modulo operator with the current capacity of the hash map. We will come back to this later when we think about resizing the hash map, but for now this is sufficient.

func (hm *HashMap) getBucketIndex(key string) int {
	hashValue := hashFunction(key)
	return int(hashValue % uint32(hm.capacity))
}

We then can use this function for inserting and retrieving values from the hash map.

func (hm *HashMap) Insert(key string, value any) {
	hm.mutex.Lock()
	defer hm.mutex.Unlock()

	index := hm.getBucketIndex(key)
	log.Println("Putting key:", key, "with value:", value, "at index:", index)
	head := hm.buckets[index]

	// Insert the new entry at the head of the bucket's list
	newEntry := &Entry{
		key:   key,
		value: value,
		next:  head,
	}

	hm.buckets[index] = newEntry
	hm.size++
}

3. Collision Resolution

A collision occurs when two different keys hash to the same index in the hash map. There are multiple ways to handle collisions, but the most common ones are:

  1. Chaining: Store multiple entries in the same bucket by using a linked list in each bucket.
  2. Open Addressing: Find another empty bucket for the new entry.

In our case, we will use the chaining method, as it is simple to implement. It has the downside that the performance can degrade if many entries hash to the same bucket, and lookups will take longer.

This means two changes to our current implementation, in the Insert and Get methods. In the Insert method we need to check whether the key already exists in the bucket and update its value if it does. The magic part happens when we insert a new entry into the bucket: we set its next pointer to the current head of the bucket. This way the new entry becomes the first entry in the linked list, and the previous entries remain reachable via the next pointers.

func (hm *HashMap) Insert(key string, value any) {
	...

	for e := head; e != nil; e = e.next {
		log.Println("Checking entry:", e.key)
		if e.key == key {
			e.value = value
			return
		}
	}

	newEntry := &Entry{
		key:   key,
		value: value,
		next:  head,
	}

	hm.buckets[index] = newEntry
	hm.size++
}

In the Get method we need to iterate over the linked list in the bucket to find the entry with the given key.

func (hm *HashMap) Get(key string) (any, bool) {
	hm.mutex.RLock()
	defer hm.mutex.RUnlock()

	index := hm.getBucketIndex(key)
	log.Println("Getting key:", key, "at index:", index)
	head := hm.buckets[index]

	for e := head; e != nil; e = e.next {
		log.Println("Checking entry:", e.key)
		if e.key == key {
			return e.value, true
		}
	}

	return nil, false
}

4. Resizing the Map

If the current capacity of our map implementation is reached, and we want to insert a new entry, we need to resize the hash map. Based on our earlier decision I'll opt for a simple resizing strategy:

  1. Double the current capacity of the map.
  2. Create a new array of buckets with the new capacity.
  3. Rehash all existing entries and insert them into the new buckets.
  4. Delete the old map.

There are more sophisticated resizing strategies that don't require rehashing all entries, but let's keep it simple for now and leave some performance improvements for another post.

func (hm *HashMap) resize() {
	oldBuckets := hm.buckets
	// double the capacity
	hm.capacity *= 2
	hm.buckets = make([]*Entry, hm.capacity)

	// Re-link the existing entries directly instead of calling Insert:
	// resize runs while Insert already holds the write lock, and Go's
	// sync.RWMutex is not reentrant, so calling Insert here would deadlock.
	for _, entry := range oldBuckets {
		for e := entry; e != nil; {
			next := e.next
			log.Println("Rehashing key:", e.key, "with value:", e.value)
			index := hm.getBucketIndex(e.key)
			e.next = hm.buckets[index]
			hm.buckets[index] = e
			e = next
		}
	}
}

Determining when to resize the hash map is pretty straightforward. During insert operations we can check if the current size of the hash map has reached the current capacity and if so, we can call the resize method.

func (hm *HashMap) Insert(key string, value any) {
	hm.mutex.Lock()
	defer hm.mutex.Unlock()

	if hm.size == hm.capacity {
		log.Println("HashMap is full, resizing...")
		hm.resize()
	}

	...
}

Thread Safety

As of now our implementation is not thread-safe, meaning that if multiple goroutines access the hash map concurrently, it can lead to data races and undefined behavior. But we can easily make it thread-safe by using a mutex to lock the hash map during read and write operations. For this we are going to adjust the HashMap struct to include a mutex:

type HashMap struct {
	buckets  []*Entry
	size     int
	capacity int
	mutex    sync.RWMutex
}

Then we have to lock and unlock the mutex in all methods that modify or read the map values. For read only operations we can use a read lock, which allows multiple routines to read the map concurrently. For write operations we use a write lock, which ensures that only one routine can modify the map at a time.

// read operation
hm.mutex.RLock()
defer hm.mutex.RUnlock()

// write operation
hm.mutex.Lock()
defer hm.mutex.Unlock()

Conclusion

In comparison to the new Go implementation of hash tables, our implementation is certainly not as performant, but it is a good example for understanding the core concepts of hash tables. Additionally, we covered different hash functions and how to handle collisions. Something we haven't covered at all is full hash collisions, where two distinct keys produce the exact same hash value. But this will be a topic of a future post. The examples are inspired by a blog post I read a while ago[4], and as always the full code is available on GitHub.


  1. https://go.dev/blog/swisstable

  2. https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed

  3. https://github.com/rurban/smhasher

  4. https://medium.com/@alameerashraf/from-amazon-interview-jitters-to-mastering-hash-maps-building-a-hashmap-in-go-from-scratch-022e53d47636