Saturday, May 03, 2014

A Go Hash Map

I was a little disappointed to discover that the built-in Go map doesn't allow interfaces as keys. Considering that interfaces are the way to do anything dynamic in Go, and that dynamic code often uses maps, it seems a little odd.

To act as a hash table key, you need equals and hash code methods. But it's easy to define an interface for that and require that keys implement it, similar to how sort.Interface requires a container to implement Len, Less, and Swap.
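Roughly, such a key interface might look something like this (the names are mine, just for illustration, not from any particular library):

    // Key is a hypothetical interface for hash map keys, analogous to
    // how sort.Interface requires Len, Less, and Swap.
    type Key interface {
        Hash() uint32
        Equals(other interface{}) bool
    }

    // IntKey is a trivial example implementation.
    type IntKey int

    func (k IntKey) Hash() uint32 { return uint32(k) }

    func (k IntKey) Equals(other interface{}) bool {
        o, ok := other.(IntKey)
        return ok && k == o
    }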

I looked around to see what's out there and found a few options, but none really excited me. And I'm still learning Go, so it made more sense to implement it myself.

My first thought was to port my hash map code from cSuneido. But first I wondered how Go's own map is implemented. The code is straightforward, but it uses an interesting approach that I hadn't encountered before. It's a variant of separate chaining, with each slot in the hash table being a bucket that can hold a small number of entries (e.g., eight). Additional overflow buckets can be chained together. In many hash table designs, collisions are a nuisance to be tolerated, but this design almost embraces them: by making the table eight times smaller, you effectively assume collisions.

Buckets holding a number of entries are also better for cache locality than a linked list.

Another interesting feature is that the buckets store an additional byte per entry holding the high byte of the hash code of the key. This helps searching: if this piece of the hash code doesn't match, you can skip the full key comparison (which is cache unfriendly, and slow if keys are large or complex).
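A rough sketch of the idea (my own simplification, not the actual runtime code or my implementation, and using the Key interface sketched above):

    const bucketSize = 8 // entries per bucket

    // bucket holds a fixed-size array of entries plus one byte of
    // hash per entry, with overflow buckets chained on the end.
    type bucket struct {
        tophash  [bucketSize]byte // high byte of each key's hash code
        keys     [bucketSize]Key
        vals     [bucketSize]interface{}
        overflow *bucket // chained when this bucket fills up
    }

    // get searches a bucket chain for key, using the stored hash byte
    // to skip most full key comparisons for non-matching entries.
    func (b *bucket) get(key Key, top byte) (interface{}, bool) {
        for ; b != nil; b = b.overflow {
            for i := 0; i < bucketSize; i++ {
                if b.keys[i] == nil || b.tophash[i] != top {
                    continue // empty slot or cheap hash-byte mismatch
                }
                if b.keys[i].Equals(key) {
                    return b.vals[i], true
                }
            }
        }
        return nil, false
    }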

This design also works well for small tables, since a single bucket reduces it to a small array with linear search, which is exactly what you want for a handful of entries.

So I implemented this design in Go, following the runtime's C code fairly closely, except that I left out the incremental resizing. It might be worthwhile in some situations, but it makes the code more complex (especially iteration), and while it amortizes the cost of resizing, it probably makes resizing slightly slower in total. The lack of incremental resizing hasn't been a noticeable issue in cSuneido.

Have a look; it's about 200 lines of Go.

The next issue was what to use for hashing strings. Go's standard hashing packages work on byte slices, so hashing a string means converting it to []byte, which requires allocation and copying. (Go 1.3 has an optimization for this, but only for the built-in map.) So again, I wrote my own version, following the approach in hash/fnv.
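The idea is simply to index the string directly, something like this (a sketch of 32-bit FNV-1a, the same algorithm hash/fnv uses):

    // hashString computes a 32-bit FNV-1a hash of s by indexing the
    // string directly, avoiding an allocation and copy to []byte.
    func hashString(s string) uint32 {
        const (
            offset32 = 2166136261
            prime32  = 16777619
        )
        h := uint32(offset32)
        for i := 0; i < len(s); i++ {
            h ^= uint32(s[i])
            h *= prime32
        }
        return h
    }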

The result seems reasonable. The main drawback comes from Go's lack of generics - the map has to work in terms of interface{} (the rough equivalent of Java's Object or C/C++'s void*), so you have to type assert everything that comes out of it, reminiscent of Java before generics. Another minor awkwardness is that you can't use the tbl[key] syntax you get with built-in maps.
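For example, assuming Get and Put methods and a constructor (hypothetical names, just to show the shape of the code), usage ends up looking something like:

    tbl := hmap.New()           // hypothetical constructor
    tbl.Put(IntKey(7), "seven") // keys implement the Key interface

    v, ok := tbl.Get(IntKey(7))
    if ok {
        s := v.(string) // everything comes out as interface{}, so you have to type assert
        fmt.Println(s)
    }
    // With a built-in map this would just be m[7] = "seven" and s := m[7].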

Another hash table approach which would be a natural fit with Go would be to use a growable slice for each slot in the table (rather than a chain of buckets). This would avoid the space overhead from chain links and from partially full buckets, at the cost of the slice header itself (a pointer and two ints), plus more individual allocations.
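A rough sketch of that alternative (my own illustration, not something I've implemented or benchmarked):

    // entry pairs a key with its value.
    type entry struct {
        key Key
        val interface{}
    }

    // sliceMap keeps a growable slice of entries per table slot
    // instead of a chain of fixed-size buckets.
    type sliceMap struct {
        slots [][]entry
    }

    func (m *sliceMap) put(key Key, val interface{}) {
        i := key.Hash() % uint32(len(m.slots))
        for j := range m.slots[i] {
            if m.slots[i][j].key.Equals(key) {
                m.slots[i][j].val = val // update an existing entry
                return
            }
        }
        // append grows the slot's slice as needed (more individual
        // allocations, but no chain links or partially full buckets)
        m.slots[i] = append(m.slots[i], entry{key, val})
    }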

Related interesting reading:
