24 February 2016

I alluded to the issue of generics in my post on What vs How. Generics are a topic that has no real consensus on it, and I don't expect that to change any time soon. Here I want to just briefly lay out why I feel that generics are important for a statically typed language.

Assumption: All values should have a type that is known at compile time.

We're talking about statically typed languages, so this is a given.

Assumption: Functions are values like any other.

This seems to be relatively uncontroversial in recent times. What I mean by values "like any other" is that functions can be created at runtime, passed around, stored in variables, and so on. In particular, this means that functions are also statically typed.

Guiding Principle: Any common code should be able to be factored out as a function.

This is essentially saying that a language should support the Don't Repeat Yourself principle. That's not to say that every time you see common code it should necessarily be separated out, but if you come across a situation where you would want to, and the language you're using doesn't let you, that's an example of the language getting in the way.

Example: map

Let's say that one day you notice that there are a lot of blocks of code that look like this:

result_list := make([]ResultType, len(inputList))
for i, item := range inputList {
    resultList[i] = computeSomething(item)
}

You might decide that this pattern is so common that it should be pulled out as a function.

func map(f ???, inputList ???) ??? {
    resultList := make(???, len(inputList))
    for i, item := range inputList {
        resultList[i] = f(item)
    }
}

How should you fill in the types? Well, let's think about a few things that we should be able to do. We'd like to be able to turn

doubledNumbers := make([]int, len(numbers))
for i, number := range numbers {
    doubledNumbers[i] = 2*number
}

into

doubledNumbers := map(func(number int) int { return 2*number }, numbers)

and also be able to go from

sums := make([]int, len(lists))
for i, list := range lists {
    sum := 0
    for _, x := range list {
        sum += x
    }
    sums[i] = sum
}

to

sums := map(
    func(list []int) int {
        sum := 0
        for _, x := range list {
            sum += x
        }
        return sum
    },
    lists,
)

Having a single map function that can do both of these is only possible if the language has some sort of polymorphism or generic types. There's no universal agreement on whether map itself makes code more or less readable, but it serves as a good example of the type of code that benefits from generics. I personally believe that map makes code more readable, so a language in which I cannot define such a function is a language that is getting in the way of my programming.

Example: Sets

Go doesn't have sets built in, but it has maps and a type with only one value (struct{}). So you can use map[int]struct{} to represent a set of ints, and so on. You might think that if this idea is used repeatedly, that it's worth it to give it the name Set and then have meaningful function calls like s.add(5) instead of writing s[5] = struct{}.

While I was working at Dropbox, this had apparently happened sometime in the past, and so there was a package that had definitions like the code below. Since Go doesn't support generics, in order to allow the set to hold any type of value, it actually holds interface{} values.

type Set struct { underlyingMap map[interface{}]struct{} }

func (s *Set) add(x interface{}) {
    s.underlyingMap[x] = struct{}
}

func (s *Set) remove(x interface{}) {
    delete(s.underlyingMap, x)
}

func (s Set) contains(x interface{}) bool {
    _, ok := s.underlyingMap[x]
    return ok
}

// More functions along the same line

This code works, and it seems like a reasonable idea, but the lack of generics makes using it substantially less safe than the version using map[ValueType]struct{} directly. What ended up happening was that my coworker wrote a piece of code like this (the topic of the code is changed for the purposes of this blog post, but the structure has been preserved):

// knownPlaces is a Set containing all the places that we've seen before.
for place := range places {
    if knownPlaces.contains(place) {
        doStuff(place)
    }
}

And the code didn't work. The conditional always failed. I think at least 3 of us looked at the original code and weren't sure why the conditional wouldn't be firing. I noticed that the code was using Set and considered that a bad idea, so I rewrote the code using map[Place]struct{}, so that the code looked like this:

for place := range places {
    _, ok := knownPlaces[place]
    if ok {
        doStuff(place)
    }
}

and the compiler informed me that place had type int instead of Place, because the code should actually be this:

for _, place := range places {
    _, ok := knownPlaces[place]
    if ok {
        doStuff(place)
    }
}

If the language had generics, then the compiler could have picked up on the error in the first version. So while technically the language didn't prevent us from factoring out the common pattern, we lost the ability of the type checker to catch many of our errors, which caused that factoring to be a net negative, and as a result we ended up with code that makes less sense unless you go in with the set-as-map idea already in mind.