We use cookies and other tracking technologies to improve your browsing experience on our site, analyze site traffic, and understand where our audience is coming from. To find out more, please read our privacy policy.

By choosing 'I Accept', you consent to our use of cookies and other tracking technologies.

We use cookies and other tracking technologies to improve your browsing experience on our site, analyze site traffic, and understand where our audience is coming from. To find out more, please read our privacy policy.

By choosing 'I Accept', you consent to our use of cookies and other tracking technologies. Less

We use cookies and other tracking technologies... More

Login or register
to apply for this job!

Login or register to start contributing with an article!

Login or register
to see more jobs from this company!

Login or register
to boost this post!

Show some love to the author of this blog by giving their post some rocket fuel 🚀.

Login or register to search for your ideal job!

Login or register to start working on this issue!

Engineers who find a new job through Functional Works average a 15% increase in salary 🚀

Blog hero image

Scala Snacks: Part 1 - Sets are invariantly interesting

Daniel Tattan-Birch 16 August, 2018 | 9 min read

As software engineers we often stumble upon interesting technical curiosities relating to the languages, libraries and infrastructure we employ in our systems. Exploring these novel behaviours can give us a better understanding of the deeper complexities of those technologies. This is the first post of a series which will explore some such informative nuggets we've come across while building our Scala services here at Football Radar. Here's our first Scala snack.

Setting the scene

I was recently attempting to refactor the code around data being consumed and produced by certain APIs between services. We'll use the typical inheritance model of animals and mammals to illustrate the example. Implementation details aren't important.

sealed trait Animal
sealed trait Mammal extends Animal
sealed trait Bird extends Animal

Now let's suppose we're consuming some API which sends us animals in string format, for example some JSON off a Kafka queue. We have a requirement that all animals must be of the same type in the consumed data, and this validation has already been done. The function below deserialises the input.

def deserialiseAnimals(jsonBlob: String, animalType: Int): Set[Animal] = animalType match {
  ...
  case Mammal.id =>
    val mammals: Set[Mammal] = deserialiseMammals(jsonBlob)
    mammals.map(x => mammalWithName(x))
}

Not too nice. Straight off there should probably a wildcard or bound generic type parameter in the signature, for example return a Set[T] where T <: Animal. This would encode the API's requirement at compile-time and avoid the pattern matching. We'll come back to that at the end. The Object in which this resides is bloated due to many similar deserialisation functions. One thing we could do is move functions to the companion objects of the animals, for example:

case object Mammal {
  def deserialise(jsonBlob: String): Set[Mammal] = { ... }
}

We would then have something like

def deserialiseAnimals(jsonBlob: String, animalType: Int): Set[Animal] = animalType match {
  ...
  case Mammal.id => Mammal.deserialise(jsonBlob)
}

You may or may not be surprised that this doesn't compile. Interesting! There are two independent questions here: why doesn't it compile now, and why did it compile before?

Why doesn't it compile?

Sets in Scala are invariant. Specifically, the signature of the Set trait is defined here as trait Set[A]. This means that, using Scala type parameter syntax, if A >: B holds then both Set[A] >: Set[B] and Set[B] >: Set[A] do not hold. Hence we cannot return a Set[Mammal] or Set[Bird] from this function. Why is that? The developers of Scala rarely do anything without good reason. I dug a bit deeper.

The Scala List[+A] class is covariant. This seems intuitive, since we expect to be able to assign a List[Mammal] to List[Animal]. Well, maybe that claim itself is debatable but I'll leave that one for another time. On the REPL:

scala> trait Animal
defined trait Animal

scala> trait Mammal extends Animal
defined trait Mammal

scala> val animals: List[Animal] = List[Mammal](new Mammal{})
animals: List[Animal] = List($anon$1@73b10975)

Actually this is only safe because Scala lists are immutable. Hence if we try to do the following, we get an error.

scala> trait Bird extends Animal
defined trait Bird

scala> animals(0) = new Bird {}
<console>:14: error: value update is not a member of List[Animal]
       animals(0) = new Bird {}
       ^

Well that's a relief! If that weren't the case we could end up trying to wedge a Bird into a List[Mammal] and it would only blow up at runtime. Scala does provide a MutableList as well, but that's invariant so the compiler will stop us:

scala> import scala.collection.mutable._
import scala.collection.mutable._

scala> val animals: MutableList[Animal] = new MutableList[Mammal]()
<console>:22: error: type mismatch;
 found   : scala.collection.mutable.MutableList[Mammal]
 required: scala.collection.mutable.MutableList[Animal]
Note: Mammal <: Animal, but class MutableList is invariant in type A.

We're covered either way. This is a pitfall of Java and C# arrays. They are both covariant, for historical reasons. Lists in both languages are invariant, because they are not immutable (by default). Put succinctly by C# guru Jon Skeet:

No, a List<Dog> is not a List<Animal>. Consider what you can do with a List<Animal> - you can add any animal to it... including a cat. Now, can you logically add a cat to a litter of puppies? Absolutely not.

I digress. So why are sets in Scala invariant, despite being immutable? Martin Odersky assigns the invariance of sets to their implementation details:

On the issue of sets, I believe the non-variance stems also from the implementations. Common sets are implemented as hashtables, which are non-variant arrays of the key type. I agree it's a slightly annoying irregularity.

That's sad, but probably true. An abstraction being defined based on common implementations.

I prefer to believe there could be deeper reasoning. I like this justification. We can think of a Set[A] as simply a mapping - for every candidate element A, is the element present in the set or not? Here's an interesting question on that topic. This representation of a set treats it as a characteristic (indicator) function. I'm on board with that notion.

So in explicit terms a Set[A] can be thought of as a Scala function of type Function[A, Boolean]. It turns out scala.collection.Set extends (A => Boolean) so it literally is one of those! Clearly I'm not the only one who likes the idea of sets as functions.

This stackoverflow answer does, however, skirt over something I had difficulty convincing myself of:

due to the contravariance of functions

Specifically, functions are contravariant in their input parameters. This might be obvious to some - it's something we take advantage of every day. Indeed the Scala doc of function types indicate as much. I'll leave that discussion for another time. For now, let's take it as a given. To state what that means explicitly: if A >: B then Function[A, Boolean] <: Function[B, Boolean].

I think we could now construct a pseudo-proof of Scala Set invariance. Forgive my questionable use of symbols in the following:

Let A >: B. Treating sets as functions, we can say Set[A] = Function[A, Boolean] and Set[B] = Function[B, Boolean][1]. But since functions are contravariant, it follows that Set[A] <: Set[B]. This means sets are at best contravariant (at best meaning they could also be invariant, since subtyping is an identity relation[2]). Now we can make an assumption that sets are covariant, that is Set[A] >: Set[B]. This is a contradiction, since covariance and contravariance are mutually exclusive. We've proved by contradiction that sets cannot be covariant. What about contravariance? This also seems counterintuitive in a data structure sense. It means we could make the assignment val bs: Set[B] = Set[A](new C{}) but when we try to read out the element as a B we would get a class cast exception. We can reinforce this intuition more rigorously - a set has the requirement of implementing Iterable which is covariant. This rules out contravariance. The best we can settle for is invariance.

Ok, now I'm convinced that Scala sets (and set implementations in general) should probably be invariant. In terms of the animal example, as mentioned previously a nice solution might be to use an explicit bound type parameter to enforce the requirement of a single type of animal. The signature would be - def deserialiseAnimals[T <: Animal](...): Set[T]. That's assuming we can't restructure the API in a better way, which was impractical in this particular case.

Anyway, onwards and upwards.

Why did it compile before?

Recall that we had:

    val mammals: Set[Mammal] = deserialiseMammals(jsonBlob)
    mammals.map(x => mammalWithName(x))

When I refactored this code, it no longer compiled. What's the difference? The map function seems to be the only difference. To the REPL:

scala> val animal: Set[Animal] = Set[Mammal](new Mammal{})
<console>:13: error: type mismatch;
 found   : scala.collection.immutable.Set[Mammal]
 required: Set[Animal]
Note: Mammal <: Animal, but trait Set is invariant in type A.

Ok fair, we expect that now. We know Scala sets are invariant. What about:

scala> val animal: Set[Animal] = Set[Mammal](new Mammal{}).map(x => x)
animal: Set[Animal] = Set($anon$1@6793f752)

Aha! The map function appears to "break" the invariance of sets in Scala. Those better versed in Scala than I will immediately know the reason for this. Let's take a look at the signature the map used by Set:

 def map[B, That](f: A => B)(implicit bf: CanBuildFrom[This, B, That]): That

A cheeky implicit. This blog post by a Scala library developer gives a nice overview of how CanBuildFrom works, and why they're engineering it out of Scala 2.13. In summary, it provides a way for the compiler to "choose" which type to return from a function.

In our case, A is Mammal, B is Mammal, This is Set[Mammal] and That is Set[Animal]. Our function f is of type (Mammal => Mammal). Here's the source code for that map function. The function f is iteratively applied to get us some Bs and the bf creates a builder which takes those elements of type B and builds a That out of them. Let's convince ourselves that this is what's actually happening in the REPL - you never know with implicits.

First we need to get our hands on a CanBuildFrom instance. It's not totally obvious where the compiler is finding the in-scope implicit bf. After a bit of digging it can be traced up through the Set's trait hierarchy. We end up finding an instance exposed through a public function in scala.collection.generic.GenTraversableFactory[3]:

def ReusableCBF: GenericCanBuildFrom[Nothing] = ...

This is used in various places to create CanBuildFrom instances, for example on the GenTraversable companion object[4]:

implicit def canBuildFrom[A] = ReusableCBF.asInstanceOf[GenericCanBuildFrom[A]]

This looks like what we need in the REPL. Let's try to hack an instance together and explicitly call map with the CanBuildFrom instance:

scala> import scala.collection._
import scala.collection._

scala> import scala.collection.generic._
import scala.collection.generic._

scala> val animalCbf = GenIterable.ReusableCBF.asInstanceOf[CanBuildFrom[Set[Mammal], Mammal, Set[Animal]]]
animalCbf: scala.collection.generic.CanBuildFrom[scala.collection.Set[Mammal],Mammal,scala.collection.Set[Animal]] = scala.collection.generic.GenTraversableFactory$$anon$1@1132baa3

scala> val animals: Set[Animal] = Set[Mammal](new Mammal {}).map(x => x)(animalCbf)
animals: scala.collection.Set[Animal] = Set($anon$1@7d15c513)

Nice. Presumably that's similar to what's happening at runtime in the real code, except that the implicit is resolved by the compiler beforehand. Now let's fudge the CanBuildFrom by changing the That to Set[Mammal]:

scala> val mammalCbf = GenIterable.ReusableCBF.asInstanceOf[CanBuildFrom[Set[Mammal], Mammal, Set[Mammal]]]
mammalCbf: scala.collection.generic.CanBuildFrom[scala.collection.Set[Mammal],Mammal,scala.collection.Set[Mammal]] = scala.collection.generic.GenTraversableFactory$$anon$1@1132baa3

scala> val animals: Set[Animal] = Set[Mammal](new Mammal {}).map(x => x)(mammalCbf)
<console>:26: error: type mismatch;
 found   : scala.collection.generic.CanBuildFrom[scala.collection.Set[Mammal],Mammal,scala.collection.Set[Mammal]]
 required: scala.collection.generic.CanBuildFrom[scala.collection.Set[Mammal],Mammal,scala.collection.Set[Animal]]

That proves it! The reason the code compiled before is that the compiler was resolving an implicit CanBuildFrom variable somewhere in scope, which enabled it to choose the parent type Animal in place of its subtype Mammal. If we explicitly call map with a different CanBuildFrom the code doesn't compile.

It's interesting to get a glimpse into what the Scala compiler does behind the scenes. I'd be curious to know whether the author of the code I was refactoring knew that there was an implicit being pulled in, without which their code would not compile. As of Scala 2.13, code like that will continue to compile, but the means by which it's achieved will be completely different. I'll point you again to this blog post for more details on that subject.

Well, that was fun. Forgive me that this particular Scala snack was not so much a snack as a hearty breakfast. At least I learnt some things.

Here are a couple more of our Scala blogs for those interested:

Citations:

[1]: Probably what this "=" really means is something like sets and functions are [isomorphic](https://en.wikipedia.org/wiki/Isomorphism), meaning that there's a bijection between the set of all sets and the set of all functions. For each set there exists a function that can entirely describe the contents of the set, and vice versa. Any proper Mathematicians out there can validate whether that statement is remotely correct! [2]: For example see [this](http://www.cs.cornell.edu/Info/People/sfa/Nuprl/NuprlPrimitives/Xpolymorphism_doc.html) formal mathematical definition. [3]: Source code: https://github.com/scala/scala/blob/v2.12.1/src/library/scala/collection/generic/GenTraversableFactory.scala#L45 [4]: Source code: https://github.com/scala/scala/blob/v2.12.1/src/library/scala/collection/GenTraversable.scala#L31

Originally published on engineering.footballradar.com

Related Jobs

Related Issues

WorksHub / client
  • Open
  • 0
  • 0
  • Intermediate
  • Clojure
  • $50
WorksHub / client
WorksHub / client
  • Started
  • 0
  • 4
  • Intermediate
  • Clojure
  • $100
WorksHub / client
  • 1
  • 0
  • Intermediate
  • Clojure
WorksHub / client
  • 1
  • 0
  • Intermediate
  • Clojure
WorksHub / client
  • 1
  • 0
  • Intermediate
  • Clojure
WorksHub / client
  • Open
  • 0
  • 0
  • Intermediate
  • Clojure
cosmwasm / wasmd
  • 1
  • 2
  • Intermediate
  • Go
cosmwasm / wasmd
  • Started
  • 0
  • 1
  • Intermediate
  • Go
cosmwasm / wasmd
  • Started
  • 0
  • 1
  • Intermediate
  • Go

Get hired!

Sign up now and apply for roles at companies that interest you.

Engineers who find a new job through Functional Works average a 15% increase in salary.

Get Started with