Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Sum Types in Julia and Rust (andreaskroepelin.de)
123 points by xiaodai on Aug 31, 2020 | hide | past | favorite | 66 comments


I may be wrong, but I don't think this is what people mean by 'sum types'.

I was expecting union types (e.g., https://www.typescriptlang.org/docs/handbook/unions-and-inte... or https://dotty.epfl.ch/docs/reference/new-types/union-types.h... )

I guess enums are a type of sum type, but seems this is more specifically about enums


Simple way to think about it is plus. Sum plus sum increases, union plus union does nothing.

1. A union type is like int | double. It means the value is one of a set of possible types. And it's a true set: `int | int | double` is indistinguishable from `int | double`.

2. A sum type is a new type built from a list of other types, which assigns a 'tag' to each possible list element, like a Rust enum. The tags are not themselves types: they are like a struct field name. It's quite possible to have a sum type with N tags, each wrapping `int`.

So with this understanding Rust has sum types but not union types, while TypeScript has union types but not sum types.


It is possible (and popular) to have sum types in TypeScript by manually adding a tag, though.

  type Maybe<T> =
    | { kind: "Some", value: T }
    | { kind: "None" }


This is also why sum types are often referred to as tagged unions. They are a special case of a union, in Rust's case with support from the type system.


Yep, Typescript's control flow analysis, type literals, and union types allow us have tagged unions/sum types. Probably its best feature.


I've always considered both the union and tagged union to be Sum types[0].

> Two common classes of algebraic types are product types (i.e., tuples and records) and sum types (i.e., tagged or disjoint unions, coproduct types or variant types).

Edit: e.g. X | Y isn't so different from 'a' A | 'b' B

[0] https://en.wikipedia.org/wiki/Algebraic_data_type


> X | Y isn't so different from 'a' A | 'b' B

Except for the fact that the union of a type with itself (X | X) doesn't add any extra values, while a tagged union can have distinct tags with the same data type ('a' A | 'b' A) and actually sums the possible values for each tag. They are equivalent only so long as the unions are disjoint with no overlap between the members.


I think we're saying the same thing. I was saying it's like thinking of 'a' A as an X and 'b' B as a Y.


TypeScript does have tagged unions in addition to untagged ones https://www.typescriptlang.org/docs/handbook/release-notes/t...


Rust also has 'union' types [1] that work similar to the union types in C, C++.

Recently there has also been a lot of discussion [2] about possibly adding anonymous enums to the language, and whether these should have sum or union semantics.

[1]https://doc.rust-lang.org/reference/items/unions.html

[2]https://internals.rust-lang.org/t/ideas-around-anonymous-enu...


WRT to 2, one useful property is that you can have two or more labels for the anonymous type `()`, such as `enum MyBool { True, False }` or `enum JSON { List(Array<JSON>), Object(List<(String, JSON)>), Number(f64), String(String), True, False, Null }`.


Thanks! This clarifies the situation


A sum type is a tagged union [1]. In C, a sum type can be modeled by the combination of an enum and a union. The enum would tell you which field of the union you should use. In Rust, enum's subsume unions and enums of C. This is analogous to a a disjoint union [2] in set theory. Notice that in both cases you have a pair of data. Sum types and disjoint unions are both instances of coproducts in category theory, by the way.

1. https://en.wikipedia.org/wiki/Tagged_union

2. https://en.wikipedia.org/wiki/Disjoint_union


According to the tagged union Wikipedia you referenced, a disjoint union is also considered a tagged union. Whether or not that's actually a correct description within the vernacular of computer science, I don't know.

"In computer science, a tagged union, also called a variant, variant record, choice type, discriminated union, disjoint union, sum type or coproduct, ..."


They're all what you call isomorphic to eachother, which means roughly any adequate mathematical model of one can mapped to any other.


I believe another word for enums (in the Rust sense, not in the C++ or Java sense) are tagged unions.


That's right. It's unfortunate that Rust calls them "enums", since as you noted that term already had a well-established meaning in other languages, and the concept that Rust calls "enums" also had several existing names (sum type, coproduct type, disjoint union, and, more generally, algebraic data type) in the literature and in prior languages.

Language designers: stop changing the meanings of words! I know you mean well and you're trying to tie unfamiliar ideas to familiar ones for beginners, but you end up causing more confusion for everyone in the long run.


Rust enums are also enums:

  enum Colour {
    Red,
    Blue,
    Green,
  }
is fine.


When you say "Rust enums are also enums", that's not true in general. What is true is that _some_ Rust "enums" (like your `Colour` example) can be represented as traditional enums. You don't prove a universal quantification with a single example.


All enums are expressible as Rust enums.


More than that:

    #[derive(Debug)]
    enum Colour {
        Red = 1,
        Blue = 2,
        Green = 4,
    }

    fn main() {
        println!("{:?}", Colour::Blue);
        println!("{}", Colour::Blue as u8);
    }


> It's unfortunate that Rust calls them "enums"

Rust calls them enum to be familiar to people coming from C-family languages, which is a large target.

> since as you noted that term already had a well-established meaning in other languages

Rust enums "degenerate" to a C-style enum (except typesafe) as it can be repr'd to a number and it's possible to select the discriminant (if there's no associated data).

> the concept that Rust calls "enums" also had several existing names (sum type, coproduct type, disjoint union, and, more generally, algebraic data type) in the literature and in prior languages.

Pretty much none of which are actually part of the language e.g. in Haskell or OCaml the designator is `type`, and it's used for both sum types and product types: the sum type simply has a single constructor.

But Rust doesn't use `type`, it uses `struct`. And a `struct` with multiple variants doesn't make sense.


Can't edit but

> the sum type simply has a single constructor.

this should be "product" not "sum"


Enum has one meaning in C family languages. Other languages may have different meanings. For example, in Haskell something is an enum if there is an invertible mapping between that type and (a contiguous subset of) the machine integers (with complications).

In other languages the concept might correspond to (finite) recursively enumerable sets (ie you can list all their elements).


> I may be wrong, but I don't think this is what people mean by 'sum types'.

It is.

> I was expecting union types (e.g., https://www.typescriptlang.org/docs/handbook/unions-and-inte.... or https://dotty.epfl.ch/docs/reference/new-types/union-types.h.... )

Union types are basically an anonymous form of sum types. In the same way tuples and structs / records are both forms of product types.

Statically typed languages which support sum types generally only support the named version (because it tends to be more useful and powerful).

> I guess enums are a type of sum type, but seems this is more specifically about enums

Depends on the language, C's enums are not types at all, Java's or C++'s are product types. Technically enums (or more generally tagged unions, which is what Rust's enums are) is a superset of sum types as you can have multiple variants of the same type, but that's not leveraged here.


> Union types are basically an anonymous form of sum types.

This is false on multiple levels. First, being a sum type has nothing to do with the type being named vs. anonymous. What makes a sum type a sum type is that it's a categorical coproduct, whether you give it a name or not. Second, sum types are synonymous with _disjoint_ (i.e., tagged) unions, not unions. Consider the union of boolean with itself. The result would be equivalent to boolean, because union is an idempotent operation. Disjoint union, or sum, would give you a type with 4 values instead of 2.

> Technically enums (or more generally tagged unions, which is what Rust's enums are) is a superset of sum types as you can have multiple variants of the same type

That just how sum types work (have you ever wondered why they are called "sum" in the first place?). You're thinking of union types.


Union types aren't sum types. One difference is A | A = A, but A + A = 2 * A.


Rust enums are full union types. i.e. while each possibility in a (say) Java enum must be of the same type, each possibility in a Rust enum can be of a different type.

i.e.

https://play.rust-lang.org/?version=stable&mode=debug&editio...


However, Rust's enum-type members are still named - to my knowledge you can't have anonymous enum type members in Rust.

In the example you gave, the type-of `left` is still `SumType::Left` instead of being just `String`.

I'm not too familiar with Rust to say, but I don't consider this to be a syntactically zero-cost abstraction (even if the wrapper-types are elided by the compiler) because we still have more keyboard typing to do than we should be doing, imo.


Ah, that's to enable you to have multiple members that could hold `String` values. Yeah, it comes with the (syntactic) cost that you have to identify the member.


In practice that's really not a problem because of rust's great pattern matching that easily lets you extract even deeply nested values.


I think the use of an enum (in the Rust example) makes this a sum type situation: https://tonyarcieri.com/a-quick-tour-of-rusts-type-system-pa...


This is just a difference between named and anonymous types, really. This is very similar to named functions vs lambda functions, both provide the same exact functionality but avoiding superfluous identifiers makes programming more ergonomic.

More specifically sum types in Rust must also be tagged, meaning you need to explicitly construct and deconstruct them. This aspect is along the type-alias vs newtype axis which gets into structural vs nominative typing and the trade-offs therein.

So yes, Rust enums and Typescript union types are both exactly sum-types and the differences are due to the surrounding decisions made about the languages. Rust's sum types are named and tagged, Typescript's are anonymous and untagged, but they're both sum types.


> So yes, Rust enums and Typescript union types are both exactly sum-types…

As others have pointed out, untagged unions are not sum types because the union of a type with itself has no effect, whereas adding a type to itself yields twice as many possible values. Untagged unions can function as sum types when there is no overlap between the members, but not in the general case.

To illustrate the difference: You can construct every possible algebraic type as some combination of void (no values), unit (one value), sum (|A + B| = |A| + |B|), and product (|A * B| = |A| * |B|). This does not work if the sum type is replaced with an untagged union. You can't even get as far as constructing the equivalent of the boolean type; while |Unit + Unit| has two distinct values, |Unit ⋃ Unit| only has one value.


Even the word "enum" is quite overloaded; i.e. compare Swift's robust class-like enums with those of C or C++.


I guess technically what people want us algebraic types. Sums of products. Just sums alone are just enums.


When I think of sum types, I like the categorical definition the best, which is that a sum A+B has two morphisms (i.e. constructors) Inl : A -> A+B and Inr : B -> A+B, with a simple commuting diagram[0]. Or in Rust,

  enum Sum<A, B> {
     Inl(A),
     Inr(B),
  }
Why do I prefer this definition? Well, category theory abstracts away irrelevant details, and sums have a "universal property" associated with them. Roughly speaking that means that it doesn't matter how you define sum types in your language, if they fit the universal property of sums (up to isomorphism) then they truly can be considered sum types. In the Rust PlayerClass example the corresponding sum is (Solarian + (Polarian + Centaurian)), and morphisms

  Sol  = Inl . Inl
  Pol  = Inl . Inr
  Cent = Inr . Inr
[0] https://en.wikipedia.org/wiki/Coproduct


I don't know any category theory whatsoever. What happens in your example if A = B? Would it have two morphisms or just one?


What might be causing confusion is the difference between tagged unions and untagged unions.

A union type "A | B" means "a value of type A or a value of type B". Example:

    function f1(): String | Integer {
        if (rand()) {
            return "hello"
        } else {
            return 12
        }
    }

    function f2(x: String | Integer) {
        switch (typeof x) {
            case String: return "string: " + x
            case Integer: return "integer: " + x
        }
    }
The type "String | String" is exactly equivalent to "String".

A tagged union (aka sum type) "A + B" means "either a left value or a right value; if it's the left, it has type A, if it's the right it has type B".

    function g1(): String + Integer {
        if (rand()) {
            return Inl("hello")
        } else {
            return Inr("bye")
        }
    }

    function g2(x: String + String) {
        switch (x) {
            case Inl(s): return "left value: " + x
            case Inr(s): return "right value: " + x
        }
    }
The type "String + String" has one bit of additional information than just "String".


There'll still be two morphisms, Inl : A -> A + A and Inr : A -> A + A


> Use subtyping in Julia and enums in Rust.

I would use enums in Julia, making the enum you define a field of the PlayerClass:

  @enum PlayerStar Sol Pol Cent
  struct PlayerClass
      star::PlayerStar
      # other fields
  end
Julia's compiler will split small unions into static dispatches behind branches, and can also decide whether or not to specialize on types or create a generic functions.

But if your code is performance sensitive, I'm more comfortable controlling the behavior than relying on these optimizations. The problem is, dispatch is often a more convenient coding style than long branches.


  fn greet<T: Player>(&self, other: &T);
Yeah, this doesn't work, but the equivalent to what you are writing in Julia does:

  fn greet(&self, other: Box<dyn Player>);


I’m not sure what is meant here, because that top example would compile.

edit - Alright I think I see what the original article was trying to express. In the original article that top example method was defined as part of a trait. Because that method is generic that would make the trait not object safe. On its own that is fine (this would still compile), but there was some previous example code which relied on this trait being object safe.


Or even just pass by reference if you don't need to take the box:

    fn greet(&self, other: &dyn Player);


Great article — I’ve never written a line of Julia or Rust, but could still understand it perfectly


Two thoughts:

For this problem, I would use union types in Julia. Union types are a sort of sum, but they are amalgamated sums whilst sums types in PL semantics usually means disjoint sums. The difference is that disjoint sums 'mark' whether a value is of the left or right type, while with amalgamated sums the value may belong unmarked to the intersection. The distinction does not matter in the example the post gives.

Second, Julia does not give the benefit that Rust gives of type coverage, that is, ensuring that functions that take the sum type as argument actually are defined for each branch of the sum type. The Rust compiler guarantees this automatically. AFAICS, with Julia it is up to the user to provide tests exploring the branches.


Julia is a dynamic language, I don't think having the rigor of a compiled language (probably the most rigorous out there) is a reasonable expectation. It would be nice to have as an external linter service, probably.


I think you are mixing two different things.

Julia is a compiled language and Rust is too. Julia is dynamically typed while Rust is statically typed.


It is possible to check type coverage for dynamically typed languages, if the compiler/interpreter has enough type information to infer what types need coverage. In the case of multimethods, Julia's compiler is given this information.


Traits are possible in Julia [0] and may be a better solution.

[0] https://ahsmart.com/pub/holy-traits-design-patterns-and-best...


> Use subtyping in Julia and enums in Rust. One caveat though: Enums can not be extended from the “outside”. If you plan to let users of your library add new type variants, you will have to use some kind of trait approach.

This extensibility from the outside is the most important idea behind Julia!

For example this is why a generic deep learning library doesn't need any extra implementation details to be able to run on GPU as well. https://fluxml.ai/Flux.jl/stable/gpu/


Someone Rustier than I am, what's the story with this enum solution when I want to add a _new_ star later? Can one implement this in a way that respects the open-closed principle?


You can flag an enum as `non_exhaustive` to indicate that it might contain additional values in the future.

If you do that, then anyone writing the code must add code to handle other cases, or it's a compile error (i.e. you must match the enum as:

  #[non_exhaustive]
  enum Foo {
      CaseA,
      CaseB
  }

  let f : Foo = get_foo();

  match foo {
      CaseA => {},
      CaseB => {},
      // Exhaustive matching
      _ => {},
  }
https://doc.rust-lang.org/reference/attributes/type_system.h...


But how can the PlayerClass trait methods get extended in the future without opening them back up to add cases to the match expression?


I'm slightly confused. First, PlayerClass is not a trait. And the methods would have to be edited when you add a case to the enum. Is that what you're asking?

It's different than if you were implementing those methods on a trait and just had several types that implemented the trait. But the article does show the potential complications that sometimes arise with Rust traits.

The difference is that a trait is "open" and an enum is "closed". So the enum's methods can/must account for all variants. A trait method calls to the implementor for work, much like inheritance in other languages.


Sorry, somehow my brain had thought impl and trait went hand in hand, I see that's incorrect.

My question is just - do people really accept this tradeoff? The Rust advice in the article seems to create a maintenance burden compared to the Julia version. Every time you want to extend the system you have to go back and crack open old code. Is the performance difference really that great? Is this really the default approach for Rust programmers? Not snarking about Rust at all, I'm fascinated, but the open-closed principle would be front of mind for me when deciding on these sorts of abstractions (both as a language designer and just a developer).


Your concerns are not unwarranted. I've done several Rust projects and usually there really isn't any trade-off.

If you want something open for extension, use a trait unless you can prove to yourself that you "can't". If it's closed or unlikely to be extended, use an enum.

Think of it this way: In Rust you have a choice between open and closed (trait vs enum). In Java, you get open (interface) and that's it. Other languages do also have both, like Rust: Swift, Kotlin, apparently Julia.

Traits have a few sharp edges because of the nature of Rust being built around zero-cost abstractions. But, most of the time, they work just like an interface in Java. The snag I most often hit is if one wants to return Self in a trait method. The most obvious way to work around that, IIRC, is to just return a Box<dyn MyTrait>. Keep in mind that in many languages, these things actually work similarly except that the language will just happily throw stuff on the heap transparently. Rust requires you to acknowledge that the former case needs to be boxed and likely on the heap, since you can't know its size at compile time.

In this particular case of extending an enum, I don't see it nearly as horrible as you're describing it. Let's look at it from the POV of another implementation in another language (I don't know Julia): You have an interface, Player. That interface defines a couple of methods. You implement three classes that conform to that interface. Now, six months later you come back and want to add a fourth. You make a new class, say that it implements Player and then your IDE/compiler yells at you until you implement those methods. In the Rust-enum version, you have an enum with three variants and a couple of methods, each of which has to match on the variants. Six months later, you want to add a fourth PlayerClass variant. So, you add it and then your compiler yells at you until you implement the fourth match branch in the couple of methods on the class.

That doesn't seem that different.

To answer some of your inner questions: Yes, Rust devs reach for enums a lot. But really only if you don't expect the cases to change much. Writing version 1.0 in Rust usually takes longer than in other languages because Rust is still a low-level language. Rust's performance advantage doesn't matter for most applications. People choose Rust for much more than the performance: its error philosophy, its enums and good matching, etc.


This is an excellent answer, thank you. I suppose the only remaining discomfort I anticipate is if you have an internal abstraction that might one day be exposed externally in a library. That refactoring doesn’t seem fun, but I suppose these things rarely sneak up on you.


> I suppose the only remaining discomfort I anticipate is if you have an internal abstraction that might one day be exposed externally in a library. That refactoring doesn’t seem fun, but I suppose these things rarely sneak up on you.

The place where this ends up happening most is around error handling. Crafting your public error types in Rust is an art form. Usually in Rust libraries, errors are implemented in enums. So whoever calls your API can see that it failed and then can match on the reasons it may have failed (or just wrap it in their own error type and bubble it up). If you're not careful, you can cause breaking changes in your API by doing something as simple as changing a dependency (your dep's concrete error type was wrapped inside one of your error variants).

I have somewhat mixed feelings on it, but Rust does allow us to mark enums with a special annotation that forces all matches on it to include a wildcard match (even if it matches all current variants explicitly). It is generally considered good practice to mark your error enums with said annotation so that you can add failure modes in the future without requiring a major version bump for just an extra error case. One could use the same annotation for any enum, of course.


Julia enums do not equal to Rust enums. You can achieve that functionality (that a type can be one of a set of cases, type algebra in general) with subtypes. Which is fine I'd say, but there could be terser syntax for it (which is available through packages, and it's slower). Having abstract types which cannot be instantiated is also nice.


There are sum types, and product types, but what would "exponentiation types" look like?


Exponential types are functions [1]. The type A -> B could be written B^A. Interestingly, if you take functions to be lookup tables, then that's related to the number of possible unique lookup tables. First, notice that the type A + B has |A| + |B| values, the type A × B has |A| + |B| values and then we can see that A -> B has |B|^|A| values by remembering that functions can be seen as binary relations with the property of being functional. This means that there are |B|^|A| possible unique functions. Try an example by counting the number of functions from the set { 0, 1 } to { 2, 3, 4 }. You will see that there are 3^2 = 9 possible unique functions.

1. https://en.wikipedia.org/wiki/Function_type


Interesting! Now I wonder what "tetrated" types correspond to [1]

[1] https://en.wikipedia.org/wiki/Tetration


Well, I don't know of any type theory that has that but you could make something up. C^(B^A) = (B^A) -> C = (A -> B) -> C. So you'd get nested higher-order monomorphic functions like if the base is A and the height |B| = 5 then the result of tetration would be (((A -> A) -> A) -> A) -> A. I don't really know what that means though. Maybe an expert Haskeller could recognize a pattern.


Fixed-length sequences, but they would be constructed from a type and a non-negative integer, not from two types.

The sum of a type with m possible values and a type with n possible values has (m+n) possible values, the product of a type with m possible values and a type with n possible values has (m×n) possible values, a sequence of n items of a type with m possible values has m^n possible values.


Those are just (pure) function types.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: