Type inference

georgewfraser · on Nov 16, 2018

For anyone interested in this subject, there's a great minimalistic implementation in OCaml here:

http://okmij.org/ftp/ML/generalization.html

The author starts with a naive implementation similar to the one linked here, then implements a series of optimizations that make it more efficient.

skybrian · on Nov 16, 2018

That's how it works, but I'll point out that just because you can leave out type declarations in a Hindley-Milner language doesn't mean you should.

Compilers are good at deducing types from a function's implementation, but humans will find them easier to understand if they know what the function's inputs and outputs are supposed to be before attempting to figure out how the implementation does it. Error messages will be better, too.

lou1306 · on Nov 16, 2018

On the other hand, from my limited experience in F# (but I guess OCaml would be similar) I found that, when you let the type inference algorithm do its thing, it can help you realize that your function is more general than you thought. For instance, you write a function for lists of strings and in the end it can actually work on any list.

Maybe it's trivial, but I found it helped me thinking about refactoring/reusing my code in contexts I didn't expect.

hinkley · on Nov 16, 2018

Is it that hard to figure out that your method is generic?

And what is this “end” you speak of? You don’t know what next year’s requirements are.

Don’t confuse the implementation with the contract. That’s a variant of duck typing. If you need to do stringy things to the data later don’t advertise genericity. People will start using your function and then you’ll have to make a new method because they own the contract now.

lou1306 · on Nov 17, 2018

> If you need to do stringy things to the data later don’t advertise genericity. [...] People will start using your function and then you’ll have to make a new method because they own the contract now.

I agree that my point doesn't really apply to functions that are part of some kind of API.

Still, type inference is not duck typing: the former happens at compile time, unlike the latter.

> Is it that hard to figure out that your method is generic?

Well, it is not if you decide it to be generic from the start. What I was saying is that sometimes you start with "concrete" implementation, but the type inference helps you thinking about refactoring it to maximize reuse. Something like "hmmm, right now this method has signature 'a -> int -> 'b... I wonder if I can turn it into 'a -> ('b -> 'c) -> 'c and remove this other method that looks pretty similar..."

weberc2 · on Nov 16, 2018

You can still annotate the expression with the general type once you know what that type is. :)

Drup · on Nov 16, 2018

My experience is that people who think that never worked with a language where inference is principal and complete (aka, "good") and/or have poor editor support.

In languages that are still reasonably close to HM and that have decent editor tooling, like OCaml, leaving the type out in the implementation is the common practice. You only write them down in the external API (along with the documentation).

jfoutz · on Nov 16, 2018

There is an experimental phase where you really don’t know. Hygiene indicates you should codify the types after you’ve sorted out the real constraints.

Smaug123 · on Nov 16, 2018

In my mind, this is a bit of an "experimentation smell". It means you're experimenting backwards because you are uncertain of the domain in which you are operating. It's something I've learned to notice as a symptom that I'm flailing around in confusion, and that I need to step back and understand the problem better.

Clarify the domain so that you know the types of everything up front, and your work is more than halved.

wtetzner · on Nov 16, 2018

Except that in my experience you never really understand the problem or domain until you try to build the software.

"Write one to throw away" and all that.

adrianN · on Nov 16, 2018

I find that writing some code is often the best way to clarify the domain and figure out the types.

gameswithgo · on Nov 16, 2018

>Clarify the domain

Could typing out code be a good way to clarify the domain for some people, and/or some of the time?

Waterluvian · on Nov 16, 2018

Got me thinking. Are there tools out there that solidify "auto"s into their types? Type inference is fast but what if I want to eventually lock it down and prevent accidental changes in type by adding a decimal or whatnot.

skybrian · on Nov 16, 2018

An IDE could do it, but the risk for local variables seems smaller compared to the confusion when someone accidentally changes the input or output of a function.

SatvikBeri · on Nov 16, 2018

The Jetbrains IDEs let you do this pretty easily, at least for Python and Scala (and presumably for Java & Typescript.)

SOLAR_FIELDS · on Nov 16, 2018

I definitely agree with this. Too often while learning Scala I’ve found myself scratching my head at some piece of code and wondering how to modify it because the typing system is inferring something I don’t expect it to.

simcop2387 · on Nov 16, 2018

Along with that, the compiler is going to do less work too. That means faster compilation and better programmer experience with the language.

rileymat2 · on Nov 16, 2018

How is it less work for the compiler? It still needs to figure out the return type to make sure it matches.

lifthrasiir · on Nov 16, 2018

Type inference algorithms generally have two operations, verification and inference. Verification is faster and more complete (i.e. never misses an existing suitable type) than inference because it works from the biggest assumption that is a (hopefully) suitable type for given expression. Hindley-Milner combined with Algorithm W is really brilliant because it makes inference as performant as verification as the constraint solver is integrated in the inference process, but it does not work for all type systems; inference continues to be inferior when H-M is not applicable as is.

CuriousSkeptic · on Nov 16, 2018

Or put another way: Compilers are good att figuring out what a function _can_ do. But humans need to specify what it _should_ do.

Types aren’t always the right tool for that though. It would be interesting to see what could be done by integrating testing and typing, letting one inform the other.

gameswithgo · on Nov 16, 2018

An interesting middle ground here is when the editor can annotate the function signature's types for you, with code lens type features. Of course this only works for people who happen to be in such an editor.

Another issue I ran into recently was you can add a new type to your program, and break existing code. Where the type inference was able to deduce the type, and your new type appears to also be compatible, and now things don't compile and it is very mysterious why not!

There can be some interesting big wins by leaving types off when possible though, like super easy refactoring. Just start passing ints to a big chain of functions and you are done. No need to change the signature of a big chain of functions.

muglug · on Nov 16, 2018

This is awesome. Type inference is an endlessly fascinating road to venture down, and each programming language presents its own set of challenges.

I've built a type inference system[0] for PHP and it's incredibly satisfying when it does what you expect.

[0] https://getpsalm.org

legulere · on Nov 16, 2018

The problem with advanced type inference is that the exact algorithm needs to be part of the standard for compability reasons (not just between compilers but also between compiler versions).

So for me it makes more sense to have everything that’s beyond basic in something like a lsp-language server/ide.

gergoerdi · on Nov 16, 2018

> The problem with advanced type inference is that the exact algorithm needs to be part of the standard for compability reasons

I don't think that's true -- if you have principal types, you can just say in your language spec that the principal type is inferred.

lifthrasiir · on Nov 16, 2018

It can be really tedious to list the cases where principal types do not exist. Haskell has a separate section for ambiguous overloading [1] for that reason, and more ad-hoc or advanced type systems will be harder to describe.

[1] https://www.haskell.org/onlinereport/haskell2010/haskellch4....

teajunky · on Nov 16, 2018

Every article on Eli Bendersky's website is just great. If he ever writes a book I would immediately buy one.

abeppu · on Nov 16, 2018

I think he's using the term "bi-directional" in a potentially confusing and nonstandard way. As I understand it, in this area, that term typically describes a style of type checking and inference which arose later, and which involves both checking rules and synthesizing rules (the two directions), and which can be made to support some quite expressive type systems, but generally requires programs to have a sprinkling of explicit annotations.

On that topic, I found these notes to be very helpful: https://www.cs.cmu.edu/~fp/courses/15312-f04/handouts/15-bid...

But as I understand it, the concept started here: http://www.cis.upenn.edu/~bcpierce/papers/lti-toplas.pdf

vilhelm_s · on Nov 16, 2018

Also, I think one should not call it Hindley-Milner. A key clever part of the HM algorithm is how it infers types for polymorphic functions, which this code doesn't do.

For just the idea in this post, I suggest calling it something like "unification-based type inference", or "type inference using equational constraints".

piinbinary · on Nov 16, 2018

In case anyone finds it useful, I wrote an explanation of how type inference with let-polymorphism works:

http://jeremymikkola.com/posts/2018_03_25_understanding_algo...

renox · on Nov 16, 2018

I don't know much about this topic but I've heard that Hindley-Milner's type inference has also drawbacks as it doesn't work well with partial compilation and subtyping.

So does the benefits outweight the drawbacks?

Drup · on Nov 16, 2018

HM type inference works perfectly fine with separate compilation and/or partial files (whichever you mean). See the OCaml tooling for a concrete demosntration.

Subtyping is a fairly large domain, and it really depends what you mean. There are cases where it works fine (again, see OCaml) and other where it's more problematic (see Scala). In any case it's not really HM anymore, it needs to be more.

chombier · on Nov 16, 2018

As far as I know, subtyping with polymorphism looks much better since Stephen Dolan's thesis on Algebraic Subtyping

http://lambda-the-ultimate.org/node/5393

always_good · on Nov 16, 2018

The upside of type inference is that making small changes to types upstream doesn't necessary incur changes to code downstream.

But the downside of type inference is that it makes reading code harder and you may depend on an IDE to know what your intermediate types are. Especially wrt generics and higher-order types where intermediate function calls don't have a concrete type in their signature.

Nothing demonstrates this better than any time you've written code yourself only to immediately invoke the IDE tooling to inspect what the type is. If you didn't know, good luck to the person reading it in git diffs. I've had to git clone someone's Rust project recently just to follow the type transformation across a bunch of future/stream chains.

This isn't an argument against static typing but rather an argument against inference excess. There's a nice middle ground where asserting the concrete type in critical places makes the code more readable but also checks your own assumptions. Like any time you received Maybe<Maybe<Int>> where you wanted Maybe<Int>.

Klathmon · on Nov 16, 2018

A general rule of thumb I follow is to explicitly type your interfaces, and infer private internals.

(I mean "interfaces" not referring to any specific language feature, but to the concept in general).

steveklabnik · on Nov 16, 2018

This is sorta where we landed with Rust; you have to type out type signatures of function signatures, but not inside function bodies. It's not based on public/private lines, but it's a similar idea.

kjeetgill · on Nov 16, 2018

Agreed! That's where Java 10 landed with var. Only local variables can have their types inferred but Fucntion signatures and fields need explicit types.

kazagistar · on Nov 16, 2018

> Nothing demonstrates this better than any time you've written code yourself only to immediately invoke the IDE tooling to inspect what the type is.

Nit saying you arr wrong, but i just eant to point out that this happens even in languages like Java 8 which have no local type inference.

Any time you chain a method from another method, you are skipping a type. Ditto for inferred generic parameters on methods.

On the other hand, if you have strong enough types, it doesn't matter as much... if you don't know what the exact intermediate types are, you can still know it's correct based on the final type and the fact that it compiled, which is enough for a code review imho.

darksaints · on Nov 16, 2018

This, and compiler performance, are my big problems. Unification can be very costly on large codebases. And for this reason, I wish my IDE had a hotkey that would automatically infer and then insert type annotations on an entire file (as opposed to a single statement). Usually, after writing and getting everything working, it's helpful to have everything annotated.

YorkshireSeason · on Nov 16, 2018

   type inference is that it 
   makes reading code harder

I find the opposite to be the case. My syntax isn't polluted with all manner of superfluous syntactic noise like

   A a = new A

How is that more readable than something like the following?

   a = new A

There are edge cases, but type-inferred languages do allow you to add type annotations.

   Especially wrt generics

Why is this a problem? The very essence of parametric polymorphism is that the do "the same thing" at all types. This was first shown with Reynolds' famous Abstraction Theorem [1], which has in the mean time, been generalised in many ways.

[1] J. C. Reynolds, Types, abstraction, and parametric polymorphism.

Klathmon · on Nov 16, 2018

Sure, in that simple case explicit types aren't helping, but take a look at this simple function (it's plain javascript):

    export const mergeItems = (keyExtractor, oldItems, newItems) =>
      uniqBy(oldItems.concat(newItems), keyExtractor)

What is keyExtractor? What are old and new items? You can probably guess that oldItems and newItems are an array of something, and you can probably infer from names and stuff that keyExtractor should extract a key from the items somehow, but aside from that you don't have much. If you didn't know what the lodash function "uniqBy" does, you wouldn't know how to use this without research.

Contrast that with this below (Javascript with Flow type annotations):

    export const mergeItems = <Item> (
      keyExtractor: (value: Item) => string,
      oldItems: Array<Item>,
      newItems: Array<Item>,
    ): Array<Item> => uniqBy(oldItems.concat(newItems), keyExtractor)

With types it is a lot more explicit. Key extractor is given the item in the array, and should return a string key. Now you don't need to know or care about how "uniqBy" works, or the exact details of how you should or shouldn't use it, and if you try to pass 2 arrays with different items, it will yell at you because that's not right, something that type inference can't determine (at least not with Flow's typesystem)

YorkshireSeason · on Nov 16, 2018

   look at this simple function

As I said, type-inferred languages do allow you to add type annotations when they are helpful. Note that your example has two arguments of the same type Array<Item>. In this case your even explicit typing doesn't prevent mixing up oldItems and newItems. In this case you want named parameters ...

tetrep · on Nov 16, 2018

If types are specified in function definitions, then you'll find them in the docs, which you need to be reading in order to properly use a function in the first place.

Finding out type definitions is made massively easier with an IDE, but even without one there's an extremely high chance any library you're using is going to at least have autogenerated docs sitting on disk somewhere or mirrored on a crappy website. Maybe you need to clone their repo and build the docs yourself, but even that's pretty unlikely as most language ecosystems have a package manager + docs website or man pages.

I don't really develop anything without at least a third of my screen real estate dedicated to documentation and I don't see "RTFM" as a meaningful or undesirable barrier to programming correctly.

> Now you don't need to know or care about how "uniqBy" works, or the exact details of how you should or shouldn't use it...

I don't follow you. What if "uniqBy" only works properly on sorted lists? You'd never know that based on the type (unless you've got dependent types) and if you're not reading docs on how the function works (the types check/it compiles!) you're going to be in a world of hurt at runtime. Intuitive programming may be easier but it's a heck of a lot more dangerous unless your compiler is intuitive too (i.e. makes the same intuitive assumptions that you do).

> ... and if you try to pass 2 arrays with different items, it will yell at you because that's not right, something that type inference can't determine (at least not with Flow's typesystem)

That's entirely Flow's fault. There's no reason a "proper" type system couldn't deduce that the same type would be needed for both arrays, the `oldItems.concat()` call should have a type definition something like (functional for brevity): concat(Array<T>, Array<T>). Nothing ambiguous about T being the same type.

Klathmon · on Nov 16, 2018

I'm in complete agreement that RTFM is something that you'll have to do, but us humans aren't good at being constantly vigilant.

Types are an additional layer on top, that can help provide context and information which is often difficult to get across in "plain english" in documentation, and they have the added benefit of being machine readable and verifiable so that a mistake that us humans make or an instance where we forget an edge case is more likely to be caught.

Before I used Flow, the documentation above a function like that would probably look something like this:

    /**
     * [normal docs here about what it does and where it's used]
     * 
     * keyExtractor is a function that will be called for each element in both the
     * new and old array, and it should return a string key that will uniquely
     * identify the element so that the combined array can be de-duplicated.
     */

It's not impossible to write out, and to be honest it probably wouldn't hurt to still add some of that into the function even if it's typed, but it's a lot more writing to get the point across, and it can be ambiguous, can easily become "desynced" with the code itself, and won't show up in IDE tooling when going to use the function (or if it does, it won't be easily parsable like a small type would be).

I'm not saying types let you use a function without knowing anything about what it does or how it's written, but that it makes it easier to just remember the basic ideas and it lets the type system catch some mistakes or problems, while also providing some additional "documentation".

>That's entirely Flow's fault.

Don't worry, I was wrong anyway! Even without typedefs it will error if you pass in arrays of 2 different types. But I believe some type systems will try to infer that the function is generic and can take either, and without explicitly defining the type constraints, will over-genericise (is that a word?) the function to allow it to allow more than the user expected it to.

A good type system in programming is both an additional safety net as well as a guiding hand. It doesn't replace documentation, but augments it, and makes it better, and can bail you out if you get it wrong (sometimes). I once heard types referred to as a "checksum" for your function. It can feel like you are just re-stating the obvious, but the compiler will verify that the implementation and the types are in agreement, and that helps prevent some kinds of bugs.

dllthomas · on Nov 16, 2018

> This isn't an argument against static typing but rather an argument against excess.

And also an argument for exposing better tooling when reading git diffs :D