The best part, IMO, is the native OpenMP-style support for parallel for loops. It makes parallelism in data work very efficient compared to Python alternatives that use processes (instead of threads).
Numba can compile in "nopython" mode only with a subset of Python. E.g. class support is limited and still experimental. I also think string manipulation is slow, but the docs have the details.
If you want to specify the types (for example for AOT compilation, or just because you want to make them explicit), then the call signature is less flexible.
In short: pick any random Python library and you'd find very few places you could effectively JIT-accelerate. It is for numeric code.
Even for numerical code, it is more like writing C functions than, say, C++ (with classes etc.).
But it does make accelerating vectorized code very easy. Even if you have a function that uses NumPy, you can likely speed it up with Numba using only a decorator.
But when it doesn't work, it's often not very clear why it can't, until you get some experience.
The ahead-of-time compilation output is... well... let's say difficult to package _properly_ (compare it to Cython, where this is well supported and documented). That makes it useless for production, unless you want to ship giant containers with compilers etc.
In theory, a compiler toolchain is not required, since Numba already ships with LLVM; i.e. for JIT compilation, no additional compiler is necessary.
In the past, that was also possible for AOT compilation [1], but that technique broke during some update, and it seems there is no one left who knows how to fix it.
numba is more general. In JAX, any change to the shapes of arrays triggers a JIT recompilation; numba is a bit more forgiving. JAX has autodiff, which numba doesn't. JAX also supports TPUs, which numba doesn't support (yet).
What??? Numba has more usage in the AI/ML community than Cython has ever had by anyone, ever.
"Fits very few use cases" LOL okay without numba there's no UMAP and HDBScan and those are pretty popular and important libraries that come to mind just off the top of my head...
Also, claiming Cython is well documented gets a huge LOL from me, as someone who's actually written a bit of Cython.
I have written quite a bit of Cython code as well, and at least the last time I looked, Cython was much better documented than numba (it has been a couple of years, though, so things might have improved on the numba side). I would also agree with the previous poster that it is generally quite well documented.
FWIW, Numba's JIT caches the compiled function as long as you don't call it again with a different type signature (e.g. int32[:] vs int64[:]).
I've successfully deployed numba code in an AWS Lambda, for instance -- llvmlite takes a lot of your 250 MB package budget, but once the Lambda is "warm", the JIT lag isn't an issue.
That said, if you absolutely want AOT, you'll have to use Cython or some horrible hack that dumps the compiled function binary.
You realize that scikit-learn is written mostly in Cython (where high performance is needed)? It is part of the most influential ML library in existence.
I assume the parent comment was talking about the context of computations where numba is supposed to be a drop-in wherever NumPy is used.
And I agree that it's not actually usable everywhere, since its support for NumPy's feature set is quite limited, especially around multidimensional arrays; I had to effectively rewrite my logic to make use of numba. Still, it's pretty worth it IMO, given how it can add parallelism for free, and conforming to numba's allowed subset of NumPy usually results in simpler and more efficient code. In my case I had to work around the lack of support for multidimensional arrays, but I ended up with a more efficient solution that relies on broadcasting low-dimensional arrays, which removed a lot of duplicate computation.
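The broadcasting idea, as a generic hedged sketch (hypothetical function, not the poster's actual code): instead of materializing a high-dimensional intermediate of pairwise terms, compute the shared pieces once and combine 1-D and 2-D arrays by broadcasting.

```python
import numpy as np

# Pairwise squared Euclidean distances for x of shape (n, d).
# The naive version builds an (n, n, d) array of differences; here the
# squared norms are computed once and broadcast, so shared terms are
# not recomputed per pair.
def pairwise_sq_dists(x):
    sq = (x * x).sum(axis=1)                      # (n,) computed once
    return sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
```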
I've been using it in a python graph library to write graph traversal routines and it's done me very well: https://github.com/VHRanger/nodevectors