The best part, IMO, is the native OpenMP-style support for parallel for loops. It makes parallelism in data work very efficient compared to Python alternatives that use processes (instead of threads).
Numba can compile in "nopython" mode only with a subset of Python. E.g. class support is limited and still experimental. I also think string manipulation is slow, but the docs have the details.
If you want to specify the types (for example for AOT compilation, or just because you want to make them explicit), then the call signature is less flexible.
In short: pick any random Python library and you'd find very few places you could effectively JIT-accelerate. It is for numeric code.
Even for numerical code, it is more like writing C functions than, say, C++ (with classes etc.).
But it does make accelerating vectorized code very easy. Even if you have a function that uses NumPy, you can likely speed it up with Numba using only a decorator.
But when it doesn't work, it's often not very clear why it can't, until you get some experience.
The ahead-of-time compilation output is... well... let's say difficult to package _properly_ (compare it to Cython, where this is well supported and documented). That makes it useless for production, unless you want to ship giant containers with compilers etc.
In theory, a compiler toolchain is not required, since Numba already ships with LLVM; i.e. for JIT compilation, no additional compiler is necessary.
In the past, that was also possible for AOT compilation [1], but that technique broke during some update, and it seems there is no one left who knows how to fix it.
numba is more general. In JAX, any change to the shapes of arrays triggers a JIT recompilation; numba is a bit more forgiving. JAX has autodiff, which numba doesn't. JAX also supports TPUs, which numba doesn't support (yet).
What??? Numba has more usage in the AI/ML community than Cython has ever had by anyone, ever.
"Fits very few use cases" LOL okay without numba there's no UMAP and HDBScan and those are pretty popular and important libraries that come to mind just off the top of my head...
Also, claiming Cython is well documented gets a huge LOL from me, as someone who's actually written a bit of Cython.
I have written quite a bit of Cython code as well, and at least the last time I looked, Cython was much better documented than numba (it has been a couple of years, though, so things might have improved on the numba side). I would also agree with the previous poster that it is generally quite well documented.
FWIW, Numba's JIT caches the compiled function as long as you don't call it again with a different type signature (e.g. int32[:] vs int64[:]).
I've successfully deployed numba code in an AWS Lambda, for instance -- llvmlite takes a lot of your 250 MB package budget, but once the Lambda is "warm", the JIT lag isn't an issue.
That said, if you absolutely want AOT, you'll have to use Cython or some horrible hack that dumps the compiled function binary.
You realize that scikit-learn is written mostly in Cython (where high performance is needed)? It is part of the most influential ML library in existence.
I assume the parent comment was talking about the context of computations where numba is supposed to be a drop-in wherever NumPy is used.
And I agree that it's not actually usable everywhere, since its support for NumPy's feature set is quite limited, especially around multidimensional arrays; I had to effectively rewrite my logic to make use of numba. Still, it's pretty worth it IMO, given how it can add parallelism for free, and conforming to numba's allowed subset of NumPy usually results in simpler and more efficient code. In my case I had to work around the lack of support for multidimensional arrays, but I ended up with a more efficient solution that relies on broadcasting low-dimensional arrays, which removed a lot of duplicate computation.
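The broadcasting idea, as a generic hedged sketch (hypothetical function, not the poster's actual code): instead of materializing a high-dimensional intermediate of pairwise terms, compute the shared pieces once and combine 1-D and 2-D arrays by broadcasting.

```python
import numpy as np

# Pairwise squared Euclidean distances for x of shape (n, d).
# The naive version builds an (n, n, d) array of differences; here the
# squared norms are computed once and broadcast, so shared terms are
# not recomputed per pair.
def pairwise_sq_dists(x):
    sq = (x * x).sum(axis=1)                      # (n,) computed once
    return sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
```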
I've been using it in a python graph library to write graph traversal routines and it's done me very well: https://github.com/VHRanger/nodevectors