This turned into a larger rant than I wanted it to be, but I need to let it out every now and then. Feel free to skip it.
IMHO programming languages are, for the most part, designed such that a compiler's job is easy if you want to compile to scalar code, but damned near impossible if you want it to compile to vectorized code.
So I have a point type, right? 'struct point3f { float x,y,z; };'. Easy peasy lemon squeezy. I add a bunch of member functions for normal stuff like addition, scalar multiplication, dot/cross product, etc. I write a triangle type: 'struct triangle { point3f a,b,c; };'. I write a bunch of functions for geometry stuff, intersections with rays, normals, etc.
Then I make an array of triangles. I have a ray: an origin point and a look direction. I want to iterate over each triangle and figure out which triangles the ray intersects. Now I'm stuck. This is a perfect use case for vectorization. I could trivially get up to a 4x/8x/16x speedup with SSE/AVX/AVX512 (4, 8, or 16 floats per register), but the compiler can't do it. The data's in the wrong layout. It's in the correct layout for scalar code, but the wrong layout for vector code. If you want to write vector code, your data has to be in a struct of arrays (SoA) layout, and what I have is an array of structs (AoS).
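To make the layout problem concrete, here is a minimal sketch of the conversion done by hand. The names 'triangles_soa', 'to_soa' and 'normal_dot_dir' are made up for illustration, and the kernel is deliberately simpler than a full ray/triangle intersection: it only computes dot(cross(b - a, c - a), dir) per triangle. The point is the memory layout, not the geometry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// AoS: the layout from the post. Each triangle's nine floats sit together.
struct point3f { float x, y, z; };
struct triangle { point3f a, b, c; };

// SoA: hypothetical hand-written equivalent (names are illustrative).
// Entry i of every array belongs to triangle i.
struct triangles_soa {
    std::vector<float> ax, ay, az, bx, by, bz, cx, cy, cz;
};

// One-time conversion from the AoS array to the SoA layout.
triangles_soa to_soa(const std::vector<triangle>& tris) {
    triangles_soa s;
    for (const triangle& t : tris) {
        s.ax.push_back(t.a.x); s.ay.push_back(t.a.y); s.az.push_back(t.a.z);
        s.bx.push_back(t.b.x); s.by.push_back(t.b.y); s.bz.push_back(t.b.z);
        s.cx.push_back(t.c.x); s.cy.push_back(t.c.y); s.cz.push_back(t.c.z);
    }
    return s;
}

// out[i] = dot(cross(b - a, c - a), dir) for triangle i.
// A negative value means the (unnormalized) normal points against dir.
void normal_dot_dir(const triangles_soa& s, point3f dir, std::vector<float>& out) {
    const std::size_t n = s.ax.size();
    assert(s.cz.size() == n);  // all nine arrays are expected to share one length
    out.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Edge vectors e1 = b - a and e2 = c - a, one component at a time.
        const float e1x = s.bx[i] - s.ax[i];
        const float e1y = s.by[i] - s.ay[i];
        const float e1z = s.bz[i] - s.az[i];
        const float e2x = s.cx[i] - s.ax[i];
        const float e2y = s.cy[i] - s.ay[i];
        const float e2z = s.cz[i] - s.az[i];
        // Cross product e1 x e2, then dot with the ray direction.
        const float nx = e1y * e2z - e1z * e2y;
        const float ny = e1z * e2x - e1x * e2z;
        const float nz = e1x * e2y - e1y * e2x;
        out[i] = nx * dir.x + ny * dir.y + nz * dir.z;
    }
}
```

Every read in that loop is a unit-stride walk over a plain float array, so consecutive triangles can be loaded straight into adjacent vector lanes. With the AoS 'triangle' array, the x components of consecutive triangles sit nine floats apart, so the same loop needs strided loads or shuffles. Whether a given compiler actually vectorizes the SoA loop still depends on the compiler and flags, and note what it cost: all the nice 'point3f' member functions are gone from the hot loop.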
There ought to exist a programming language that will, by default, automagically convert everything to SoA layout, unless you flag your array as AoS or your class as non-SoA-able. And ranged for loops should be, by default, unsequenced.
This will make autovectorization an order of magnitude easier, and will enable vectorization on complicated stuff that might be fiendishly difficult to vectorize, even by hand.
This isn't trivial, and it can't simply be tacked on to a language later on. SIMD needs to be a day-0 priority. Everything else needs to be in support of that.
Until this happens, we will either be leaving 60% of our CPU's silicon idle 99% of the time, or scrubs like me will continue to write SIMD-intrinsic-laden code with lots of manual bullshit to do what the compiler/language design should be doing for me for free.
Rant over. Thank you for coming to my TED talk.