
You might already be able to get good acceleration with SSSE3, AVX2, or NEON, which also have a 4-bit-input permutation instruction (PSHUFB on x86, TBL on NEON). The problem is that you're doing parallel lookups into many different tables, whereas the SSSE3/NEON instruction does 16 lookups in parallel into the same table (and AVX2 is two copies of the SSSE3 one side by side, one per 128-bit lane). So it's not as useful unless you're simulating the same grid on several different inputs for bulk testing. It might still be faster than scalar, but I'm not sure.
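To make the "same table" limitation concrete, here is a rough scalar model of what one PSHUFB does, written in Python rather than with real intrinsics. The names `shuffle_bytes`, `RULE`, and `tables` are made up for illustration, and the lookup tables are arbitrary examples, not anything from the simulator being discussed.

```python
def shuffle_bytes(table, indices):
    """Scalar model of SSSE3 PSHUFB: 16 byte lookups into ONE shared 16-byte table.

    For each index byte: if the high bit (0x80) is set the result byte is 0,
    otherwise the low 4 bits select an entry of `table`.
    """
    assert len(table) == 16 and len(indices) == 16
    out = []
    for idx in indices:
        if idx & 0x80:
            out.append(0)
        else:
            out.append(table[idx & 0x0F])
    return out


# Hypothetical 4-input lookup table (parity of the 4 input bits).
RULE = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]

# Good fit: one table, 16 different inputs (e.g. bulk testing the same cell).
# A single shuffle produces all 16 results.
inputs = list(range(16))
same_table = shuffle_bytes(RULE, inputs)

# Poor fit: 16 cells, each with its OWN table. The instruction cannot mix
# tables within one shuffle, so this model issues one shuffle per table and
# keeps a single lane of each result.
# Hypothetical tables: table k outputs bit (k % 4) of its 4-bit input.
tables = [[(i >> (k % 4)) & 1 for i in range(16)] for k in range(16)]
per_table = [shuffle_bytes(tables[k], [inputs[k]] * 16)[0] for k in range(16)]
```

In the second case 15 of the 16 lanes are wasted on every shuffle, which is why the instruction only pays off when many lanes can share a table.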

