I really want to try coding with this at 2600 tokens/s (from Cerebras). Imagine generating thousands of lines of code as fast as you can prompt. If it doesn't work, who cares, generate another thousand and try again! And at $0.69/M tokens it would only cost about $6.50 an hour.
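For anyone checking the math, here's a quick sketch of that hourly figure. It assumes the model generates nonstop at the full 2600 tokens/s and that every token is billed at the $0.69/M rate, which real usage probably won't match; `hourly_cost` is just a name I made up for illustration.

```python
def hourly_cost(tokens_per_second, dollars_per_million_tokens):
    """Cost of one hour of continuous generation (assumes no idle time)."""
    tokens_per_hour = tokens_per_second * 3600
    return tokens_per_hour / 1_000_000 * dollars_per_million_tokens

# 2600 tokens/s * 3600 s = 9.36M tokens per hour
print(f"${hourly_cost(2600, 0.69):.2f} per hour")  # prints "$6.46 per hour"
```

So "$6.50 an hour" is a slight round-up of roughly $6.46, and it's a ceiling, since you'd have to keep the model busy the whole hour to reach it.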
I tried this (gpt-oss-120b with Cerebras) with Roo Code. It repeatedly failed to use the tools correctly, and then I got a 429 Too Many Requests error. So much for the "as fast as I can think" idea!
I'll have to try again later but it was a bit underwhelming.
The latency also seemed pretty high, not sure why. I think with that latency the throughput ends up not making much difference.
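A rough way to see why latency can swamp raw throughput is the sketch below. The 2 second latency, the 500-token response size, and the 200 tokens/s comparison provider are all made-up numbers for illustration, not measurements, and `effective_tps` is a hypothetical helper name.

```python
def effective_tps(response_tokens, raw_tps, latency_s):
    """Tokens per second as the user sees it, counting wait time before output starts."""
    total_seconds = latency_s + response_tokens / raw_tps
    return response_tokens / total_seconds

# Hypothetical: 500-token reply, 2 s latency at 2600 tokens/s
fast_but_laggy = effective_tps(500, 2600, 2.0)   # roughly 228 tokens/s
# Hypothetical: same reply, 0.5 s latency at 200 tokens/s
slow_but_snappy = effective_tps(500, 200, 0.5)   # roughly 167 tokens/s

print(round(fast_but_laggy), round(slow_but_snappy))
```

With those assumed numbers, a 13x advantage in raw speed shrinks to well under 2x for short replies, which is what agentic tools mostly produce when they make lots of small tool calls.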
Btw Groq has the 20b model at 4000 TPS but I haven't tried that one.