I really want to try coding with this at 2600 tokens/s (from Cerebras). Imagine ...

andai · 2025-08-06T00:45:41 1754441141

I tried this (gpt-oss-120b on Cerebras) with Roo Code. It repeatedly failed to use the tools correctly, and then I got a 429 Too Many Requests. So much for the "as fast as I can think" idea!

I'll have to try again later but it was a bit underwhelming.

The latency also seemed pretty high, not sure why. I think once you add that latency to every request, the throughput ends up not making much difference.
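To put rough numbers on the latency point, here is a back-of-the-envelope sketch. The 2 s latency, the 500-token reply and the 260 tokens/s comparison provider are made-up values for illustration, not measurements; only the 2600 tokens/s figure comes from the thread.

```python
def response_time(latency_s: float, tokens: int, tokens_per_s: float) -> float:
    """Wall-clock time for one response: fixed latency plus generation time."""
    return latency_s + tokens / tokens_per_s


# Hypothetical values, chosen only to illustrate the effect.
LATENCY_S = 2.0   # assumed time before the first token arrives
TOKENS = 500      # assumed length of one reply

fast = response_time(LATENCY_S, TOKENS, 2600)  # throughput figure quoted in the thread
slow = response_time(LATENCY_S, TOKENS, 260)   # imaginary provider, 10x slower at generation

print(f"2600 tok/s: {fast:.2f} s per reply")
print(f" 260 tok/s: {slow:.2f} s per reply")
print(f"end-to-end speedup: {slow / fast:.2f}x")
```

Under those assumptions a 10x throughput advantage works out to less than 2x end to end, because the fixed latency dominates short replies. An agent like Roo Code makes many short round trips, so the effect would compound.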

Btw Groq has the 20b model at 4000 TPS but I haven't tried that one.