> I get ~12 tps with 16k context

FWIW, with Ollama at its defaults, qwen3:30b-a3b has a 256k context size and does ~27 tokens/sec on pure CPU on a $450 mini PC with an AMD Ryzen 9 8945HS. Unless you need a room heater, that GPU isn't pulling its weight.
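If you want to reproduce that kind of measurement yourself, a rough sketch (assuming a stock Ollama install; `--verbose` makes the CLI print prompt/eval token rates after each response, and `num_ctx` is the Modelfile parameter that controls the context window):

```shell
# Pull the model and run a prompt with timing stats enabled;
# --verbose prints eval rate (tokens/s) after the response.
ollama pull qwen3:30b-a3b
ollama run qwen3:30b-a3b --verbose "Summarize this paragraph: ..."

# To pin a specific context size instead of the default, put
# these two lines in a Modelfile:
#   FROM qwen3:30b-a3b
#   PARAMETER num_ctx 16384
# then build and run the variant:
#   ollama create qwen3-16k -f Modelfile
#   ollama run qwen3-16k --verbose "..."
```

The reported "eval rate" is the generation speed; prompt processing is listed separately, so compare like with like when quoting tokens/sec.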


