The problem is that input token cost dominates output token cost for the majorit...

energy123 · 2025-08-01T17:32:18 1754069538

No, you're talking about costs to user, which are oversimplifications of the costs that providers bear. One output token with a million input tokens is incredibly cheap for providers

danenania · 2025-08-01T18:06:51 1754071611

> One output token with a million input tokens is incredibly cheap for providers

Source? Afaik this is incorrect.

danielbln · 2025-08-01T19:06:54 1754075214

Chevk out any LLM API providers pricing. Output tokens are always significantly more expensive than input (which can also be cached).

danenania · 2025-08-01T19:40:47 1754077247

Input tokens usually dominate output tokens by a lot more than 2x though. It’s often 10x or more input. It can even easily be 100x or more. Again in realistic workflows.

Caching does help the situation, but you always at least pay the initial cache write. And prompts need to be structured carefully to be cacheable. It’s not a free lunch.

redox99 · 2025-08-01T17:27:05 1754069225

That's usually not the case for thinking models. And usually hard problems have a very short prompt.

danenania · 2025-08-01T18:01:15 1754071275

For me personally (using mostly for coding and project planning) it's nearly always the case, including with thinking models. I'm usually pasting in a bunch of files, screenshots, etc., and having long conversations. Input nearly always heavily dominates output.

I don't disagree that there are hard problems which use short prompts, like math homework problems etc., but they mostly aren't what I would categorize as "real work". But of course I can only speak to my own experience /shrug.

redox99 · 2025-08-01T20:04:24 1754078664

Yeah coding is definitely a situation where context is usually very very large. But at the same time in those situations something like Sonnet is fine.