
Two questions:

1) Anyone have any idea of VRAM requirements?

2) When will this be available on ollama?



1) Rule of thumb: the parameter count in billions is roughly the size in GB at Q8 (one byte per parameter). So a 12B model generally takes up about 12GB of VRAM at 8-bit precision.

But 4-bit precision is still pretty good, so 6GB of VRAM is viable for the weights, not counting additional space for context. Usually about an extra 20% is needed, but 128K is a pretty huge context, so more will be needed if you want to use the whole window.
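
To make the arithmetic above concrete, here is a rough sketch of the rule of thumb in Python. The function name and the flat 20% overhead default are just my illustration of the numbers in this comment, not anything from the model's docs, and it doesn't model how memory grows with a long context.

```python
def estimate_vram_gb(params_billion, bits_per_weight, overhead_fraction=0.20):
    """Back-of-the-envelope VRAM estimate in GB (1 GB = 1e9 bytes).

    params_billion: parameter count in billions (e.g. 12 for a 12B model)
    bits_per_weight: 8 for Q8/FP8, 4 for a 4-bit quant
    overhead_fraction: assumed extra space for context and working memory;
        0.20 is only the "usually about 20%" figure, a long context needs more
    """
    # 1 billion parameters at 1 byte each is 1 GB, so scale by bytes per weight.
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb * (1 + overhead_fraction)


if __name__ == "__main__":
    for bits in (8, 4):
        weights_only = estimate_vram_gb(12, bits, overhead_fraction=0.0)
        with_overhead = estimate_vram_gb(12, bits)
        print(f"12B at {bits}-bit: {weights_only:.1f} GB weights, "
              f"{with_overhead:.1f} GB with 20% overhead")
```

Treat the result as a floor rather than a guarantee, since the real number depends on the runtime and how much of the context you actually fill.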


The model has 12 billion parameters and uses FP8, so 1 byte each. With some working memory I'd bet you can run it on 24GB.

> Designed to fit on the memory of a single NVIDIA L40S, NVIDIA GeForce RTX 4090 or NVIDIA RTX 4500 GPU



