1) Rule of thumb is # of params = GB at Q8. So a 12B model generally takes up 12GB of VRAM at 8 bit precision.
But 4bit precision is still pretty good, so 6GB VRAM is viable, not counting additional space for context. Usually about an extra 20% is needed, but 128K is a pretty huge context so more will be needed if you need the whole space.
1) Anyone have any idea of VRAM requirements?
2) When will this be available on ollama?