Gemma 3 12B IT
12B · Google Gemma 3
Mid-size Gemma 3 with vision. Fits 16GB at Q4; excellent multilingual chat.
Consumer GPU · Mac / Apple Silicon
Max Context: 131K
Quant Variants: 3
Best Quality: GGUF Q5_K_M
Accuracy Retained: 98.7%
Quantization Variants
Per-quant VRAM, quality loss, and inference speed on RTX 4090
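As a rough illustration of why a 12B model can fit on a 16GB card at a 4-bit quant, the weight footprint can be approximated from the parameter count and the bits stored per weight. This is a minimal sketch, not how this page computes its figures: the function names are hypothetical, the 4.5 bits-per-weight value is an assumed ballpark for a 4-bit GGUF quant, and the 2 GB overhead is a placeholder for KV cache and runtime buffers, which in practice grow with context length.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billion * bits_per_weight / 8.0


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """Check weights plus an assumed fixed overhead against available VRAM."""
    # overhead_gb is a hypothetical allowance, not a measured value.
    needed = estimate_weight_gb(params_billion, bits_per_weight) + overhead_gb
    return needed <= vram_gb


if __name__ == "__main__":
    # 12B parameters at an assumed ~4.5 bits per weight, on a 16 GB card.
    print(estimate_weight_gb(12, 4.5))
    print(fits_in_vram(12, 4.5, 16))
```

Measured figures on a specific GPU and runtime will differ from this estimate, so treat it only as a sanity check.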
Similar models
Compare with Gemma 3
4B
Gemma 3 4B IT
Google Gemma 3
Consumer GPU · Mac / Apple Silicon
3.0 GB min VRAM · 99.8% accuracy
Google Gemma 3 multimodal 4B. 128K context; strong vision + text on 8GB cards.
14B
Qwen2.5 14B Instruct
Alibaba Qwen2.5
Consumer GPU · Mac / Apple Silicon
9.2 GB min VRAM · 98.6% accuracy
The sweet spot between performance and resource usage. 16GB VRAM with Q4.
12B
Mistral Nemo 12B Instruct
Mistral AI
Consumer GPU · Mac / Apple Silicon
7.8 GB min VRAM · 99.1% accuracy
Mistral + NVIDIA collaboration. 128K context, excellent multilingual support.
9B
Gemma 2 9B Instruct
Google Gemma 2
Consumer GPU · Mac / Apple Silicon
5.8 GB min VRAM · 99.8% accuracy
Google's compact Gemma 2 with sliding window attention. Punches above 9B.