Gemma 3 4B IT
4B · Google Gemma 3
Google's multimodal Gemma 3 at 4B parameters. 128K (131,072-token) context; strong vision and text performance on 8GB cards.
Consumer GPU · Mac / Apple Silicon · CPU / VPS
Max Context: 131K
Quant Variants: 3
Best Quality: GGUF Q8_0
Accuracy Retained: 99.8%
Quantization Variants
Per-quant VRAM, quality loss, and inference speed on RTX 4090
Similar models
Compare with Gemma 3
12B
Gemma 3 12B IT
Google Gemma 3
Consumer GPU · Mac / Apple Silicon
8.0 GB min VRAM · 98.7% accuracy
Mid-size Gemma 3 with vision. Fits 16GB at Q4; excellent multilingual chat.
8B
Llama 3.1 8B Instruct
Meta Llama 3.1
Consumer GPU · Mac / Apple Silicon
3.2 GB min VRAM · 99.9% accuracy
Meta's flagship 8B model with 128K context. Best-in-class for local deployment.
7B
Qwen2.5 7B Instruct
Alibaba Qwen2.5
Consumer GPU · Mac / Apple Silicon
4.8 GB min VRAM · 99.3% accuracy
Alibaba's highly optimized 7B. Punches well above its weight, especially in coding.
3.8B
Phi-3.5 Mini Instruct
Microsoft Phi
Consumer GPU · Mac / Apple Silicon
2.5 GB min VRAM · 99.8% accuracy
Microsoft's tiny powerhouse. Best 4B-class model for on-device deployment.