Gemma 3 12B IT

12B

Google Gemma 3

Mid-size Gemma 3 model with vision support. Fits in 16 GB of memory at Q4; excellent multilingual chat.

Consumer GPU, Mac / Apple Silicon

Max Context: 131K

Quant Variants: 3

Best Quality: GGUF Q5_K_M

Accuracy Retained: 98.7%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format  Level   BPW   VRAM     PPL Loss  Speed
GGUF    Q4_K_M  4.85  8.8 GB   2.9%      105 tok/s
GGUF    Q5_K_M  5.68  10.2 GB  1.3%      92 tok/s
AWQ     INT4    4     8.0 GB   3.9%      128 tok/s
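
The figures above can be turned into a small selection helper. The following is a minimal sketch, not part of the original page: the names `Quant`, `weight_size_gb`, `accuracy_retained_pct`, and `best_quality_within` are hypothetical, the 12-billion parameter count is the nominal size from the model name, and treating "Accuracy Retained" as 100 minus PPL loss is an assumption that happens to match the 98.7% shown for Q5_K_M.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Quant:
    fmt: str             # container / method, e.g. "GGUF" or "AWQ"
    level: str           # quantization level, e.g. "Q4_K_M"
    bpw: float           # bits per weight
    vram_gb: float       # VRAM listed for RTX 4090
    ppl_loss_pct: float  # perplexity loss in percent
    speed_tok_s: int     # listed inference speed in tokens/s


# Values copied from the Quantization Variants table.
QUANTS = [
    Quant("GGUF", "Q4_K_M", 4.85, 8.8, 2.9, 105),
    Quant("GGUF", "Q5_K_M", 5.68, 10.2, 1.3, 92),
    Quant("AWQ", "INT4", 4.0, 8.0, 3.9, 128),
]


def weight_size_gb(params_billion: float, bpw: float) -> float:
    """Rough size of the weights alone in decimal GB (no KV cache or overhead)."""
    return params_billion * bpw / 8


def accuracy_retained_pct(q: Quant) -> float:
    """Assumed relationship: accuracy retained = 100 - PPL loss."""
    return 100.0 - q.ppl_loss_pct


def best_quality_within(budget_gb: float,
                        quants: Sequence[Quant] = QUANTS) -> Optional[Quant]:
    """Return the lowest-PPL-loss variant whose listed VRAM fits the budget."""
    fitting = [q for q in quants if q.vram_gb <= budget_gb]
    return min(fitting, key=lambda q: q.ppl_loss_pct) if fitting else None


if __name__ == "__main__":
    pick = best_quality_within(16.0)
    print(pick.fmt, pick.level, f"{accuracy_retained_pct(pick):.1f}%")
    print(f"{weight_size_gb(12, 4.85):.2f} GB of weights at Q4_K_M")
```

Under these assumptions, the weights-only estimate for Q4_K_M comes out below the listed 8.8 GB; the gap would be runtime overhead such as the KV cache and activations, which this sketch does not model.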
