Gemma 3 4B IT

4B

Google Gemma 3

Google's multimodal Gemma 3 model at 4B parameters, instruction-tuned (IT). 128K-token context; handles both vision and text on 8 GB cards.

Consumer GPU · Mac / Apple Silicon · CPU / VPS

Max Context: 131K tokens (131,072 = 128 × 1,024)

Quant Variants: 3

Best Quality: GGUF Q8_0

Accuracy Retained: 99.8%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format  Level    BPW   VRAM    PPL Loss  Speed
GGUF    Q4_K_M   4.85  3.4 GB  3.2%      175 tok/s
GGUF    Q8_0     8.5   5.2 GB  0.2%      145 tok/s
AWQ     INT4     4     3.0 GB  4.2%      210 tok/s
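
As a rough way to read the table, the sketch below estimates weights-only size from bits per weight (parameters × BPW / 8) and picks the lowest-loss variant that fits a VRAM budget. The table values are copied from above. The parameter count of 4e9 is an assumption taken from the nominal "4B" label, and the helper names (`weight_gb`, `best_fit`) are hypothetical. The listed VRAM figures come out higher than the weights-only estimate, presumably because they also cover runtime overhead such as the KV cache, and the 99.8% "Accuracy Retained" figure appears to be 100% minus the Q8_0 PPL loss.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Quant:
    fmt: str
    level: str
    bpw: float           # bits per weight, from the table
    vram_gb: float       # listed VRAM, from the table
    ppl_loss_pct: float  # listed perplexity loss, from the table
    tok_per_s: float     # listed speed on RTX 4090, from the table


VARIANTS = [
    Quant("GGUF", "Q4_K_M", 4.85, 3.4, 3.2, 175),
    Quant("GGUF", "Q8_0", 8.5, 5.2, 0.2, 145),
    Quant("AWQ", "INT4", 4.0, 3.0, 4.2, 210),
]

# Assumption: the nominal "4B" label, not an exact parameter count.
N_PARAMS = 4e9


def weight_gb(n_params: float, bpw: float) -> float:
    """Weights-only size in GB (10^9 bytes): params * bits / 8."""
    return n_params * bpw / 8 / 1e9


def best_fit(budget_gb: float) -> Optional[Quant]:
    """Lowest-PPL-loss variant whose listed VRAM fits the budget, else None."""
    fitting = [q for q in VARIANTS if q.vram_gb <= budget_gb]
    return min(fitting, key=lambda q: q.ppl_loss_pct) if fitting else None


if __name__ == "__main__":
    for q in VARIANTS:
        est = weight_gb(N_PARAMS, q.bpw)
        print(f"{q.fmt} {q.level}: weights ~{est:.2f} GB, listed VRAM {q.vram_gb} GB")
    pick = best_fit(4.0)
    print("Pick for a 4 GB budget:", pick.level if pick else "none fits")
```

Selecting by listed VRAM rather than by the weights-only estimate is deliberate: the estimate ignores everything except the weights, so it is only useful as a lower bound.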
