Gemma 2 9B Instruct

9B

Google Gemma 2

Google's compact Gemma 2 model with sliding window attention. Punches above its 9B size.

25.7K HF downloads · 231 likes · bartowski/gemma-2-9b-it-GGUF · stats from 6/24/2026
Consumer GPU · Mac / Apple Silicon

Max Context: 8K
Quant Variants: 3
Best Quality: GGUF Q8_0
Accuracy Retained: 99.8%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format  Level   BPW   VRAM     PPL Loss  Speed
GGUF    Q4_K_M  4.85  6.5 GB   3.3%      132 tok/s
GGUF    Q8_0    8.5   10.2 GB  0.2%      108 tok/s
AWQ     INT4    4     5.8 GB   4.6%      188 tok/s
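
The BPW and VRAM columns can be related with simple arithmetic: weight storage is roughly parameters × bits per weight / 8 bytes. The sketch below is only an illustration, not how the page computed its figures. It assumes a nominal 9e9 parameters (the exact count is not stated here) and decimal gigabytes, and the helper names weight_memory_gb and estimate_rows are hypothetical. Whatever remains after subtracting the weight estimate from the listed VRAM is presumably runtime overhead such as KV cache and buffers.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight-only footprint in decimal GB: parameters * bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: nominal "9B" parameter count; the exact figure is not given above.
N_PARAMS = 9e9

# (format, level, BPW, listed VRAM in GB), copied from the table above.
ROWS = [
    ("GGUF", "Q4_K_M", 4.85, 6.5),
    ("GGUF", "Q8_0", 8.5, 10.2),
    ("AWQ", "INT4", 4.0, 5.8),
]


def estimate_rows(rows=ROWS, n_params=N_PARAMS):
    """Return (format, level, weight GB, listed VRAM minus weight GB) per row."""
    results = []
    for fmt, level, bpw, vram_gb in rows:
        weights_gb = weight_memory_gb(n_params, bpw)
        results.append((fmt, level, weights_gb, vram_gb - weights_gb))
    return results


if __name__ == "__main__":
    for fmt, level, weights_gb, rest_gb in estimate_rows():
        print(f"{fmt} {level}: weights ~{weights_gb:.2f} GB, "
              f"remainder ~{rest_gb:.2f} GB")
```

Under these assumptions the weights alone come to about 5.5 GB (Q4_K_M), 9.6 GB (Q8_0), and 4.5 GB (INT4), each below the listed VRAM. The remainder depends on context length and runtime, so treat it as a rough gap rather than a measured value.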