Llama 3.2 1B Instruct

1B

Meta Llama 3.2

Ultra-light Llama for mobile and embedded. Sub-2GB VRAM with Q4.

308.9K HF downloads · 166 likes · bartowski/Llama-3.2-1B-Instruct-GGUF · stats from 6/24/2026
Consumer GPU · Mac / Apple Silicon · CPU / VPS

Max Context: 131K
Quant Variants: 2
Best Quality: GGUF Q8_0
Accuracy Retained: 99.5%

Quantization Variants

Per-quant VRAM, quality loss (perplexity increase), and inference speed on an RTX 4090

Format  Level   BPW   VRAM    PPL Loss  Speed
GGUF    Q4_K_M  4.85  1.0 GB  4.2%      520 tok/s
GGUF    Q8_0    8.5   1.5 GB  0.5%      450 tok/s
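
As a rough sanity check on the table above, the BPW (bits per weight) column can be turned into an approximate weight size with params × BPW / 8. The sketch below is illustrative only: the parameter count of about 1.24 billion is an assumption that this page does not state, and the names `ASSUMED_PARAMS`, `weight_gb`, and `retained_pct` are hypothetical helpers. The gap between the estimated weights and the listed VRAM is presumably context cache and runtime overhead, though the page does not break it down. The "Accuracy Retained" figure appears consistent with 100% minus the Q8_0 PPL loss, which is also an inference rather than a documented formula.

```python
# Assumption: Llama 3.2 1B has roughly 1.24 billion parameters (not stated on this page).
ASSUMED_PARAMS = 1.24e9

# (level, bits per weight, listed VRAM in GB, listed PPL loss in %), copied from the table.
VARIANTS = [
    ("Q4_K_M", 4.85, 1.0, 4.2),
    ("Q8_0", 8.5, 1.5, 0.5),
]


def weight_gb(params: float, bpw: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params * bpw / 8 / 1e9


def retained_pct(ppl_loss_pct: float) -> float:
    """Quality retained, assuming it is simply 100% minus the PPL loss."""
    return 100.0 - ppl_loss_pct


for level, bpw, vram_gb, loss_pct in VARIANTS:
    weights = weight_gb(ASSUMED_PARAMS, bpw)
    print(
        f"{level}: weights ~{weights:.2f} GB, listed VRAM {vram_gb} GB, "
        f"remainder ~{vram_gb - weights:.2f} GB, "
        f"retained {retained_pct(loss_pct):.1f}%"
    )
```

Under these assumptions the weights alone come to roughly 0.75 GB for Q4_K_M and 1.32 GB for Q8_0, both below the listed VRAM figures, so the table's numbers are at least plausible as totals that include some overhead.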