
Llama 3.2 1B Instruct

1B parameters · Meta Llama 3.2

Ultra-light Llama model for mobile and embedded devices. Needs under 2 GB of VRAM with Q4 quantization.

⬇ 308.9K HF downloads · ♥ 166 likes · bartowski/Llama-3.2-1B-Instruct-GGUF · stats from 6/24/2026

Consumer GPU · Mac / Apple Silicon · CPU / VPS

Max Context: 131K
Quant Variants: 2
Best Quality: GGUF Q8_0
Accuracy Retained: 99.5%


Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format	Level	BPW (bits per weight)	VRAM	PPL Loss	Speed
GGUF	Q4_K_M	4.85	1.0 GB	4.2%	520 tok/s
GGUF	Q8_0	8.5	1.5 GB	0.5%	450 tok/s
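
The BPW column can be turned into a rough weight-size estimate. The sketch below is illustrative only: it assumes about 1.24 billion parameters for Llama 3.2 1B (a figure not given on this page) and counts only the quantized weights. The function name `weight_size_gb` is hypothetical. The VRAM figures in the table are higher than these estimates, presumably because they also include the KV cache and runtime buffers.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: roughly 1.24e9 parameters for Llama 3.2 1B (not stated on this page).
N_PARAMS = 1.24e9

# BPW values copied from the table above.
QUANTS = {"Q4_K_M": 4.85, "Q8_0": 8.5}

if __name__ == "__main__":
    for name, bpw in QUANTS.items():
        size = weight_size_gb(N_PARAMS, bpw)
        print(f"{name}: about {size:.2f} GB of weights")
```

Under these assumptions the weights alone come to roughly 0.75 GB for Q4_K_M and 1.3 GB for Q8_0, which sits below the 1.0 GB and 1.5 GB VRAM figures listed.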