Llama 3.1 8B Instruct

8B

Meta Llama 3.1

Meta's 8B instruction-tuned model with a 128K-token context window (131,072 tokens). A strong choice for local deployment.

⬇ 270.2K HF downloads · ♥ 361 likes · bartowski/Meta-Llama-3.1-8B-Instruct-GGUF · stats from 6/24/2026

Consumer GPU · Mac / Apple Silicon · CPU / VPS

Max Context: 131K

Quant Variants: 6

Best Quality: GGUF Q8_0

Accuracy Retained: 99.9%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format	Level	BPW	VRAM	PPL Loss	Speed
GGUF	Q2_K	2.63	3.2 GB	48.5%	210 tok/s
GGUF	Q4_K_M	4.85	5.7 GB	3.2%	148 tok/s
GGUF	Q6_K	6.56	7.4 GB	0.8%	128 tok/s
GGUF	Q8_0	8.50	9.1 GB	0.1%	118 tok/s
AWQ	INT4	4.00	4.9 GB	4.5%	218 tok/s
EXL2	4.65bpw	4.65	5.4 GB	2.5%	235 tok/s
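
The BPW and VRAM columns are related by simple arithmetic: the quantized weights take roughly parameters × bits per weight ÷ 8 bytes, and whatever remains of the listed VRAM figure is presumably KV cache, activations, and runtime overhead. The sketch below is illustrative only. It assumes about 8.03 billion parameters and decimal gigabytes (10^9 bytes), neither of which the page states, and `weight_gb` and `largest_fit` are hypothetical helper names, not part of any tool.

```python
# Rows copied from the table above: (format, level, bits per weight, listed VRAM in GB).
QUANTS = [
    ("GGUF", "Q2_K", 2.63, 3.2),
    ("GGUF", "Q4_K_M", 4.85, 5.7),
    ("GGUF", "Q6_K", 6.56, 7.4),
    ("GGUF", "Q8_0", 8.5, 9.1),
    ("AWQ", "INT4", 4.0, 4.9),
    ("EXL2", "4.65bpw", 4.65, 5.4),
]

# Assumption: roughly 8.03 billion parameters; the page only says "8B".
PARAMS = 8.03e9


def weight_gb(bpw, params=PARAMS):
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return params * bpw / 8 / 1e9


def largest_fit(budget_gb, quants=QUANTS):
    """Return the highest-BPW row whose listed VRAM fits the budget, or None."""
    fitting = [q for q in quants if q[3] <= budget_gb]
    return max(fitting, key=lambda q: q[2]) if fitting else None


if __name__ == "__main__":
    for fmt, level, bpw, vram in QUANTS:
        w = weight_gb(bpw)
        # "other" is whatever the listed VRAM figure includes beyond the weights.
        print(f"{fmt:5} {level:8} weights ~{w:.1f} GB, "
              f"listed {vram} GB, other ~{vram - w:.1f} GB")
```

With these helpers, `largest_fit(6.0)` selects the Q4_K_M row, since it has the highest BPW among the variants listed at 6 GB or less. Ranking by BPW is only a stand-in for quality; the PPL Loss column is the more direct measure when comparing across formats.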