
Llama 3.1 8B Instruct

8B

Meta Llama 3.1

Meta's flagship 8B model with 128K context. Best-in-class for local deployment.

270.2K HF downloads · 361 likes · bartowski/Meta-Llama-3.1-8B-Instruct-GGUF · stats from 6/24/2026
Consumer GPU · Mac / Apple Silicon · CPU / VPS

Max Context: 131K

Quant Variants: 6

Best Quality: GGUF Q8_0

Accuracy Retained: 99.9%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

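| Format | Level | BPW | VRAM | PPL Loss | Speed |
|--------|-------|-----|------|----------|-------|
| GGUF | Q2_K | 2.63 | 3.2 GB | 48.5% | 210 tok/s |
| GGUF | Q4_K_M | 4.85 | 5.7 GB | 3.2% | 148 tok/s |
| GGUF | Q6_K | 6.56 | 7.4 GB | 0.8% | 128 tok/s |
| GGUF | Q8_0 | 8.5 | 9.1 GB | 0.1% | 118 tok/s |
| AWQ | INT4 | 4 | 4.9 GB | 4.5% | 218 tok/s |
| EXL2 | 4.65bpw | 4.65 | 5.4 GB | 2.5% | 235 tok/s |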
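
To show how the BPW and VRAM columns can be used together, here is a minimal Python sketch. It makes two assumptions that do not come from this page: a parameter count of roughly 8.03 billion for the model, and 1 GB = 10^9 bytes. The helper names (`estimate_weight_gb`, `pick_quant`) and the `headroom_gb` parameter are hypothetical, chosen for illustration only. The estimate covers weights alone, so it will come out lower than the VRAM figures listed above, which presumably also include runtime overhead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Quant:
    fmt: str
    level: str
    bpw: float           # bits per weight
    vram_gb: float       # VRAM figure as listed on this page
    ppl_loss_pct: float  # perplexity loss as listed on this page
    tok_per_s: int       # speed as listed on this page (RTX 4090)


# Values copied from the table above.
QUANTS = [
    Quant("GGUF", "Q2_K", 2.63, 3.2, 48.5, 210),
    Quant("GGUF", "Q4_K_M", 4.85, 5.7, 3.2, 148),
    Quant("GGUF", "Q6_K", 6.56, 7.4, 0.8, 128),
    Quant("GGUF", "Q8_0", 8.5, 9.1, 0.1, 118),
    Quant("AWQ", "INT4", 4.0, 4.9, 4.5, 218),
    Quant("EXL2", "4.65bpw", 4.65, 5.4, 2.5, 235),
]

# Assumption: approximate parameter count, not stated on this page.
N_PARAMS = 8.03e9


def estimate_weight_gb(n_params: float, bpw: float) -> float:
    """Rough size of the weights alone: params * bits / 8 bytes, in GB (10^9 bytes)."""
    return n_params * bpw / 8 / 1e9


def pick_quant(vram_budget_gb: float, headroom_gb: float = 0.0) -> Optional[Quant]:
    """Return the listed variant with the lowest PPL loss that fits the budget.

    headroom_gb is a hypothetical reserve for KV cache and other runtime use.
    Returns None when no listed variant fits.
    """
    usable = vram_budget_gb - headroom_gb
    fitting = [q for q in QUANTS if q.vram_gb <= usable]
    if not fitting:
        return None
    return min(fitting, key=lambda q: q.ppl_loss_pct)


if __name__ == "__main__":
    for q in QUANTS:
        weights = estimate_weight_gb(N_PARAMS, q.bpw)
        print(f"{q.fmt} {q.level}: ~{weights:.2f} GB weights, {q.vram_gb} GB listed VRAM")
    choice = pick_quant(8.0)
    print("Lowest-loss variant within 8 GB:", choice.level if choice else "none")
```

Selecting on PPL loss alone is only one possible policy; sorting `fitting` by `tok_per_s` instead would favor throughput over quality.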