Llama 3.1 405B Instruct

405B

Meta Llama 3.1

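Meta's frontier dense 405B-parameter model. Q4 needs ~230 GB+ of VRAM. That is more than two H100 80 GB cards provide (160 GB combined), so plan on at least three or four 80 GB GPUs. With consumer cards, that means roughly ten 24 GB GPUs, or eight 32 GB GPUs.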
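The GPU counts above come from a simple ceiling division of required VRAM by per-card VRAM. The sketch below reproduces that arithmetic using the 238.0 GB Q4_K_M figure from the quantization table. It is a rough estimate under a stated assumption: memory splits evenly across cards with no per-GPU overhead. The helper name `gpus_needed` is hypothetical.

```python
import math

# Assumption: memory splits evenly across identical cards with no per-GPU
# overhead, so real deployments need some headroom beyond these counts.


def gpus_needed(required_gb: float, gpu_gb: float) -> int:
    """Smallest number of identical GPUs whose combined VRAM covers required_gb."""
    return math.ceil(required_gb / gpu_gb)


# 238.0 GB is the Q4_K_M VRAM figure from the quantization table.
Q4_K_M_GB = 238.0

for name, gpu_gb in [("H100 80 GB", 80), ("32 GB consumer card", 32), ("RTX 4090 24 GB", 24)]:
    print(f"{name}: {gpus_needed(Q4_K_M_GB, gpu_gb)} cards")
```

Three H100s give 240 GB, which leaves only 2 GB of headroom over 238 GB. That is why four cards is the safer count in practice.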

Pro GPU

Max context: 131K tokens
Quant variants: 3
Best quality: GGUF Q4_K_M
Accuracy retained: 97.7%

Quantization Variants

Per-quantization VRAM, quality loss (perplexity increase), and inference speed on an RTX 4090

Format   Level    Bits/Weight  VRAM      Perplexity Loss  Speed
GGUF     Q4_K_M   4.85         238.0 GB  2.3%             6 tok/s
GGUF     Q3_K_M   3.87         192.0 GB  5.0%             8 tok/s
AWQ      INT4     4            210.0 GB  3.5%             10 tok/s
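The "Best quality" and "Accuracy retained" figures appear to follow from this table. Q4_K_M has the lowest perplexity loss (2.3%), and 100% minus 2.3% gives 97.7%. The sketch below encodes the three rows and picks the lowest-loss variant that fits a given VRAM budget. `QuantVariant`, `best_fit`, and the "100% minus perplexity loss" reading of accuracy retained are assumptions for illustration, not the page's documented method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QuantVariant:
    fmt: str
    level: str
    bpw: float
    vram_gb: float
    ppl_loss_pct: float
    tok_per_s: float

    @property
    def accuracy_retained_pct(self) -> float:
        # Assumption: "accuracy retained" is 100% minus the perplexity loss.
        return 100.0 - self.ppl_loss_pct


# Rows copied from the quantization table (speeds measured on an RTX 4090).
VARIANTS = [
    QuantVariant("GGUF", "Q4_K_M", 4.85, 238.0, 2.3, 6),
    QuantVariant("GGUF", "Q3_K_M", 3.87, 192.0, 5.0, 8),
    QuantVariant("AWQ", "INT4", 4.0, 210.0, 3.5, 10),
]


def best_fit(variants: list[QuantVariant], budget_gb: float) -> Optional[QuantVariant]:
    """Lowest-perplexity-loss variant whose VRAM need fits the budget, or None."""
    fitting = [v for v in variants if v.vram_gb <= budget_gb]
    return min(fitting, key=lambda v: v.ppl_loss_pct, default=None)


choice = best_fit(VARIANTS, budget_gb=215.0)
if choice is not None:
    print(choice.fmt, choice.level, f"{choice.accuracy_retained_pct:.1f}% retained")
    # prints: AWQ INT4 96.5% retained
```

Ranking by perplexity loss alone ignores speed. In this table the highest-quality variant (Q4_K_M) is also the slowest, so a throughput-sensitive deployment might weight the two differently.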