Llama 4 Scout 17B (16E)

109B MoE

Meta Llama 4

Meta Llama 4 Scout MoE (17B active / 109B total parameters). Multimodal; needs ~68 GB VRAM at Q4_K_M.

29.3K HF downloads · 155 likes · unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF · stats from 6/25/2026
Pro GPU

10486K

Max Context

3

Quant Variants

GGUF Q4_K_M

Best Quality

97.6%

Accuracy Retained

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format  Level   BPW   VRAM     PPL Loss  Speed
GGUF    Q4_K_M  4.85  68.0 GB  2.4%      22 tok/s
GGUF    Q3_K_M  3.87  55.0 GB  4.8%      26 tok/s
AWQ     INT4    4     58.0 GB  3.2%      28 tok/s
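
The BPW column can be related to the VRAM column with simple arithmetic. The sketch below estimates the weights-only size as total parameters times bits per weight, divided by 8. It assumes 1 GB = 1e9 bytes and the 109B total parameter count stated above; the function and variable names are illustrative, and the page does not say how its VRAM figures were actually computed.

```python
# Weights-only size estimate from parameter count and bits per weight (BPW).
# Assumptions: 1 GB = 1e9 bytes; KV cache, activations, and runtime
# buffers are NOT included, so the result is a lower bound on VRAM.

TOTAL_PARAMS_B = 109  # total parameters in billions, as stated on the page


def weight_size_gb(params_billion: float, bpw: float) -> float:
    """Return the approximate size of the quantized weights in GB."""
    return params_billion * bpw / 8


# (format, level, BPW, listed VRAM in GB), copied from the table
VARIANTS = [
    ("GGUF", "Q4_K_M", 4.85, 68.0),
    ("GGUF", "Q3_K_M", 3.87, 55.0),
    ("AWQ", "INT4", 4.0, 58.0),
]


def estimate_all():
    """Return (format, level, weights GB, gap to listed VRAM GB) per variant."""
    rows = []
    for fmt, level, bpw, listed in VARIANTS:
        weights = weight_size_gb(TOTAL_PARAMS_B, bpw)
        rows.append((fmt, level, round(weights, 1), round(listed - weights, 1)))
    return rows


if __name__ == "__main__":
    for fmt, level, weights, gap in estimate_all():
        print(f"{fmt} {level}: ~{weights} GB weights, ~{gap} GB gap to listed VRAM")
```

Under these assumptions the weights alone come to roughly 66.1 GB, 52.7 GB, and 54.5 GB, a few GB below each listed VRAM figure. The remainder is presumably runtime overhead such as the KV cache, though that is an inference and not something the page states.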
