Llama 4 Maverick 17B (128E)

400B MoE

Meta Llama 4

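Llama 4 Maverick is the flagship mixture-of-experts (MoE) model of the Llama 4 family: 17B active parameters out of 400B total, across 128 experts. Running it takes multiple GPUs or an H100 cluster.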

10.7K HF downloads · 49 likes · unsloth/Llama-4-Maverick-17B-128E-Instruct-GGUF · stats from 6/25/2026
Pro GPU

Max Context: 1049K tokens
Quant Variants: 2
Best Quality: GGUF Q4_K_M
Accuracy Retained: 97.8%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format  Level   BPW   VRAM      PPL Loss  Speed
GGUF    Q4_K_M  4.85  245.0 GB  2.2%      8 tok/s
GGUF    Q3_K_M  3.87  198.0 GB  4.5%      10 tok/s
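
The VRAM figures in the table can be sanity-checked from the bits-per-weight (BPW) column. The sketch below is a rough estimate, not the method this page uses: it assumes weight memory is simply total parameters × BPW / 8, with 1 GB = 1e9 bytes, and it assumes an 80 GB accelerator (such as the 80 GB H100) when counting GPUs. The function names `weight_gb` and `gpus_needed` are illustrative. The listed VRAM is slightly higher than the weight-only estimate, presumably because it includes runtime overhead beyond the weights.

```python
import math

# Total parameter count as stated on this page (400B).
TOTAL_PARAMS = 400e9

# Assumption: per-GPU memory of an 80 GB accelerator; not stated on the page.
PER_GPU_GB = 80.0


def weight_gb(total_params: float, bpw: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return total_params * bpw / 8 / 1e9


def gpus_needed(vram_gb: float, per_gpu_gb: float) -> int:
    """Minimum number of GPUs whose combined memory covers vram_gb."""
    return math.ceil(vram_gb / per_gpu_gb)


# (quant level, BPW, VRAM listed in the table above)
ROWS = [("Q4_K_M", 4.85, 245.0), ("Q3_K_M", 3.87, 198.0)]

for level, bpw, listed_gb in ROWS:
    estimate = weight_gb(TOTAL_PARAMS, bpw)
    count = gpus_needed(listed_gb, PER_GPU_GB)
    print(f"{level}: weights ~{estimate:.1f} GB, listed {listed_gb:.1f} GB, "
          f"{count} x {PER_GPU_GB:.0f} GB GPUs")
```

Under these assumptions the weight-only estimates come out at about 242.5 GB for Q4_K_M and 193.5 GB for Q3_K_M, so the listed 245.0 GB and 198.0 GB need at least four and three 80 GB GPUs respectively, before allowing for long-context KV cache.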
