Llama 4 Maverick 17B (128E)
400B MoE · Meta Llama 4
Llama 4 Maverick flagship MoE (17B active / 400B total). Multi-GPU or H100 cluster territory.
Pro GPU
Max Context: 1049K
Quant Variants: 2
Best Quality: GGUF Q4_K_M
Accuracy Retained: 97.8%
Quantization Variants
Per-quant VRAM, quality loss, and inference speed on RTX 4090
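As a rough sizing sketch for why Maverick lands in multi-GPU territory: in a typical MoE deployment every expert has to be resident in memory, so the footprint follows the 400B total rather than the 17B active parameters. The Python below scales the ~68GB Q4_K_M figure quoted on this page for Llama 4 Scout (109B total) linearly to 400B. The linear scaling, the function name, and the per-GPU capacities (80 GB for an H100, 24 GB for an RTX 4090) are assumptions for illustration. KV cache and runtime overhead are ignored, so treat the result as a lower bound, not a measured value.

```python
import math


def estimate_weight_vram_gb(total_params_b, ref_params_b=109.0, ref_vram_gb=68.0):
    """Scale a reference VRAM figure linearly by total parameter count.

    Defaults use the Scout figures quoted on this page (109B total, ~68 GB
    at Q4_K_M). Linear scaling is an assumption, not a measurement.
    """
    gb_per_billion_params = ref_vram_gb / ref_params_b
    return total_params_b * gb_per_billion_params


def gpus_needed(vram_gb, per_gpu_gb):
    """Smallest number of GPUs whose combined memory covers vram_gb."""
    return math.ceil(vram_gb / per_gpu_gb)


# Maverick: 400B total parameters (all experts assumed resident in memory).
maverick_gb = estimate_weight_vram_gb(400.0)
h100_count = gpus_needed(maverick_gb, 80.0)     # assumed 80 GB per H100
rtx4090_count = gpus_needed(maverick_gb, 24.0)  # assumed 24 GB per RTX 4090

print(f"Estimated Q4_K_M weights: {maverick_gb:.1f} GB")
print(f"H100 (80 GB) needed: {h100_count}")
print(f"RTX 4090 (24 GB) needed: {rtx4090_count}")
```

Under these assumptions the weights alone exceed any single GPU listed on this page, which is consistent with the "Multi-GPU or H100 cluster" note above.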
Similar models
Compare with Llama 4
109B MoE
Llama 4 Scout 17B (16E)
Meta Llama 4
Pro GPU
55.0 GB min VRAM · 97.6% accuracy
Meta Llama 4 Scout MoE (17B active / 109B total). Multimodal; needs ~68GB VRAM at Q4_K_M.
70B
Llama 3.1 70B Instruct
Meta Llama 3.1
Pro GPU · Mac / Apple Silicon
33.4 GB min VRAM · 98.8% accuracy
Meta's frontier 70B model. Requires 40GB+ VRAM; dual 3090 or M2 Ultra.
72B
Qwen2.5 72B Instruct
Alibaba Qwen2.5
Pro GPU
33.8 GB min VRAM · 98.9% accuracy
Flagship Qwen2.5. Requires dual 4090 or A100 80G. Exceptional reasoning at scale.
70B
Llama 3.3 70B Instruct
Meta Llama 3.3
Pro GPU · Mac / Apple Silicon
38.2 GB min VRAM · 99.0% accuracy
Latest Meta 70B with improved multilingual. Drop-in upgrade from Llama 3.1 70B.