Mistral Large 3 675B Instruct
675B MoEMistral AI
Mistral 3 flagship MoE (41B active / 675B total) with vision encoder. FP8 on 8×H200; GGUF quant for research clusters only.
Pro GPU
262K
Max Context
2
Quant Variants
GGUF Q4_K_M
Best Quality
97.9%
Accuracy Retained
Quantization Variants
Per-quant VRAM, quality loss, and inference speed on RTX 4090
Similar models
Compare with Mixtral 8x7B47B MoE
Mixtral 8x7B Instruct
Mistral AI
Pro GPU
25.2 GBmin VRAM·97.2%accuracy
Classic MoE model. ~13B active params per token; needs 32GB+ VRAM for Q4.
24B
Mistral Small 24B Instruct
Mistral AI
Consumer GPUPro GPU
13.5 GBmin VRAM·97.8%accuracy
Mistral's efficient 24B. Strong multilingual; fits on 24GB with Q4.
22B
Codestral 22B
Mistral AI
Consumer GPUPro GPU
13.2 GBmin VRAM·97.0%accuracy
Mistral's dedicated code model. 80+ language support, Fill-in-the-Middle capable.
12B
Mistral Nemo 12B Instruct
Mistral AI
Consumer GPUMac / Apple Silicon
7.8 GBmin VRAM·99.1%accuracy
Mistral + NVIDIA collaboration. 128K context, excellent multilingual support.