Mistral Large 3 675B Instruct

675B MoE

Mistral AI

Mistral 3 flagship MoE (41B active / 675B total) with vision encoder. FP8 on 8×H200; GGUF quant for research clusters only.

⬇ 2.9K HF downloads♥ 233 likesmistralai/Mistral-Large-3-675B-Instruct-2512· stats from 6/25/2026

Pro GPU

262K

Max Context

Quant Variants

GGUF Q4_K_M

Best Quality

97.9%

Accuracy Retained

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format	Level	BPW	VRAM	PPL Loss	Speed	Actions
GGUF	Q4_K_M	4.85	388.0 GB	2.1%	4 tok/s	Calc HF
GGUF	Q3_K_M	3.87	312.0 GB	4.5%	5 tok/s	Calc HF

Mistral AI

Classic MoE model. ~13B active params per token; needs 32GB+ VRAM for Q4.

Mistral AI

Mistral's efficient 24B. Strong multilingual; fits on 24GB with Q4.

Mistral AI

Mistral's dedicated code model. 80+ language support, Fill-in-the-Middle capable.

Mistral AI

Mistral + NVIDIA collaboration. 128K context, excellent multilingual support.