DBRX Instruct

132B

Databricks

Mixture-of-experts (MoE) flagship: 132B total parameters, ~36B active. Requires a multi-GPU setup; strong at code and reasoning.

Pro GPU

Max Context: 33K

Quant Variants: 2

Best Quality: GGUF Q4_K_M

Accuracy Retained: 97.5%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format	Level	BPW	VRAM	PPL Loss	Speed
GGUF	Q4_K_M	4.85	78.5 GB	2.5%	15 tok/s
GGUF	Q3_K_M	3.87	63.2 GB	5.8%	18 tok/s
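
As a rough cross-check on the VRAM column, a weights-only footprint can be estimated from the parameter count and bits per weight (BPW). The sketch below is only an approximation and not necessarily how the table's figures were produced. It assumes all 132B parameters are stored at the listed average BPW, uses decimal gigabytes, and ignores KV cache and runtime overhead. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Weights-only footprint in decimal gigabytes (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same average bits per weight.
    KV cache, activations, and runtime overhead are not included.
    """
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# BPW values taken from the table above; 132 is the total parameter count in billions.
for level, bpw in [("Q4_K_M", 4.85), ("Q3_K_M", 3.87)]:
    print(f"{level}: {weight_memory_gb(132, bpw):.1f} GB (weights only)")
```

Under these assumptions the estimates land within about 2 GB of the listed VRAM figures, but they do not match exactly, so the table presumably uses a different accounting.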