
DBRX Instruct

132B

Databricks

Flagship mixture-of-experts (MoE) model with about 36B of its 132B parameters active per token. Requires a multi-GPU setup; strong at code and reasoning.

Pro GPU

Max Context: 33K

Quant Variants: 2

Best Quality: GGUF Q4_K_M

Accuracy Retained: 97.5%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

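| Format | Level | BPW | VRAM | PPL Loss | Speed |
|--------|--------|------|---------|----------|----------|
| GGUF | Q4_K_M | 4.85 | 78.5 GB | 2.5% | 15 tok/s |
| GGUF | Q3_K_M | 3.87 | 63.2 GB | 5.8% | 18 tok/s |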
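
The BPW and VRAM columns above are related by simple arithmetic. As a rough sketch, assuming the quantized weights dominate memory use and ignoring KV cache, activations, and runtime overhead, the footprint is approximately parameters × BPW / 8 bytes. The helper names `weight_size_gb` and `min_gpus` are hypothetical, and the 24 GB per-GPU default is the RTX 4090's memory size. The page's own VRAM figures may have been derived differently, so small mismatches are expected.

```python
import math

# Total parameter count from the model card (132B).
TOTAL_PARAMS = 132e9

# (level, bits per weight, listed VRAM in GB) copied from the table above.
VARIANTS = [
    ("Q4_K_M", 4.85, 78.5),
    ("Q3_K_M", 3.87, 63.2),
]


def weight_size_gb(params: float, bpw: float) -> float:
    """Approximate size of the quantized weights in decimal gigabytes.

    Assumption: weights only; no KV cache, activations, or runtime overhead.
    """
    return params * bpw / 8 / 1e9


def min_gpus(vram_gb: float, per_gpu_gb: float = 24.0) -> int:
    """Smallest number of GPUs whose combined memory covers vram_gb.

    Assumption: memory splits evenly and per-GPU overhead is ignored,
    so real deployments may need more headroom.
    """
    return math.ceil(vram_gb / per_gpu_gb)


if __name__ == "__main__":
    for level, bpw, listed_gb in VARIANTS:
        estimate = weight_size_gb(TOTAL_PARAMS, bpw)
        gpus = min_gpus(listed_gb)
        print(f"{level}: estimated {estimate:.1f} GB, listed {listed_gb} GB, "
              f"at least {gpus} x 24 GB GPUs")
```

Under these assumptions the Q4_K_M estimate comes out near 80 GB, close to the listed 78.5 GB, and both variants exceed a single 24 GB card, which is consistent with the multi-GPU requirement noted in the description.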