DeepSeek-R1-Distill-Qwen-32B

32B

DeepSeek

DeepSeek-R1 distilled into a 32B Qwen model. Near-frontier reasoning on a single 24 GB card at Q3 or Q4 quantization.

18.2K HF downloads · 306 likes · bartowski/DeepSeek-R1-Distill-Qwen-32B-GGUF · stats from 6/24/2026

Consumer GPU · Pro GPU

Max Context: 131K tokens

Quant Variants: 3

Best Quality: GGUF Q4_K_M

Accuracy Retained: 97.4%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format | Level | BPW | VRAM | PPL Loss | Speed
GGUF | Q3_K_M | 3.87 | 17.8 GB | 7.2% | 50 tok/s
GGUF | Q4_K_M | 4.85 | 22.2 GB | 2.6% | 42 tok/s
EXL2 | 3.5bpw | 3.5 | 16.8 GB | 4.5% | 65 tok/s
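
The relationship between the BPW and VRAM columns can be sanity-checked with a rough calculation. The sketch below is illustrative only and rests on several assumptions: the parameter count is taken as a nominal 32e9 from the "32B" label (the exact count may differ), sizes are in decimal GB, the gap between estimated weights and reported VRAM is treated as an undifferentiated remainder (KV cache, buffers, runtime overhead), and "Accuracy Retained" is assumed to be 100% minus the PPL loss. The helper names `weight_gb`, `fits_card`, and `retained_pct` are hypothetical and not part of any library.

```python
# Nominal parameter count: an assumption based on the "32B" label, not an exact figure.
PARAMS = 32e9

# (format, level, bits per weight, reported VRAM in GB, PPL loss in %), copied from the table.
QUANTS = [
    ("GGUF", "Q3_K_M", 3.87, 17.8, 7.2),
    ("GGUF", "Q4_K_M", 4.85, 22.2, 2.6),
    ("EXL2", "3.5bpw", 3.5, 16.8, 4.5),
]


def weight_gb(params: float, bpw: float) -> float:
    """Approximate weight footprint in decimal GB: params * bits-per-weight / 8."""
    return params * bpw / 8 / 1e9


def fits_card(vram_gb: float, card_gb: float = 24.0) -> bool:
    """True if the reported VRAM figure is within the card's capacity."""
    return vram_gb <= card_gb


def retained_pct(ppl_loss_pct: float) -> float:
    """Assumed definition: accuracy retained = 100% minus the PPL loss."""
    return round(100.0 - ppl_loss_pct, 1)


for fmt, level, bpw, vram, loss in QUANTS:
    w = weight_gb(PARAMS, bpw)
    print(
        f"{fmt} {level}: weights ~{w:.1f} GB, reported {vram} GB "
        f"(remainder ~{vram - w:.1f} GB), fits 24 GB: {fits_card(vram)}, "
        f"retained ~{retained_pct(loss)}%"
    )
```

Under these assumptions, all three variants stay within 24 GB, with Q4_K_M leaving the least headroom, so long contexts may not fit at that level on a 24 GB card.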