DeepSeek-R1

671B MoE

DeepSeek

DeepSeek-R1 is a reasoning model built on the DeepSeek-V3 MoE architecture. It delivers chain-of-thought reasoning at frontier scale. For local GPUs, use the distilled variants.

7.2M Hugging Face downloads · 13,414 likes · deepseek-ai/DeepSeek-R1 · stats from 6/25/2026
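DeepSeek-R1 typically writes its chain of thought between `<think>` and `</think>` tags and puts the final answer after them. The sketch below separates the two parts, assuming that tag convention. Some chat templates pre-fill the opening `<think>`, so the code treats it as optional. The function name `split_reasoning` is hypothetical and is not part of any DeepSeek or Hugging Face API.

```python
import re

# Assumption: model output follows R1's <think>...</think> convention.
# The opening tag is optional because some chat templates pre-fill it.
_THINK_RE = re.compile(r"^\s*(?:<think>)?(.*?)</think>\s*(.*)$", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); with no closing </think>, treat everything as the answer."""
    match = _THINK_RE.match(text)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), match.group(2).strip()


if __name__ == "__main__":
    raw = "<think>2 + 2 is 4.</think>The answer is 4."
    reasoning, answer = split_reasoning(raw)
    print("reasoning:", reasoning)
    print("answer:", answer)
```

Keeping the reasoning separate makes it easy to log or hide the chain of thought while showing only the final answer.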

Pro GPU

Max Context: 164K

Quant Variants: GGUF Q4_K_M, GGUF Q3_K_M

Best Quality: GGUF Q4_K_M

Accuracy Retained: 98.2%
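The 98.2% headline equals 100% minus the 1.8% perplexity loss listed for Q4_K_M in the quantization table. The page does not define "PPL loss." This sketch assumes it means the relative perplexity increase of the quantized model over the unquantized one. Both function names are illustrative, and the perplexity pair in the example is made up to show the arithmetic. It is not measured data.

```python
def ppl_loss_pct(ppl_base: float, ppl_quant: float) -> float:
    """Relative perplexity increase of a quantized model, in percent (assumed definition)."""
    return (ppl_quant - ppl_base) / ppl_base * 100.0


def accuracy_retained_pct(ppl_loss: float) -> float:
    """Headline 'accuracy retained' figure, taken here as 100% minus the PPL loss."""
    return 100.0 - ppl_loss


if __name__ == "__main__":
    # PPL-loss values from the quantization table rows.
    for level, loss in [("Q4_K_M", 1.8), ("Q3_K_M", 4.2)]:
        print(f"{level}: {accuracy_retained_pct(loss):.1f}% retained")
    # Hypothetical perplexities, for illustration only.
    print(f"example PPL loss: {ppl_loss_pct(5.00, 5.09):.1f}%")
```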

Quantization Variants

Per-quant VRAM, perplexity (PPL) loss, and inference speed on an RTX 4090

Format	Level	Bits/weight	VRAM	PPL loss	Speed
GGUF	Q4_K_M	4.85	385.0 GB	1.8%	4 tok/s
GGUF	Q3_K_M	3.87	310.0 GB	4.2%	5 tok/s
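A rough cross-check of the VRAM column: the quantized weights alone take about parameters × bits per weight ÷ 8 bytes. This sketch covers the weights only and uses the nominal 671B total. It will not match the table exactly, because the page does not say how its VRAM figures were measured (for example, decimal vs binary gigabytes or actual GGUF file sizes). `weight_memory_gb` is an illustrative name.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in decimal GB (1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Total parameters: every expert must be resident even though only ~37B are active per token.
    N_PARAMS = 671e9
    for level, bpw in [("Q4_K_M", 4.85), ("Q3_K_M", 3.87)]:
        gb = weight_memory_gb(N_PARAMS, bpw)
        print(f"{level}: ~{gb:.0f} GB ({gb * 1e9 / 2**30:.0f} GiB) of weights")
```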

DeepSeek

DeepSeek-V3: frontier MoE with ~37B active out of 671B total parameters, using Multi-head Latent Attention (MLA) and FP8 precision. Running it at Q4 requires a multi-node GPU cluster.
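The active/total split is why V3 and R1 are cheap to compute per token but expensive to hold in memory. Per-token compute tracks the ~37B active parameters, while memory must hold all 671B. This sketch uses the common rule of thumb of roughly 2 FLOPs per active parameter per token. That figure is an approximation that ignores attention cost and is not taken from this page.

```python
TOTAL_PARAMS = 671e9   # all experts: sets the memory footprint
ACTIVE_PARAMS = 37e9   # parameters used per token: sets the compute cost


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters that participate in each token."""
    return active / total


def forward_flops_per_token(active: float) -> float:
    """Rule-of-thumb forward cost: ~2 FLOPs per active parameter (ignores attention)."""
    return 2 * active


if __name__ == "__main__":
    print(f"active share: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
    print(f"~{forward_flops_per_token(ACTIVE_PARAMS) / 1e9:.0f} GFLOPs per token")
```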

DeepSeek

R1 reasoning distilled into a Llama 70B architecture. The top open reasoning model for dual-GPU setups.

DeepSeek

R1 distilled to 32B parameters. Near-frontier reasoning on a single 24 GB card at Q3 or Q4.

DeepSeek

R1 reasoning distilled into a 14B model. It has drawn strong community interest and produces excellent chain-of-thought output.
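The distill descriptions above make hardware claims: the 70B distill fits a dual-GPU setup, and the 32B distill fits one 24 GB card at Q3 or Q4. This sketch checks those claims with several assumptions. It assumes 24 GB per GPU and uses the nominal sizes (70B, 32B, 14B) as parameter counts. It also assumes the distill GGUFs use about the same bits per weight as the R1 rows in the table, and it reserves a hypothetical 10% of VRAM for KV cache and runtime buffers. `weights_gb` and `fits` are illustrative names.

```python
def weights_gb(n_params: float, bpw: float) -> float:
    """Approximate quantized weight size in decimal GB."""
    return n_params * bpw / 8 / 1e9


def fits(n_params: float, bpw: float, n_gpus: int,
         gb_per_gpu: float = 24.0, headroom: float = 0.10) -> bool:
    """True if the weights fit in total VRAM after reserving `headroom` (assumed 10%)."""
    usable = n_gpus * gb_per_gpu * (1 - headroom)
    return weights_gb(n_params, bpw) <= usable


if __name__ == "__main__":
    # (label, nominal params, bits per weight borrowed from the R1 table, GPU count)
    cases = [
        ("70B distill, Q4_K_M, 2 GPUs", 70e9, 4.85, 2),
        ("70B distill, Q4_K_M, 1 GPU", 70e9, 4.85, 1),
        ("32B distill, Q4_K_M, 1 GPU", 32e9, 4.85, 1),
        ("32B distill, Q3_K_M, 1 GPU", 32e9, 3.87, 1),
        ("14B distill, Q4_K_M, 1 GPU", 14e9, 4.85, 1),
    ]
    for label, n, bpw, gpus in cases:
        print(f"{label}: {weights_gb(n, bpw):.1f} GB weights, fits={fits(n, bpw, gpus)}")
```

Under these assumptions the 70B distill at Q4_K_M only just fits across two 24 GB cards. Long contexts, which enlarge the KV cache, may push it over.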