DeepSeek-V3
671B MoE · DeepSeek
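DeepSeek-V3 is a frontier mixture-of-experts (MoE) model with ~37B active out of 671B total parameters. It uses Multi-head Latent Attention (MLA) and FP8 precision; even at Q4 it requires a multi-node GPU cluster.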
Pro GPU
Max Context: 164K
Quant Variants: 2
Best Quality: GGUF Q4_K_M
Accuracy Retained: 98.0%
Quantization Variants
Per-quant VRAM, quality loss, and inference speed on RTX 4090
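As a rough illustration of why per-quant VRAM varies, the weight footprint can be approximated as total parameters times bits per weight. The sketch below is a back-of-the-envelope estimate only: the function name and the nominal bit widths are assumptions for illustration, real GGUF quants mix precisions per tensor, and KV cache and runtime overhead are ignored, so published figures will differ.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in decimal gigabytes.

    Hypothetical helper: counts weights only, with no KV cache,
    activations, or runtime overhead.
    """
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 671e9   # total parameters, from the model summary
ACTIVE_PARAMS = 37e9   # approximate parameters active per token

# An MoE model still has to keep every expert available, so memory
# scales with TOTAL_PARAMS even though only a fraction is used per token.
print(f"active fraction per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")

# Assumed nominal bit widths, not exact GGUF averages.
for label, bits in [("8-bit", 8.0), ("4-bit", 4.0)]:
    size = weight_memory_gb(TOTAL_PARAMS, bits)
    print(f"{label}: ~{size:.1f} GB of weights")
```

Even at a nominal 4 bits per weight the estimate lands in the hundreds of gigabytes, which is consistent with the note that a single consumer card cannot host the full model.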
Similar models
Compare with DeepSeek-R1
671B MoE
DeepSeek-R1
DeepSeek
Pro GPU
310.0 GB min VRAM · 98.2% accuracy
DeepSeek-R1 is a reasoning model built on the V3 MoE. Chain-of-thought at frontier scale; use the distilled variants for local GPUs.
70B
DeepSeek-R1-Distill-Llama-70B
DeepSeek
Pro GPU
38.2 GB min VRAM · 97.6% accuracy
R1 reasoning in Llama 70B architecture. Top open reasoning model for dual-GPU setups.
32B
DeepSeek-R1-Distill-Qwen-32B
DeepSeek
Consumer GPU · Pro GPU
16.8 GB min VRAM · 97.4% accuracy
R1 distilled to 32B. Near-frontier reasoning on a single 24 GB card (Q3/Q4).
14B
DeepSeek-R1-Distill-Qwen-14B
DeepSeek
Consumer GPU
9.2 GB min VRAM · 98.0% accuracy
R1 reasoning distilled into 14B. Huge community interest; excellent chain-of-thought.
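To make the comparison concrete, here is a minimal sketch that filters the models listed above by a VRAM budget. The names and figures are copied from the cards; the `ModelCard` type and `models_that_fit` helper are hypothetical, and treating "min VRAM" as a hard threshold (ignoring context length and runtime overhead) is a simplifying assumption.

```python
from typing import NamedTuple


class ModelCard(NamedTuple):
    name: str
    min_vram_gb: float    # "min VRAM" figure from the card
    accuracy_pct: float   # "accuracy" figure from the card


CARDS = [
    ModelCard("DeepSeek-R1", 310.0, 98.2),
    ModelCard("DeepSeek-R1-Distill-Llama-70B", 38.2, 97.6),
    ModelCard("DeepSeek-R1-Distill-Qwen-32B", 16.8, 97.4),
    ModelCard("DeepSeek-R1-Distill-Qwen-14B", 9.2, 98.0),
]


def models_that_fit(vram_gb: float, cards=CARDS) -> list[ModelCard]:
    """Return cards whose listed minimum VRAM fits the budget, largest first."""
    fitting = [c for c in cards if c.min_vram_gb <= vram_gb]
    return sorted(fitting, key=lambda c: c.min_vram_gb, reverse=True)


for budget in (24, 48):
    names = [c.name for c in models_that_fit(budget)]
    print(f"{budget} GB: {names}")
```

With a 24 GB budget this selects the 32B and 14B distills; at 48 GB the 70B distill also fits, while the full R1 model stays out of reach for any single-card budget.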