Qwen3 30B-A3B Instruct
30B-A3B
Alibaba Qwen3
Qwen3 MoE with only 3B active params. Q4 ~19GB file; outperforms QwQ-32B on 16GB cards.
Consumer GPU · Mac / Apple Silicon
Max Context: 41K
Quant Variants: 3
Best Quality: GGUF Q5_K_M
Accuracy Retained: 98.9%
Quantization Variants
Per-quant VRAM, quality loss, and inference speed on RTX 4090
Similar models
Compare with Qwen3 32B
32B
Qwen3 32B Instruct
Alibaba Qwen3
Consumer GPU · Pro GPU
16.8 GB min VRAM · 97.5% accuracy
Qwen3 dense 32B — successor to Qwen2.5-32B with stronger reasoning and thinking mode.
8B
Qwen3 8B Instruct
Alibaba Qwen3
Consumer GPU · Mac / Apple Silicon
5.1 GB min VRAM · 99.4% accuracy
Latest Qwen3 dense 8B with thinking mode. Strong upgrade from Qwen2.5 7B for local deploy.
14B
Qwen3 14B Instruct
Alibaba Qwen3
Consumer GPU · Mac / Apple Silicon
9.5 GB min VRAM · 98.8% accuracy
Qwen3 14B — best balance of reasoning and VRAM in the 2026 Qwen lineup.
235B-A22B
Qwen3 235B-A22B Instruct
Alibaba Qwen3
Pro GPU
115.0 GB min VRAM · 97.8% accuracy
Qwen3 flagship MoE (22B active / 235B total). Q4_K_M ~142GB; rivals DeepSeek-R1 class models.
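
The file sizes quoted in these cards (about 19 GB for the 30B-A3B at Q4, about 142 GB for the 235B-A22B at Q4_K_M) can be sanity-checked with simple arithmetic: total parameters times average bits per weight, divided by 8. The sketch below is only a rough estimate. The function name `estimate_gguf_size_gb` is hypothetical, the 4.85 bits-per-weight figure for Q4_K_M is an assumed average rather than a value taken from this page, and the parameter counts are the rounded totals from the model names. For MoE models the file size follows the total parameter count, not the active count.

```python
def estimate_gguf_size_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """Rough quantized file size in GB (10^9 bytes): params * bits / 8.

    Ignores metadata and any tensors kept at higher precision, so real
    files can differ by a few percent.
    """
    total_bytes = total_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumption: average bits per weight for a Q4_K_M quant. The true
# average varies by architecture; 4.85 is an illustrative value.
Q4_K_M_BPW = 4.85

if __name__ == "__main__":
    # Rounded total parameter counts taken from the model names above.
    models = [("Qwen3 30B-A3B", 30.0), ("Qwen3 235B-A22B", 235.0)]
    for name, params_b in models:
        size = estimate_gguf_size_gb(params_b, Q4_K_M_BPW)
        print(f"{name}: ~{size:.0f} GB at Q4_K_M")
```

Under these assumptions the estimate comes out near 18 GB and 142 GB, in line with the figures in the cards. Actual VRAM use is higher than the file size once the KV cache and runtime buffers are included.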