Qwen3 32B Instruct

32B

Alibaba Qwen3

Dense 32B Qwen3 model and successor to Qwen2.5-32B, with stronger reasoning and a switchable thinking mode.

10.1K Hugging Face downloads · 69 likes · Qwen/Qwen3-32B-GGUF · stats as of 6/25/2026

Consumer GPU · Pro GPU

Max Context: 41K

Quant Variants: 4

Best Quality: GGUF Q4_K_M

Accuracy Retained: 97.5%
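For a sense of what the 41K context costs in memory, here is a rough KV-cache estimate. It is a sketch under assumptions that do not come from this page: the public Qwen3-32B config values (64 layers, 8 key/value heads, head dimension 128), an unquantized FP16/BF16 cache, and "41K" meaning exactly 40,960 tokens. `kv_cache_bytes` is a hypothetical helper.

```python
# Rough KV-cache size for Qwen3-32B at its maximum context.
# Assumed architecture values (public Qwen3-32B config, not stated on this page):
LAYERS = 64          # num_hidden_layers
KV_HEADS = 8         # num_key_value_heads (grouped-query attention)
HEAD_DIM = 128       # head_dim
BYTES_PER_ELEM = 2   # FP16/BF16 cache, no KV-cache quantization

def kv_cache_bytes(tokens: int) -> int:
    """Bytes needed to cache keys and values for `tokens` positions across all layers."""
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_ELEM  # 2 = one K and one V
    return per_token * tokens

if __name__ == "__main__":
    ctx = 40_960  # assumed exact value behind "41K"
    print(f"{kv_cache_bytes(1) / 1024:.0f} KiB per token, "
          f"{kv_cache_bytes(ctx) / 2**30:.1f} GiB at {ctx} tokens")
```

Under those assumptions the cache costs 256 KiB per token, or about 10 GiB at full context. Since the VRAM figures in the quantization table leave only a few GB beyond the quantized weights, they most likely assume a context far shorter than the maximum.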

Quantization Variants

Per-quant VRAM use, quality loss (perplexity increase), and inference speed on an RTX 4090

Format	Level	Bits/weight	VRAM	PPL loss	Speed
GGUF	Q3_K_M	3.87	17.8 GB	7.2%	50 tok/s
GGUF	Q4_K_M	4.85	22.5 GB	2.5%	42 tok/s
EXL2	3.5bpw	3.5	16.8 GB	4.5%	62 tok/s
AWQ	INT4	4.0	19.5 GB	3.6%	55 tok/s
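The table can be turned into a small VRAM-budget check. This sketch assumes Qwen3-32B has about 32.8B parameters (the official model card figure, not stated on this page), treats "accuracy retained" as 100 minus the PPL loss (which reproduces the 97.5% shown for Q4_K_M), and adds a hypothetical `headroom_gb` safety margin. `Variant` and `fits` are illustrative names.

```python
from dataclasses import dataclass

PARAMS = 32.8e9  # assumption: Qwen3-32B total parameters (official model card)

@dataclass
class Variant:
    fmt: str
    level: str
    bpw: float
    vram_gb: float
    ppl_loss_pct: float
    tok_s: float

    def weight_gb(self) -> float:
        """Approximate weight storage in decimal GB: params x bits per weight / 8."""
        return PARAMS * self.bpw / 8 / 1e9

    def retained_pct(self) -> float:
        """Assumed definition of 'accuracy retained': 100 minus the PPL loss."""
        return 100.0 - self.ppl_loss_pct

# Rows copied from the quantization table (RTX 4090 figures).
VARIANTS = [
    Variant("GGUF", "Q3_K_M", 3.87, 17.8, 7.2, 50),
    Variant("GGUF", "Q4_K_M", 4.85, 22.5, 2.5, 42),
    Variant("EXL2", "3.5bpw", 3.5, 16.8, 4.5, 62),
    Variant("AWQ", "INT4", 4.0, 19.5, 3.6, 55),
]

def fits(budget_gb: float, headroom_gb: float = 0.5) -> list[Variant]:
    """Variants whose listed VRAM fits the budget, lowest PPL loss first."""
    ok = [v for v in VARIANTS if v.vram_gb + headroom_gb <= budget_gb]
    return sorted(ok, key=lambda v: v.ppl_loss_pct)

if __name__ == "__main__":
    for v in VARIANTS:
        print(f"{v.fmt} {v.level}: ~{v.weight_gb():.1f} GB weights, "
              f"~{v.vram_gb - v.weight_gb():.1f} GB beyond weights, "
              f"{v.retained_pct():.1f}% retained")
    for budget in (16, 24):
        names = [f"{v.fmt} {v.level}" for v in fits(budget)]
        print(f"{budget} GB ->", names or "nothing fits")
```

With these rows and 0.5 GB of headroom, nothing fits a 16 GB card, while all four variants fit in 24 GB. Q4_K_M's roughly 19.9 GB of weights leave about 2.6 GB of its 22.5 GB for cache and runtime buffers.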

Related Models

Qwen3 30B-A3B (Alibaba Qwen3)

Qwen3 MoE with only ~3B active parameters per token. The Q4 file is ~19 GB, and it outperforms QwQ-32B on 16 GB cards.
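The MoE trade-off is that storage scales with total parameters while per-token compute scales with active ones. A rough sketch, assuming the official Qwen3-30B-A3B model card's counts (30.5B total, 3.3B active) and borrowing Q4_K_M's 4.85 bits per weight from the 32B table; the helper names are hypothetical.

```python
# Assumed counts from the official Qwen3-30B-A3B model card (not this page).
TOTAL_PARAMS = 30.5e9   # every expert must be stored
ACTIVE_PARAMS = 3.3e9   # parameters used for each token
BPW = 4.85              # assumption: Q4_K_M bits per weight, borrowed from the 32B table

def weights_gb(params: float, bpw: float) -> float:
    """Approximate weight storage in decimal GB: params x bits / 8."""
    return params * bpw / 8 / 1e9

def overflow_gb(file_gb: float, card_gb: float) -> float:
    """Weights that cannot fit in `card_gb` of VRAM (ignores KV cache and buffers)."""
    return max(0.0, file_gb - card_gb)

if __name__ == "__main__":
    file_gb = weights_gb(TOTAL_PARAMS, BPW)
    active_gb = weights_gb(ACTIVE_PARAMS, BPW)
    print(f"file ~{file_gb:.1f} GB; ~{active_gb:.1f} GB of weights used per token")
    print(f"~{overflow_gb(file_gb, 16):.1f} GB would spill off a 16 GB card")
```

That gives an ~18.5 GB file, consistent with the ~19 GB quoted, but only ~2 GB of weights touched per token. Roughly 2.5 GB of weights (before any KV cache) cannot stay on a 16 GB card, so running there presumably relies on partial CPU offload, which costs an MoE less than a dense model because only the active experts are read for each token.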

Qwen3 8B (Alibaba Qwen3)

Latest dense Qwen3 8B model, with thinking mode. A strong upgrade over Qwen2.5 7B for local deployment.

Qwen3 14B (Alibaba Qwen3)

Qwen3 14B offers the best balance of reasoning ability and VRAM use in the 2026 Qwen lineup.

Qwen3 235B-A22B (Alibaba Qwen3)

Flagship Qwen3 MoE (22B active / 235B total parameters). The Q4_K_M file is ~142 GB, and it rivals DeepSeek-R1-class models.
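The ~142 GB figure follows from the parameter count. This sketch uses the 235B total and 22B active counts stated above and assumes Q4_K_M's 4.85 bits per weight from the 32B table; `gpus_needed` is a hypothetical helper that counts weights only and ignores KV cache and activations.

```python
import math

TOTAL_PARAMS = 235e9   # total parameters (stated on this page)
ACTIVE_PARAMS = 22e9   # active parameters per token (stated on this page)
BPW = 4.85             # assumption: Q4_K_M bits per weight, taken from the 32B table

def weights_gb(params: float, bpw: float) -> float:
    """Approximate weight storage in decimal GB: params x bits / 8."""
    return params * bpw / 8 / 1e9

def gpus_needed(total_gb: float, per_gpu_gb: float) -> int:
    """Minimum number of identical GPUs whose combined VRAM holds `total_gb` (weights only)."""
    return math.ceil(total_gb / per_gpu_gb)

if __name__ == "__main__":
    size = weights_gb(TOTAL_PARAMS, BPW)
    print(f"~{size:.1f} GB of weights, {ACTIVE_PARAMS / TOTAL_PARAMS:.1%} active per token")
    for card in (24, 80):
        print(f"{card} GB cards: at least {gpus_needed(size, card)}")
```

Under these assumptions the weights come to about 142.5 GB with 9.4% of parameters active per token, which means at least six 24 GB cards or two 80 GB cards just to hold the weights.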