Qwen3 4B Instruct

Alibaba Qwen3

Compact Qwen3 dense model with thinking mode. The Q4_K_M quant needs about 3.2 GB, which suits 8 GB GPUs and edge devices.

⬇ 361.8K HF downloads · ♥ 109 likes · Qwen/Qwen3-4B-GGUF · stats from 6/25/2026

Consumer GPU · Mac / Apple Silicon · CPU / VPS

Max Context: 33K

Quant Variants: 4

Best Quality: GGUF Q6_K

Accuracy Retained: 99.3%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format	Level	BPW	VRAM	PPL Loss	Speed
GGUF	Q4_K_M	4.85	3.2 GB	2.9%	168 tok/s
GGUF	Q6_K	6.56	4.1 GB	0.7%	145 tok/s
AWQ	INT4	4	2.8 GB	3.8%	225 tok/s
EXL2	4.65bpw	4.65	3.0 GB	2.1%	248 tok/s
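
As a rough sanity check on the table, the weight footprint of each quant can be estimated as parameters × bits per weight ÷ 8. The sketch below assumes the nominal "4B" means 4e9 parameters (the exact count is not stated here) and uses 1 GB = 1e9 bytes; `weight_gb` and `best_quality` are illustrative helper names, not part of any library. The gap between the estimate and the listed VRAM is presumably KV cache, activations, and runtime overhead. The 99.3% accuracy-retained figure also appears to match 100% minus the lowest PPL loss, which is an inference from the numbers rather than a documented definition.

```python
# Rows copied from the table above:
# (format, level, bits per weight, listed VRAM in GB, PPL loss in %)
QUANTS = [
    ("GGUF", "Q4_K_M", 4.85, 3.2, 2.9),
    ("GGUF", "Q6_K", 6.56, 4.1, 0.7),
    ("AWQ", "INT4", 4.0, 2.8, 3.8),
    ("EXL2", "4.65bpw", 4.65, 3.0, 2.1),
]

# Assumption: the nominal "4B" is treated as 4e9 parameters.
PARAMS = 4e9


def weight_gb(bpw: float, params: float = PARAMS) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes): params * bits / 8."""
    return params * bpw / 8 / 1e9


def best_quality(quants):
    """Return the table row with the lowest PPL loss."""
    return min(quants, key=lambda row: row[4])


if __name__ == "__main__":
    for fmt, level, bpw, vram, _ppl in QUANTS:
        w = weight_gb(bpw)
        print(f"{fmt} {level}: weights ~{w:.2f} GB, "
              f"listed VRAM {vram} GB, remainder ~{vram - w:.2f} GB")

    fmt, level, _bpw, _vram, ppl = best_quality(QUANTS)
    print(f"Lowest PPL loss: {fmt} {level} ({100 - ppl:.1f}% retained)")
```

Under these assumptions the remainder comes out at roughly 0.7 to 0.8 GB for every row, so the listed VRAM figures look consistent with the BPW column.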

Alibaba Qwen3

Latest Qwen3 dense 8B with thinking mode. Strong upgrade from Qwen2.5 7B for local deployment.

Qwen3 14B: best balance of reasoning and VRAM in the 2026 Qwen lineup.

Qwen3 MoE with only 3B active params. The Q4 file is ~19 GB; outperforms QwQ-32B on 16 GB cards.

Qwen3 dense 32B: successor to Qwen2.5-32B with stronger reasoning and thinking mode.