Qwen3 4B Instruct

4B

Alibaba Qwen3

One of the smaller Qwen3 dense models with thinking mode. The Q4 quantization needs about 3.2 GB of VRAM, which makes it a good fit for 8 GB GPUs and edge devices.

361.8K HF downloads · 109 likes · Qwen/Qwen3-4B-GGUF · stats from 6/25/2026
Consumer GPU · Mac / Apple Silicon · CPU / VPS

Max Context: 33K

Quant Variants: 4

Best Quality: GGUF Q6_K

Accuracy Retained: 99.3%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

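| Format | Level   | BPW  | VRAM   | PPL Loss | Speed     |
|--------|---------|------|--------|----------|-----------|
| GGUF   | Q4_K_M  | 4.85 | 3.2 GB | 2.9%     | 168 tok/s |
| GGUF   | Q6_K    | 6.56 | 4.1 GB | 0.7%     | 145 tok/s |
| AWQ    | INT4    | 4    | 2.8 GB | 3.8%     | 225 tok/s |
| EXL2   | 4.65bpw | 4.65 | 3.0 GB | 2.1%     | 248 tok/s |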
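The BPW and VRAM columns can be related with some quick arithmetic. The sketch below is only an estimate under stated assumptions: it treats "4B" as exactly 4e9 parameters (the real count differs slightly), reads GB as decimal gigabytes, and guesses that whatever VRAM is left after the weights goes to KV cache, activations, and runtime buffers. It also computes "accuracy retained" as 100 minus PPL loss, which matches the 99.3% shown for Q6_K but is not confirmed as the page's actual formula. The names `ASSUMED_PARAMS`, `weight_size_gb`, and `summarize` are illustrative.

```python
# Rows copied from the quantization table:
# (format, level, bits per weight, VRAM in GB, PPL loss in %, speed in tok/s)
VARIANTS = [
    ("GGUF", "Q4_K_M", 4.85, 3.2, 2.9, 168),
    ("GGUF", "Q6_K", 6.56, 4.1, 0.7, 145),
    ("AWQ", "INT4", 4.0, 2.8, 3.8, 225),
    ("EXL2", "4.65bpw", 4.65, 3.0, 2.1, 248),
]

# Assumption: "4B" taken as exactly 4e9 parameters.
ASSUMED_PARAMS = 4.0e9


def weight_size_gb(bpw: float, params: float = ASSUMED_PARAMS) -> float:
    """Approximate size of the quantized weights in decimal GB."""
    return params * bpw / 8 / 1e9


def summarize(variants=VARIANTS):
    """Split each row's VRAM into estimated weights and everything else."""
    rows = []
    for fmt, level, bpw, vram_gb, ppl_loss, _tok_s in variants:
        weights = weight_size_gb(bpw)
        rows.append({
            "name": f"{fmt} {level}",
            "weights_gb": round(weights, 2),
            # Assumption: the remainder is KV cache, activations, and buffers.
            "other_gb": round(vram_gb - weights, 2),
            # Assumption: retained accuracy = 100 - PPL loss.
            "retained_pct": round(100 - ppl_loss, 1),
        })
    return rows


if __name__ == "__main__":
    for row in summarize():
        print(row)
```

Under these assumptions the weights alone account for roughly 2.0 to 3.3 GB depending on the variant, with somewhat under 1 GB of the listed VRAM going to everything else, so every variant leaves headroom on an 8 GB GPU.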

Similar models

Compare with Qwen3 8B