Qwen3 8B Instruct

8B

Alibaba Qwen3

Qwen3 dense 8B model with thinking mode. A strong upgrade over Qwen2.5 7B for local deployment.

87.2K HF downloads · 208 likes · Qwen/Qwen3-8B-GGUF · stats from 6/25/2026
Consumer GPU · Mac / Apple Silicon · CPU / VPS

Max Context: 41K
Quant Variants: 4
Best Quality: GGUF Q6_K
Accuracy Retained: 99.4%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format  Level    BPW   VRAM    PPL Loss  Speed
GGUF    Q4_K_M   4.85  5.8 GB  2.8%      142 tok/s
GGUF    Q6_K     6.56  7.5 GB  0.6%      122 tok/s
AWQ     INT4     4     5.1 GB  3.8%      205 tok/s
EXL2    4.65bpw  4.65  5.5 GB  2.0%      228 tok/s
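
The BPW and VRAM columns can be sanity-checked with simple arithmetic. The sketch below is an illustration only, not how this page computes its figures. It assumes a parameter count of roughly 8.2 billion (not stated on this page), takes 1 GB as 1e9 bytes, and assumes that "Accuracy Retained" is simply 100% minus the PPL loss. The names `ROWS`, `ASSUMED_PARAMS`, `weight_gb`, and `retained_pct` are hypothetical helpers introduced for the example.

```python
# Rows copied from the quantization table:
# (format, level, bits per weight, listed VRAM in GB, PPL loss in %)
ROWS = [
    ("GGUF", "Q4_K_M", 4.85, 5.8, 2.8),
    ("GGUF", "Q6_K", 6.56, 7.5, 0.6),
    ("AWQ", "INT4", 4.0, 5.1, 3.8),
    ("EXL2", "4.65bpw", 4.65, 5.5, 2.0),
]

# Assumption: about 8.2 billion parameters; this number is not given on the page.
ASSUMED_PARAMS = 8.2e9


def weight_gb(bpw, params=ASSUMED_PARAMS):
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params * bpw / 8 / 1e9


def retained_pct(ppl_loss_pct):
    """Assumed definition: retained quality is 100% minus the PPL loss."""
    return 100.0 - ppl_loss_pct


if __name__ == "__main__":
    for fmt, level, bpw, vram, loss in ROWS:
        w = weight_gb(bpw)
        # The remainder is whatever the listed VRAM includes beyond raw weights
        # (for example KV cache and runtime buffers); the page does not break it down.
        print(
            f"{fmt} {level}: weights ~{w:.2f} GB, listed VRAM {vram} GB, "
            f"remainder ~{vram - w:.2f} GB, retained {retained_pct(loss):.1f}%"
        )
```

Under these assumptions every listed VRAM figure sits somewhat above the estimated weight size, which is consistent with the VRAM column covering more than the weights alone, and the Q6_K row reproduces the 99.4% headline figure.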