GLM-4-9B-Chat
9B · Zhipu GLM-4
Zhipu's open 9B GLM-4 model, with 128K context, tool calling, and strong bilingual (English/Chinese) performance.
Consumer GPU · Mac / Apple Silicon
Max Context: 131K tokens (131,072, i.e. 128 × 1,024)
Quant Variants: 3
Best Quality: GGUF Q8_0
Accuracy Retained: 99.6%
Quantization Variants
Per-quant VRAM, quality loss, and inference speed on RTX 4090
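You can roughly cross-check per-quant VRAM with a simple formula: weight memory ≈ parameter count × bits per weight ÷ 8. The minimal sketch below makes two assumptions. First, it uses the nominal 9B parameter count from this card. Second, it uses the llama.cpp block layouts for a few GGUF types. Only Q8_0 comes from this page; the other types are illustrative and are not necessarily the three variants listed here. The estimate covers weights only and ignores KV cache, activations, and runtime buffers, all of which grow with context length. Real VRAM use will therefore be higher.

```python
# Bits per weight for a few GGUF block formats (llama.cpp layouts).
# Each block holds 32 weights plus an fp16 scale, so the scale adds 0.5 bit/weight.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,  # 32 x int8 + fp16 scale = 34 bytes per 32 weights
    "Q5_0": 5.5,  # 32 x 5-bit + fp16 scale = 22 bytes per 32 weights
    "Q4_0": 4.5,  # 32 x 4-bit + fp16 scale = 18 bytes per 32 weights
}


def weight_memory_gb(n_params: float, quant: str) -> float:
    """Weight-only memory in GB (1e9 bytes); excludes KV cache and runtime buffers."""
    bits = BITS_PER_WEIGHT[quant]
    return n_params * bits / 8 / 1e9


if __name__ == "__main__":
    # Assumption: nominal 9B from the card; the exact parameter count differs slightly.
    n_params = 9e9
    for quant in ("F16", "Q8_0", "Q4_0"):
        print(f"{quant}: ~{weight_memory_gb(n_params, quant):.1f} GB")
```

Under these assumptions, Q8_0 comes to about 9.6 GB of weights and Q4_0 to about 5.1 GB. Add headroom for the KV cache before choosing a card, especially at long context.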
Similar models
14B
Qwen2.5 14B Instruct
Alibaba Qwen2.5
Consumer GPU · Mac / Apple Silicon
9.2 GB min VRAM · 98.6% accuracy
A sweet spot between performance and resource usage; fits in 16 GB of VRAM at Q4.
14B
Phi-3 Medium 14B Instruct
Microsoft Phi
Consumer GPU · Mac / Apple Silicon
8.8 GB min VRAM · 99.2% accuracy
Microsoft's mid-size Phi-3 model, with excellent quality per GB on 16 GB cards.
16B
DeepSeek-V2-Lite Chat
DeepSeek
Consumer GPU · Mac / Apple Silicon
9.6 GB min VRAM · 97.0% accuracy
General-purpose MoE model (~2.4B active parameters) with long context and strong multilingual chat.
10B
Falcon 3 10B Instruct
TII UAE
Consumer GPU · Mac / Apple Silicon
6.2 GB min VRAM · 96.9% accuracy
The Technology Innovation Institute's latest Falcon model, with a good mix of multilingual and coding ability.