GLM-4-9B-Chat

Zhipu GLM-4

Zhipu's open GLM-4 9B chat model with a 128K context window, tool calling, and strong bilingual (EN/ZH) performance.

28.4K HF downloads · 25 likes · zai-org/glm-4-9b-chat-hf · stats from 6/25/2026

Consumer GPU · Mac / Apple Silicon

Max Context: 131K tokens (131,072, i.e. the 128K window)

Quant Variants: GGUF Q4_K_M, GGUF Q8_0, AWQ INT4

Best Quality: GGUF Q8_0

Accuracy Retained: 99.6%
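The 99.6% figure is consistent with 100% minus the 0.4% perplexity loss listed for GGUF Q8_0 in the table below. The page does not say how the number is derived, so the following is only a sketch under that assumption; the function name `accuracy_retained` is hypothetical.

```python
# Perplexity loss (percent) per variant, copied from the quantization table.
PPL_LOSS = {"GGUF Q4_K_M": 3.0, "GGUF Q8_0": 0.4, "AWQ INT4": 4.2}


def accuracy_retained(ppl_loss_percent: float) -> float:
    """Assumed relation (not stated on the page): 100% minus PPL loss."""
    return round(100.0 - ppl_loss_percent, 1)


# The "Best Quality" variant is taken to be the one with the lowest PPL loss.
best = min(PPL_LOSS, key=PPL_LOSS.get)
print(best, f"{accuracy_retained(PPL_LOSS[best])}%")
```

Under this reading, the other two variants would retain slightly less, in line with their higher perplexity loss.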

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format	Level	BPW	VRAM	PPL Loss	Speed
GGUF	Q4_K_M	4.85	6.2 GB	3.0%	135 tok/s
GGUF	Q8_0	8.5	9.8 GB	0.4%	105 tok/s
AWQ	INT4	4	5.5 GB	4.2%	178 tok/s
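To relate the BPW and VRAM columns, a rough sketch: raw weight size is approximately parameters × bits per weight ÷ 8, and the listed VRAM presumably adds runtime overhead on top. The parameter count of 9e9 is an assumption taken from the "9B" in the model name, 1 GB is assumed to mean 1e9 bytes, and the helper names `weight_gb` and `best_fit` are hypothetical.

```python
from dataclasses import dataclass

PARAMS = 9e9  # assumption: read off the "9B" in the model name


@dataclass(frozen=True)
class Quant:
    fmt: str
    level: str
    bpw: float       # bits per weight
    vram_gb: float   # VRAM as listed in the table
    ppl_loss: float  # perplexity loss, percent
    tok_s: int       # tokens per second on RTX 4090


# Rows copied from the table above.
TABLE = [
    Quant("GGUF", "Q4_K_M", 4.85, 6.2, 3.0, 135),
    Quant("GGUF", "Q8_0", 8.5, 9.8, 0.4, 105),
    Quant("AWQ", "INT4", 4.0, 5.5, 4.2, 178),
]


def weight_gb(q: Quant, params: float = PARAMS) -> float:
    """Raw weight size in GB (1 GB = 1e9 bytes, assumed): params * bits / 8."""
    return params * q.bpw / 8 / 1e9


def best_fit(budget_gb: float, table=TABLE):
    """Lowest-PPL-loss variant whose listed VRAM fits the budget, else None."""
    fits = [q for q in table if q.vram_gb <= budget_gb]
    return min(fits, key=lambda q: q.ppl_loss) if fits else None


for q in TABLE:
    print(f"{q.fmt} {q.level}: weights ~{weight_gb(q):.2f} GB, listed {q.vram_gb} GB")
```

For example, `best_fit(8)` picks GGUF Q4_K_M, because Q8_0's listed 9.8 GB does not fit in 8 GB, while `best_fit(24)` picks GGUF Q8_0. Choosing by lowest perplexity loss is one possible policy; a latency-sensitive setup might sort by the Speed column instead.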

Alibaba Qwen2.5

A sweet spot between performance and resource usage. Runs in 16 GB of VRAM at Q4.

Microsoft Phi

Microsoft's mid-size Phi-3. Excellent quality per GB on 16 GB cards.

DeepSeek

General-purpose MoE model (~2.4B active parameters). Long context and strong multilingual chat.

TII UAE

Technology Innovation Institute's latest Falcon. A good mix of multilingual and coding ability.