GLM-4-9B-Chat

9B

Zhipu GLM-4

Zhipu's open 9B GLM-4 chat model with 128K context, tool calling, and strong bilingual (EN/ZH) performance.

28.4K HF downloads · 25 likes · zai-org/glm-4-9b-chat-hf · stats from 6/25/2026
Consumer GPU · Mac / Apple Silicon

Max Context: 131K
Quant Variants: 3
Best Quality: GGUF Q8_0
Accuracy Retained: 99.6%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format  Level   BPW   VRAM    PPL Loss  Speed
GGUF    Q4_K_M  4.85  6.2 GB  3.0%      135 tok/s
GGUF    Q8_0    8.5   9.8 GB  0.4%      105 tok/s
AWQ     INT4    4     5.5 GB  4.2%      178 tok/s
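
The relationship between BPW and VRAM in the table can be checked with simple arithmetic. The sketch below is illustrative only: it assumes a nominal 9 billion parameters (taken from the "9B" label, not an exact count), decimal gigabytes, and that "Accuracy Retained" is simply 100% minus the PPL loss, which matches the 99.6% figure for Q8_0 but is not stated by the page. The names `PARAMS`, `ROWS`, `weight_gb`, and `summarize` are hypothetical helpers, not part of any library.

```python
# Rough sanity check of the quantization table (a sketch, not a benchmark).
# ASSUMPTION: 9e9 parameters, taken from the nominal "9B" label.
PARAMS = 9e9

# (format, level, bits per weight, reported VRAM in GB, PPL loss in %)
# Values copied from the table above.
ROWS = [
    ("GGUF", "Q4_K_M", 4.85, 6.2, 3.0),
    ("GGUF", "Q8_0", 8.5, 9.8, 0.4),
    ("AWQ", "INT4", 4.0, 5.5, 4.2),
]


def weight_gb(params, bpw):
    """Memory for the weights alone: params * bits-per-weight / 8, in decimal GB."""
    return params * bpw / 8 / 1e9


def summarize(rows, params=PARAMS):
    """Split each reported VRAM figure into estimated weights and everything else."""
    result = []
    for fmt, level, bpw, vram_gb, ppl_loss in rows:
        weights = weight_gb(params, bpw)
        result.append({
            "variant": f"{fmt} {level}",
            "weights_gb": round(weights, 2),
            # Remainder presumably covers KV cache, activations, and runtime buffers.
            "other_gb": round(vram_gb - weights, 2),
            # ASSUMPTION: accuracy retained = 100% - PPL loss.
            "retained_pct": round(100 - ppl_loss, 1),
        })
    return result


for row in summarize(ROWS):
    print(row)
```

Under these assumptions the weights account for most of each reported VRAM figure, with the remainder plausibly going to the KV cache and runtime buffers. Actual usage depends on context length and the inference runtime.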