Mac M3 Pro: Realistic Model Limits

What actually fits in 18GB or 36GB of unified memory with Ollama and llama.cpp.

18GB M3 Pro

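Stick to 7–8B models at Q4. Avoid 14B and larger unless you accept a very short context: the weights of a 14B model at Q4_K_M alone take roughly 9GB, which leaves little room for the KV cache, macOS, and other apps sharing the same unified memory.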

text

✓ Llama 3.1 8B Q4_K_M (ctx 8K)
✓ Qwen2.5 7B Q4_K_M
✗ Qwen2.5 14B Q4_K_M (very short context only; comfortable on 36GB)
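
To sanity-check the list above, here is a rough back-of-the-envelope sketch of where the memory goes. It assumes Q4_K_M averages about 4.85 bits per weight (an approximation, not a specification value) and that the KV cache is stored in f16. The function names `weights_gb` and `kv_cache_gb` are made up for this example, and the parameter counts are approximate.

```bash
#!/usr/bin/env bash
# Rough memory estimate, in decimal GB. Ballpark figures only.

# weights_gb PARAMS_IN_BILLIONS BITS_PER_WEIGHT
# Size of the quantized weights: params * bits / 8.
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.2f\n", p * b / 8 }'
}

# kv_cache_gb LAYERS KV_HEADS HEAD_DIM CONTEXT_TOKENS
# K and V tensors (factor 2), f16 entries (2 bytes each, assumed).
kv_cache_gb() {
  awk -v l="$1" -v h="$2" -v d="$3" -v c="$4" \
    'BEGIN { printf "%.2f\n", 2 * l * h * d * c * 2 / 1e9 }'
}

# Llama 3.1 8B: ~8.03B parameters, assumed ~4.85 bits per weight.
weights_gb 8.03 4.85

# Llama 3.1 8B at 8K context: 32 layers, 8 KV heads, head dim 128.
kv_cache_gb 32 8 128 8192

# Qwen2.5 14B: ~14.7B parameters, same assumed bits per weight.
weights_gb 14.7 4.85
```

Add a few GB on top for macOS, the runtime's compute buffers, and whatever else is open. These numbers are a starting point, not a guarantee that a model will load.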

36GB M3 Pro

14B models at Q4_K_M with 8K context work well. 32B models need Q3 quantization or a much shorter context.

bash

ollama pull qwen2.5:14b
ollama run qwen2.5:14b
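
The 8K context mentioned above is not necessarily what Ollama uses by default, so it can help to pin it explicitly. One way, sketched here, is a Modelfile with `PARAMETER num_ctx`. The local model name `qwen2.5-14b-8k` is a hypothetical name chosen for this example.

```bash
# Write a Modelfile that pins the context window to 8K tokens.
cat > Modelfile <<'EOF'
FROM qwen2.5:14b
PARAMETER num_ctx 8192
EOF

# Build the variant only if Ollama is installed on this machine.
# "qwen2.5-14b-8k" is a hypothetical local name, not an official tag.
if command -v ollama >/dev/null 2>&1; then
  ollama create qwen2.5-14b-8k -f Modelfile
fi
```

After that, `ollama run qwen2.5-14b-8k` starts the model with the larger context, and `ollama ps` reports how much memory the loaded model occupies. Inside an interactive session, `/set parameter num_ctx 8192` has a similar effect for that session.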

Deployment guides are educational. Each model is subject to its own license — read the official Hugging Face model card before downloading or deploying.