Advanced | Edge / Local | 12 min read
Quantize Your Own Model to GGUF
Use llama.cpp's conversion script and llama-quantize tool to convert a Hugging Face model with a supported architecture to GGUF Q4_K_M for local inference.
GGUF, llama.cpp, quantize, custom
Convert HF to GGUF
First convert the safetensors model to an FP16 GGUF file, then quantize that file to Q4_K_M. Both commands are run from the root of a built llama.cpp checkout.
bash
python convert_hf_to_gguf.py ./my-model --outfile my-model-f16.gguf
./build/bin/llama-quantize my-model-f16.gguf my-model-Q4_K_M.gguf Q4_K_M

Deployment guides are educational. Each model is subject to its own license; read the official Hugging Face model card before downloading or deploying.
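The two steps above can be wrapped in a small helper so the intermediate and final file names stay consistent. This is a minimal sketch, not part of llama.cpp: the function names `gguf_name` and `quantize_model` are hypothetical, and it assumes the script is sourced from the root of a built llama.cpp checkout, where `convert_hf_to_gguf.py` and `./build/bin/llama-quantize` live. It passes `--outtype f16` explicitly so the intermediate precision does not depend on the script's default.

```bash
#!/usr/bin/env bash
# Sketch only: gguf_name and quantize_model are hypothetical helper names.
# Assumes the current directory is the root of a built llama.cpp checkout.
set -euo pipefail

# Build an output file name such as "my-model-Q4_K_M.gguf"
# from a model name and a type label.
gguf_name() {
  printf '%s-%s.gguf\n' "$1" "$2"
}

# Convert a Hugging Face model directory to FP16 GGUF, then quantize it.
# $1: path to the model directory (e.g. ./my-model)
# $2: quantization type (defaults to Q4_K_M)
quantize_model() {
  local model_dir="$1"
  local quant_type="${2:-Q4_K_M}"
  local name f16_file out_file

  name="$(basename "$model_dir")"
  f16_file="$(gguf_name "$name" f16)"
  out_file="$(gguf_name "$name" "$quant_type")"

  # Step 1: safetensors -> FP16 GGUF
  python convert_hf_to_gguf.py "$model_dir" --outtype f16 --outfile "$f16_file"

  # Step 2: FP16 GGUF -> quantized GGUF
  ./build/bin/llama-quantize "$f16_file" "$out_file" "$quant_type"
}
```

After sourcing the file, `quantize_model ./my-model Q4_K_M` should run the same two commands shown above and write `my-model-f16.gguf` and `my-model-Q4_K_M.gguf` to the current directory. The FP16 file is kept so that other quantization types can be produced from it without repeating the conversion.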