Back to Cookbook
IntermediateEdge / Local 10 min read
ExLlamaV2 on RTX 4090: Full Setup Guide
Install ExLlamaV2, load an EXL2 quant, and serve an OpenAI-compatible API in under 10 minutes.
ExLlamaV2EXL2RTX 4090API
Install
ExLlamaV2 requires NVIDIA CUDA. Python 3.10+ recommended.
bash
pip install exllamav2
# Or clone for latest:
git clone https://github.com/turboderp/exllamav2
cd exllamav2 && pip install -r requirements.txtRun inference
Point at a downloaded EXL2 model directory. Adjust -c for context length.
bash
python examples/chat.py \
-m ./models/Llama-3.1-8B-exl2-4.65bpw \
-c 4096 -gs 16Deployment guides are educational. Each model is subject to its own license — read the official Hugging Face model card before downloading or deploying.