Intermediate | Docker | 9 min read

Docker: Ollama with NVIDIA GPU Passthrough

Containerised Ollama with GPU access — isolate models, pin versions, and run alongside other services.

Docker, Ollama, NVIDIA, GPU, Compose

docker-compose.yml

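Requires the NVIDIA Container Toolkit on the host. The deploy.resources.reservations.devices block reserves one NVIDIA GPU for the container (count: 1). Models are persisted in the ollama_data named volume, mounted at /root/.ollama, so they survive container recreation. The image is tagged latest here; pin a specific tag if you need reproducible deployments.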

yaml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  ollama_data:
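
To confirm that the GPU reservation in the compose file took effect, it can help to check each layer separately. The sketch below is not part of the original recipe: the function names check_host_gpu and check_container_gpu are hypothetical, and it assumes nvidia-smi is exposed inside containers by the NVIDIA Container Toolkit, which is the usual behaviour but worth verifying on your host.

```shell
# Sketch only: function names are illustrative, not part of Docker or Ollama.

# Host check: run nvidia-smi in a throwaway container with all GPUs attached.
# If this fails, the NVIDIA Container Toolkit is probably not set up correctly.
check_host_gpu() {
  docker run --rm --gpus all ubuntu nvidia-smi -L
}

# Container check: list the GPUs visible inside the running Ollama container.
# Defaults to "ollama", the container_name used in the compose file.
check_container_gpu() {
  container="${1:-ollama}"
  if docker exec "$container" nvidia-smi -L; then
    echo "GPU visible in $container"
  else
    echo "no GPU visible in $container" >&2
    return 1
  fi
}
```

Run check_host_gpu before starting the stack and check_container_gpu once the container is up. If the host check passes but the container check fails, the deploy.resources block is the first place to look.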

Run and pull models

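Use Docker Compose v2 (the docker compose subcommand) on Linux. On Windows with Docker Desktop, enable the WSL 2 backend and GPU support in settings first. The commands below start the service in the background, pull llama3.1:8b into the named volume, and open an interactive session with the model.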

bash
docker compose up -d
docker exec -it ollama ollama pull llama3.1:8b
docker exec -it ollama ollama run llama3.1:8b

# API test
curl http://localhost:11434/api/generate -d '{"model":"llama3.1:8b","prompt":"Hello","stream":false}'
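
Because docker compose up -d returns as soon as the container starts, the API may not be accepting requests yet when the next command runs. A minimal sketch of a readiness check follows, assuming the default port mapping from the compose file. The function name wait_for_ollama is hypothetical; it polls /api/tags, the Ollama endpoint that lists local models.

```shell
# Sketch only: wait_for_ollama is an illustrative helper, not an Ollama command.
# Polls the API once per second until it responds or the retry budget runs out.
wait_for_ollama() {
  url="${1:-http://localhost:11434}"   # base URL of the Ollama API
  tries="${2:-30}"                     # maximum number of attempts
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl return non-zero on HTTP errors; -sS hides progress output.
    if curl -fsS "$url/api/tags" >/dev/null 2>&1; then
      echo "ready"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "timed out waiting for $url" >&2
  return 1
}
```

In a script this could be chained as wait_for_ollama && docker exec ollama ollama pull llama3.1:8b, so the pull only runs once the server answers.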

Deployment guides are educational. Each model is subject to its own license; read the official Hugging Face model card before downloading or deploying.