100% Private Code Review: Merlin AI Code Review + Ollama with Zero Data Egress
For teams with the strictest privacy requirements — air-gapped networks, classified systems, or regulatory prohibitions on external AI calls — Merlin AI Code Review can run in a fully local mode using Ollama for inference. Zero data leaves your network. Not the diff, not the review, not a single byte.
How the Merlin AI Code Review + Ollama stack works
Ollama is an open-source tool that runs large language models locally on your hardware. When configured as Merlin AI Code Review's AI backend, the entire review pipeline — from diff ingestion to comment generation — stays within your infrastructure:
- Merlin AI Code Review reads the PR diff from your VCS API (GitHub/GitLab — already on your network for self-hosted)
- Merlin AI Code Review constructs the review prompt locally
- The prompt is sent to Ollama running on a local server
- Ollama runs inference on your hardware and returns the review
- Merlin AI Code Review posts the review back to the PR via VCS API
No external network calls. No API keys. No data egress.
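To make the inference hop concrete, here is what a request to the local Ollama server looks like at the HTTP level. The prompt below is a stand-in; the actual prompt Merlin AI Code Review constructs is internal to the tool, but it travels over the same local endpoint.

```bash
# Illustration only: a review-style prompt sent to the local Ollama API.
# The only network hop is to localhost:11434; nothing leaves the machine.
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:14b",
  "prompt": "Review this diff for bugs and style issues:\n--- a/app.py\n+++ b/app.py\n...",
  "stream": false
}' | jq -r '.response'
```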
Hardware requirements
The model you choose determines your hardware needs:
| Model | VRAM / RAM | Quality | Speed |
|---|---|---|---|
| qwen2.5-coder:32b | 24GB+ VRAM | Excellent | Moderate |
| qwen2.5-coder:14b | 16GB VRAM | Very good | Fast |
| deepseek-coder-v2:16b | 16GB VRAM | Very good | Fast |
| codellama:13b | 16GB RAM (CPU) | Good | Slow on CPU |
For GPU inference, NVIDIA cards with 16–24GB VRAM provide the best experience. CPU-only inference works but is significantly slower — acceptable for batch review pipelines, less so for real-time PR feedback.
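If you're not sure which row of the table your hardware matches, checking total GPU memory first is a quick way to decide which model to pull (NVIDIA example):

```bash
# Show installed NVIDIA GPUs and their total VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv
```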
Step 1: Install and configure Ollama
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a code-focused model
ollama pull qwen2.5-coder:14b

# Start Ollama server (runs on :11434 by default)
ollama serve
```
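Before pointing Merlin AI Code Review at the server, it's worth confirming Ollama is up and the model is available:

```bash
# List the models the local Ollama server has pulled
curl -s http://localhost:11434/api/tags

# Quick one-off smoke test of the model from the CLI
ollama run qwen2.5-coder:14b "Reply with OK if you can read this."
```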
Step 2: Configure Merlin AI Code Review to use Ollama
```toml
[ai]
provider = "ollama"
model = "qwen2.5-coder:14b"
ollama_base_url = "http://localhost:11434"  # or your server IP
max_tokens = 4096
temperature = 0.2
```
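Note that Ollama listens on 127.0.0.1 by default. If Merlin AI Code Review runs on a different machine than the Ollama server, start the server bound to a reachable interface (Ollama's OLLAMA_HOST variable controls the bind address) and point ollama_base_url at that host; the IP below is a placeholder:

```bash
# On the Ollama server: listen on all interfaces instead of only localhost
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# From the machine running Merlin: confirm the endpoint in ollama_base_url answers
curl -s http://10.0.0.5:11434/api/tags
```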
Step 3: Integrate with CI
For self-hosted runners (GitHub Actions self-hosted, GitLab private runners), the runner can reach your Ollama server at its local network address:
```yaml
- run: ./merlin review
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    # No ANTHROPIC_API_KEY needed; Ollama is used instead
```
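A small optional addition (not something Merlin requires) is a pre-flight step that fails the job with a clear message if the runner can't reach the Ollama server; a sketch, assuming the server sits at 10.0.0.5 on your internal network:

```bash
# Fail fast if the Ollama server is unreachable from the CI runner
curl -sf http://10.0.0.5:11434/api/tags > /dev/null \
  || { echo "Ollama server unreachable from runner"; exit 1; }
```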
Also: local RAG embeddings
For RAG indexing, Merlin AI Code Review also uses Ollama — the nomic-embed-text embedding model runs locally:
```toml
[rag]
enabled = true
store = "local"
embed_model = "nomic-embed-text"
ollama_base_url = "http://localhost:11434"
```
Pull the embedding model and build the index:

```bash
ollama pull nomic-embed-text
merlin rag index .
```
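For a sense of what runs under the hood, embedding requests hit the same local server; the snippet below is an illustration only (the input text is a placeholder, and Merlin's actual chunking and vector storage are internal to the tool):

```bash
# Illustration only: embed a code snippet with the local nomic-embed-text model
curl -s http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "def parse_config(path): ..."
}' | jq '.embedding | length'
```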
With Ollama handling both inference and embeddings, the entire Merlin AI Code Review stack runs completely on-premises.