Privacy · Ollama · Self-Hosted

100% Private Code Review: Merlin AI Code Review + Ollama with Zero Data Egress

February 26, 2025 · 8 min read · Merlin AI Code Review Team

For teams with the strictest privacy requirements — air-gapped networks, classified systems, or regulatory prohibitions on external AI calls — Merlin AI Code Review can run in a fully local mode using Ollama for inference. Zero data leaves your network. Not the diff, not the review, not a single byte.

How the Merlin AI Code Review + Ollama stack works

Ollama is an open-source tool that runs large language models locally on your hardware. When configured as Merlin AI Code Review's AI backend, the entire review pipeline — from diff ingestion to comment generation — stays within your infrastructure:

  1. Merlin AI Code Review reads the PR diff from your VCS API (GitHub or GitLab, which already sits inside your network in a self-hosted setup)
  2. Merlin AI Code Review constructs the review prompt locally
  3. The prompt is sent to Ollama running on a local server
  4. Ollama runs inference on your hardware and returns the review
  5. Merlin AI Code Review posts the review back to the PR via VCS API

No external network calls. No API keys. No data egress.
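
You can exercise the inference hop by hand to confirm it is purely local. A minimal sketch, assuming a code model such as qwen2.5-coder:14b has already been pulled (Step 1 below):

shell
# Send a small diff snippet to the local Ollama API (default port 11434);
# the only network traffic is this loopback call
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:14b",
  "prompt": "Review this change for bugs:\n- return a + b\n+ return a - b",
  "stream": false
}'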

Hardware requirements

The model you choose determines your hardware needs:

| Model | VRAM / RAM | Quality | Speed |
| --- | --- | --- | --- |
| qwen2.5-coder:32b | 24GB+ VRAM | Excellent | Moderate |
| qwen2.5-coder:14b | 16GB VRAM | Very good | Fast |
| deepseek-coder-v2:16b | 16GB VRAM | Very good | Fast |
| codellama:13b | 16GB RAM (CPU) | Good | Slow on CPU |

For GPU inference, NVIDIA cards with 16–24GB VRAM provide the best experience. CPU-only inference works but is significantly slower — acceptable for batch review pipelines, less so for real-time PR feedback.
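
A quick way to check which row of the table your hardware supports is to query the GPU's total VRAM (NVIDIA GPUs only):

shell
# Report the GPU model and total VRAM on the inference host
nvidia-smi --query-gpu=name,memory.total --format=csv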

Step 1: Install and configure Ollama

shell
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a code-focused model
ollama pull qwen2.5-coder:14b
# Start Ollama server (runs on :11434 by default)
ollama serve
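
Before pointing Merlin AI Code Review at the server, verify that Ollama is running and the model is available:

shell
# List models pulled on this machine
ollama list
# Confirm the HTTP API is reachable (returns the model list as JSON)
curl -s http://localhost:11434/api/tags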

Step 2: Configure Merlin AI Code Review to use Ollama

merlin.toml
toml
[ai]
provider = "ollama"
model = "qwen2.5-coder:14b"
ollama_base_url = "http://localhost:11434" # or your server IP
max_tokens = 4096
temperature = 0.2
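
If Ollama runs on a separate GPU server rather than on the same machine, bind it to a network interface and point ollama_base_url at that host. The 10.0.0.42 address below is a placeholder for your own internal server:

shell
# On the GPU server: listen on all interfaces instead of loopback only
OLLAMA_HOST=0.0.0.0:11434 ollama serve
# From the Merlin AI Code Review host: confirm the server is reachable,
# then set ollama_base_url = "http://10.0.0.42:11434" in merlin.toml
curl -s http://10.0.0.42:11434/api/tags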

Step 3: Integrate with CI

For self-hosted runners (GitHub Actions self-hosted, GitLab private runners), the runner can reach your Ollama server at its local network address:

yaml
- run: ./merlin review
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    # No ANTHROPIC_API_KEY needed; Ollama is used instead
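
A pre-flight check in the runner makes a misconfigured network easier to diagnose than a hanging review step. A sketch, assuming your Ollama server is reachable at ollama.internal (substitute your own hostname or IP):

shell
# Fail fast if the runner cannot reach the Ollama server
curl -sf http://ollama.internal:11434/api/tags > /dev/null || {
  echo "Ollama server unreachable from this runner" >&2
  exit 1
}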

Also: local RAG embeddings

For RAG indexing, Merlin AI Code Review also uses Ollama — the nomic-embed-text embedding model runs locally:

merlin.toml
toml
[rag]
enabled = true
store = "local"
embed_model = "nomic-embed-text"
ollama_base_url = "http://localhost:11434"

Pull the embedding model and build the index:

shell
ollama pull nomic-embed-text
merlin rag index .
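
To confirm embeddings are also generated locally, you can request one directly from Ollama's embeddings endpoint using the model pulled above:

shell
# Produce a single embedding vector on the local machine; no external calls
curl -s http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "func add(a, b int) int { return a + b }"
}'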

With Ollama handling both inference and embeddings, the entire Merlin AI Code Review stack runs completely on-premises.