Privacy · Ollama · Self-Hosted

100% Private Code Review: Merlin AI Code Review + Ollama with Zero Data Egress

February 26, 2025 · 8 min read · Merlin AI Code Review Team

For teams with the strictest privacy requirements — air-gapped networks, classified systems, or regulatory prohibitions on external AI calls — Merlin AI Code Review can run in a fully local mode using Ollama for inference. Zero data leaves your network. Not the diff, not the review, not a single byte.

How the Merlin AI Code Review + Ollama stack works

Ollama is an open-source tool that runs large language models locally on your hardware. When configured as Merlin AI Code Review's AI backend, the entire review pipeline — from diff ingestion to comment generation — stays within your infrastructure:

  1. Merlin AI Code Review reads the PR diff from your VCS API (GitHub or GitLab, which already sits inside your network in a self-hosted setup)
  2. Merlin AI Code Review constructs the review prompt locally
  3. The prompt is sent to Ollama running on a local server
  4. Ollama runs inference on your hardware and returns the review
  5. Merlin AI Code Review posts the review back to the PR via VCS API

No external network calls. No API keys. No data egress.
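
You can exercise the inference hop by hand to confirm it is purely local. A minimal sketch, assuming a code model such as qwen2.5-coder:14b has already been pulled (Step 1 below):

shell
# Send a small diff snippet to the local Ollama API (default port 11434);
# the only network traffic is this loopback call
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:14b",
  "prompt": "Review this change for bugs:\n- return a + b\n+ return a - b",
  "stream": false
}'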

Hardware requirements

The model you choose determines your hardware needs:

| Model | VRAM / RAM | Quality | Speed |
| --- | --- | --- | --- |
| qwen2.5-coder:32b | 24GB+ VRAM | Excellent | Moderate |
| qwen2.5-coder:14b | 16GB VRAM | Very good | Fast |
| deepseek-coder-v2:16b | 16GB VRAM | Very good | Fast |
| codellama:13b | 16GB RAM (CPU) | Good | Slow on CPU |

For GPU inference, NVIDIA cards with 16–24GB VRAM provide the best experience. CPU-only inference works but is significantly slower — acceptable for batch review pipelines, less so for real-time PR feedback.
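
A quick way to check which row of the table your hardware supports is to query the GPU's total VRAM (NVIDIA GPUs only):

shell
# Report the GPU model and total VRAM on the inference host
nvidia-smi --query-gpu=name,memory.total --format=csv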

Step 1: Install and configure Ollama

shell
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a code-focused model
ollama pull qwen2.5-coder:14b
# Start Ollama server (runs on :11434 by default)
ollama serve
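
Before pointing Merlin AI Code Review at the server, verify that Ollama is running and the model is available:

shell
# List models pulled on this machine
ollama list
# Confirm the HTTP API is reachable (returns the model list as JSON)
curl -s http://localhost:11434/api/tags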

Step 2: Configure Merlin AI Code Review to use Ollama

merlin.toml
toml
[ai]
provider = "ollama"
model = "qwen2.5-coder:14b"
ollama_base_url = "http://localhost:11434" # or your server IP
max_tokens = 4096
temperature = 0.2
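
If Ollama runs on a separate GPU server rather than on the same machine, bind it to a network interface and point ollama_base_url at that host. The 10.0.0.42 address below is a placeholder for your own internal server:

shell
# On the GPU server: listen on all interfaces instead of loopback only
OLLAMA_HOST=0.0.0.0:11434 ollama serve
# From the Merlin AI Code Review host: confirm the server is reachable,
# then set ollama_base_url = "http://10.0.0.42:11434" in merlin.toml
curl -s http://10.0.0.42:11434/api/tags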

Step 3: Integrate with CI

For self-hosted runners (GitHub Actions self-hosted, GitLab private runners), the runner can reach your Ollama server at its local network address:

yaml
- run: ./merlin review
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    # No ANTHROPIC_API_KEY needed; Ollama is used instead
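
A pre-flight check in the runner makes a misconfigured network easier to diagnose than a hanging review step. A sketch, assuming your Ollama server is reachable at ollama.internal (substitute your own hostname or IP):

shell
# Fail fast if the runner cannot reach the Ollama server
curl -sf http://ollama.internal:11434/api/tags > /dev/null || {
  echo "Ollama server unreachable from this runner" >&2
  exit 1
}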

Also: local RAG embeddings

For RAG indexing, Merlin AI Code Review also uses Ollama — the nomic-embed-text embedding model runs locally:

merlin.toml
toml
[rag]
enabled = true
store = "local"
embed_model = "nomic-embed-text"
ollama_base_url = "http://localhost:11434"

Pull the embedding model and build the index:

shell
ollama pull nomic-embed-text
merlin rag index .
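
To confirm embeddings are also generated locally, you can request one directly from Ollama's embeddings endpoint using the model pulled above:

shell
# Produce a single embedding vector on the local machine; no external calls
curl -s http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "func add(a, b int) int { return a + b }"
}'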

With Ollama handling both inference and embeddings, the entire Merlin AI Code Review stack runs completely on-premises.