Run Anthropic's Vuln-Discovery Harness on Your Code

Figure: GitHub repository page for anthropics/defending-code-reference-harness, showing 2,200+ stars.

Anthropic just did something unusual: they gave away the engineering pattern behind their AI vulnerability discovery system. The defending-code-reference-harness — which hit #1 on Hacker News with 504 points and 140+ comments — is a complete, open-source pipeline for running Claude as an autonomous security agent. Threat modeling, scanning, triage, patching. The whole loop.

The repo has accumulated 2,200+ GitHub stars since release. And unlike most “reference implementations” that feel like marketing, this one ships real infrastructure: gVisor sandboxing, multi-agent verification, and six Claude Code skills you can run today.

But there’s a catch — several, actually. The repo explicitly says “not maintained and not accepting contributions.” Running it at scale costs real money. And Anthropic also sells Claude Security, a hosted product that does the same thing with fewer ops headaches. This is an operator guide: what the harness actually does, how to set it up, what it costs, and when you should use it instead of the hosted alternative.

What the Harness Actually Is

The defending-code-reference-harness ships two things: a set of Claude Code skills for interactive security work, and an autonomous scanning pipeline that chains them together.

The skills are the building blocks:

/quickstart — Interactive scoping. Asks what you want to scan, sets up the context.
/threat-model — Generates a threat model for your codebase. Decides what counts as a vulnerability before scanning begins — a step most security tools skip entirely.
/vuln-scan — Reads source code and looks for vulnerabilities with reasoning, not regex. Files, data flows, cross-module dependencies.
/triage — Filters findings by severity and exploitability. Reduces noise.
/patch — Generates targeted patches for confirmed vulnerabilities with human review.
/customize — Lets you edit the harness configuration: what languages to target, what vulnerability classes matter, how agents are orchestrated.
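
To make the triage idea concrete, here is a minimal Python sketch of filtering and ranking findings by severity and exploitability. The Finding fields, the four-level severity scale, and the default threshold are assumptions for illustration only; they are not taken from the harness's own schema.

```python
from dataclasses import dataclass

# Hypothetical severity scale; the harness may rank findings differently.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class Finding:
    title: str
    severity: str      # one of the keys in SEVERITY_RANK
    exploitable: bool  # was the issue judged reachable by an attacker?


def triage(findings, min_severity="medium"):
    """Keep exploitable findings at or above min_severity, worst first."""
    floor = SEVERITY_RANK[min_severity]
    kept = [
        f for f in findings
        if f.exploitable and SEVERITY_RANK[f.severity] >= floor
    ]
    return sorted(kept, key=lambda f: SEVERITY_RANK[f.severity], reverse=True)


# Made-up findings, purely for demonstration.
findings = [
    Finding("unchecked memcpy length", "high", True),
    Finding("verbose error message", "low", True),
    Finding("dead-code integer overflow", "critical", False),
    Finding("path traversal in upload", "critical", True),
]

for f in triage(findings):
    print(f.severity, f.title)
```

The point of the sketch is the ordering of concerns: exploitability gates a finding before severity ranks it, which is how a triage pass cuts noise without hiding the worst issues.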

The skills live in .claude/skills/ and work like any Claude Code skill — you can run them interactively in your terminal. /quickstart, /threat-model, /vuln-scan, /triage, and /patch only read and write files; they’re safe to run unsandboxed as long as you review each tool use. /customize edits harness code and runs validation commands, so treat it with the same caution you’d give any code-executing agent.

The autonomous pipeline chains these into a single loop (recon → find → verify → report → patch), which the documentation breaks into six steps. This is where the real architecture lives.

The Pipeline: Recon → Find → Verify → Report → Patch

The official documentation describes a six-step process that mirrors how a human security researcher would work:

Step 1 — Threat Model. Before scanning, the agent builds a threat model: what attack surfaces exist, what vulnerability classes matter for this codebase, what’s in scope. This step is critical. Most automated scanners skip it and drown you in irrelevant findings.

Step 2 — Build a Sandbox. The harness runs each agent inside a gVisor container with egress restricted to the Claude API. No internet access. No filesystem escape. This isolation isn’t optional — the autonomous pipeline refuses to run outside a gVisor sandbox unless you explicitly override it. The reason: agents execute target code to verify exploits. You want that happening in a cage.
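
As a rough illustration of what "run the agent under gVisor" looks like at the Docker level, the sketch below builds a docker run command that selects gVisor's runsc runtime. The image name, network name, and mount layout are hypothetical, and this is not the harness's actual launch code. It also assumes the named Docker network already has its egress limited to the Claude API by a proxy or firewall; the command alone does not enforce that.

```python
import shlex


def sandbox_command(image, workdir, network="agent-egress"):
    """Build a `docker run` argv that uses gVisor's runsc runtime.

    `image` and `network` are hypothetical names. The network is assumed
    to be pre-restricted so that only the Claude API is reachable.
    """
    return [
        "docker", "run", "--rm",
        "--runtime=runsc",         # gVisor's runtime, as registered with Docker
        "--network", network,      # egress restriction lives here (assumption)
        "--read-only",             # immutable root filesystem
        "--cap-drop=ALL",          # drop all Linux capabilities
        "--volume", f"{workdir}:/work",
        "--workdir", "/work",
        image,
    ]


cmd = sandbox_command("vuln-agent:latest", "/tmp/scan-target")
print(shlex.join(cmd))
```

Building the argv as a list rather than a string keeps paths with spaces safe and makes the isolation flags easy to audit in one place.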

Step 3 — Scan. The agent reads source code file-by-file, looking for vulnerabilities. This isn’t pattern matching — it’s reasoning about how code actually behaves, including data flows across files and modules. The default configuration targets C/C++ memory vulnerabilities using Docker and ASAN (AddressSanitizer).

Step 4 — Verify. Found something? The pipeline generates a proof-of-concept exploit and runs it in the sandbox. If ASAN or the crash oracle confirms the bug, it’s real. If not, it’s dropped. This verification step is what separates the harness from traditional SAST tools that flag everything and hope for the best.
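
A crash oracle of this kind can be very small. The sketch below assumes the proof-of-concept has already been run against an ASAN-instrumented build and that you have its exit code and stderr; it then looks for the "ERROR: AddressSanitizer:" line that ASAN prints when it detects a memory error. The function name and return convention are illustrative, not the harness's API.

```python
import re

# ASAN reports start with a line such as:
#   ==4242==ERROR: AddressSanitizer: heap-buffer-overflow on address ...
ASAN_ERROR = re.compile(r"ERROR: AddressSanitizer: ([\w-]+)")


def confirm_finding(returncode, stderr):
    """Return the ASAN bug class if the PoC run crashed under ASAN, else None."""
    match = ASAN_ERROR.search(stderr)
    if returncode != 0 and match:
        return match.group(1)
    return None


crash_log = (
    "==4242==ERROR: AddressSanitizer: heap-buffer-overflow "
    "on address 0x602000000014\n"
)
print(confirm_finding(1, crash_log))   # confirmed: prints the bug class
print(confirm_finding(0, ""))          # clean run: prints None
```

Requiring both a non-zero exit and an ASAN report is the conservative choice: a finding that cannot produce either is treated as unconfirmed and dropped.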

Step 5 — Report. Confirmed findings get structured reports: vulnerability class, affected code, severity assessment, and the proof-of-concept that confirmed it.
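
One plausible shape for such a report is a small record serialized to JSON. The field names, file paths, and values below are invented for illustration; the harness's real report format may differ.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Report:
    vulnerability_class: str
    affected_file: str
    affected_lines: tuple  # (first_line, last_line)
    severity: str
    poc_path: str          # where the confirming proof-of-concept is stored


# Hypothetical finding, used only to show the structure.
report = Report(
    vulnerability_class="heap-buffer-overflow",
    affected_file="src/parser.c",
    affected_lines=(118, 131),
    severity="high",
    poc_path="findings/0001/poc.bin",
)

serialized = json.dumps(asdict(report), indent=2)
print(serialized)
```

Keeping the proof-of-concept path inside the report means a reviewer can always re-run the exact input that confirmed the bug.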

Step 6 — Patch. The agent generates a targeted patch and verifies that it fixes the vulnerability without breaking existing tests. Human review is expected here — the harness generates the patch, you approve it.
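
The acceptance rule described here (the exploit no longer works and nothing else broke) can be sketched as a simple gate. The function below is a hypothetical illustration: it takes the outcome of re-running the proof-of-concept and a mapping of test names to pass/fail, and returns the reasons for rejection so a human reviewer can see them.

```python
def accept_patch(poc_still_crashes, test_results):
    """Accept a candidate patch only if the PoC is neutralized and no test fails.

    test_results maps test name -> bool (True means the test passed).
    Returns (accepted, reasons); reasons is empty when the patch is accepted.
    """
    reasons = []
    if poc_still_crashes:
        reasons.append("proof-of-concept still triggers the bug")
    failed = sorted(name for name, ok in test_results.items() if not ok)
    if failed:
        reasons.append("failing tests: " + ", ".join(failed))
    return (not reasons, reasons)


# Hypothetical results for two candidate patches.
print(accept_patch(False, {"test_parse": True, "test_roundtrip": True}))
print(accept_patch(False, {"test_parse": True, "test_roundtrip": False}))
```

A gate like this only filters candidates. It does not replace the human approval the harness expects, since passing tests say nothing about whether the fix is the right one.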

The multi-agent verification pipeline — scan, then independently verify each finding — is the same architectural pattern that powers Claude Mythos, Anthropic’s internal system that identified 23,019 issues across 1,000+ open-source projects, with 6,202 rated high or critical severity.

Setting It Up on Your Own Repo

Prerequisites: Docker, Python 3.10+, a Claude API key (or Claude Code subscription with enough usage).

One-time setup:

git clone https://github.com/anthropics/defending-code-reference-harness.git
cd defending-code-reference-harness

# Create Python virtual environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Install gVisor, build agent images, verify isolation
./scripts/setup_sandbox.sh

The setup_sandbox.sh script installs gVisor (if not already present), builds the Docker images for agent isolation, and runs verification tests to ensure the sandbox is properly configured. This step requires Docker to be running.

Running the interactive skills:

The simplest way to start is with the interactive skills, which don’t require the full sandbox:

# In your target repo directory
claude

# Then use the skills interactively:
/quickstart    # Scoping
/threat-model  # Threat modeling
/vuln-scan     # Vulnerability scanning
/triage        # Filter and prioritize
/patch         # Generate patches

These skills work on any codebase, not just C/C++. They’re reading source code and reasoning about it — the language-specific parts (ASAN, crash oracles) only matter for the autonomous pipeline.

Running the autonomous pipeline:

# Point the harness at your target repo
./harness/run.py --target /path/to/your/repo --config harness/config.yaml

The autonomous pipeline is configured for C/C++ memory vulnerabilities by default. To target other languages or vulnerability classes, use the /customize skill to modify the configuration.

Warning: The autonomous pipeline executes target code inside the sandbox to verify exploits. Make sure your target repo builds cleanly in the Docker environment. If the build fails, the verification step can’t confirm findings, and you’ll get unverified reports instead of confirmed vulnerabilities.

The Cost Reality

Here’s the number HN commenters fixated on: the harness consumes roughly 10,000 uncached input tokens per minute and 2,000 output tokens per minute per running agent.

For a single scanning session on a medium-sized codebase, that might mean 500K–2M tokens consumed over a few hours. Manageable for a one-off audit. But if you’re running continuous scanning across multiple repositories — the way you’d use Claude Security or a traditional SAST tool — the numbers compound fast.
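
The arithmetic behind those figures is easy to reproduce. The sketch below uses the per-minute rates quoted above; the per-million-token prices are placeholders chosen only to make the example run, not Anthropic's actual pricing, so substitute the current rates for the model you configure.

```python
INPUT_TOKENS_PER_MIN = 10_000   # uncached input, per agent (figure quoted above)
OUTPUT_TOKENS_PER_MIN = 2_000   # output, per agent (figure quoted above)


def session_tokens(hours, agents=1):
    """Return (input_tokens, output_tokens) for a scanning session."""
    agent_minutes = hours * 60 * agents
    return (
        INPUT_TOKENS_PER_MIN * agent_minutes,
        OUTPUT_TOKENS_PER_MIN * agent_minutes,
    )


def session_cost(hours, agents=1, usd_per_m_input=5.0, usd_per_m_output=25.0):
    """Estimate cost in USD. The default prices are PLACEHOLDERS, not real rates."""
    inp, out = session_tokens(hours, agents)
    return inp / 1e6 * usd_per_m_input + out / 1e6 * usd_per_m_output


inp, out = session_tokens(hours=3)
print(f"input={inp:,} output={out:,} total={inp + out:,}")
print(f"estimated cost (placeholder prices): ${session_cost(3):.2f}")
```

One agent running for three hours comes to about 2.16M tokens, which lines up with the upper end of the range above. Multiply by the number of parallel agents and the number of scans per year to see how quickly continuous scanning adds up.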

The official documentation estimates that a company with 100 developers running continuous scanning could spend approximately $2.5 million per year on token costs alone.

That’s the same cost anxiety driving the broader AI economics reckoning. Token prices have fallen 280x in two years, but per-workflow consumption has exploded — especially for agentic workloads that re-send the entire context with every turn. (For the macro picture, see our ComputeLeap sister piece on why the subsidy clock is ticking.)

Token optimization tip: The harness supports context compression. If you’re running it at scale, pairing it with a token compression layer like headroom or rtk could significantly reduce costs on the re-sent context in multi-turn verification loops.

The cost math changes your decision: for a quarterly security audit, the open-source harness is almost certainly cheaper than Claude Security. For continuous scanning, the hosted product’s pricing may actually be more economical — and comes without the ops burden.

Claude Security (Hosted) vs. the Open-Source Harness

Anthropic now offers two paths to the same capability:

Aspect	Open-Source Harness	Claude Security (Hosted)
Cost model	Pay-per-token (API)	Subscription (Enterprise)
Setup	Docker + gVisor + config	Sign up, point at repo
Customization	Full — edit pipeline, add languages	Limited — Anthropic controls config
Maintenance	You (repo is “not maintained”)	Anthropic
Model	Whatever Claude model you configure	Opus 4.7 + Mythos capabilities
Scale	Your infrastructure	Anthropic’s infrastructure

Claude Security launched in public beta as part of Claude Enterprise. It adds scheduled scans, finding dismissal with documented reasons, and CSV/Markdown exports for audit systems. If you’re an enterprise team that wants vulnerability scanning as a service, this is the path of least resistance.

The open-source harness is for operators who want to understand and control the pipeline. If you’re building your own security tooling, integrating vulnerability discovery into a custom CI/CD pipeline, or need to target vulnerability classes that Claude Security doesn’t cover yet, the harness gives you the engineering pattern to build on.

As one Substack commenter noted: the skills are “much closer to ‘good’ because [Anthropic has] a lot actually riding on it.” This isn’t a weekend side project — it’s the reference implementation behind a product line.

HN’s Take: What the Community Is Actually Saying

The Hacker News thread (504 points, 140+ comments) surfaced several practical concerns worth tracking:

On cost: Multiple commenters flagged the consumption rate (~10K input tokens plus ~2K output tokens per minute, per agent) as a potential dealbreaker for continuous use. The consensus: the harness is best used as an audit tool, not a continuous scanner, unless you have the budget for it.

On the “not maintained” label: Skepticism about long-term viability. Some commenters read it as Anthropic wanting you to use Claude Security instead. Others saw it as standard open-source disclaimer language. The truth is probably both.

On DIY harnesses: Several experienced security engineers noted that building a similar pipeline with Claude’s API is “surprisingly accessible today.” The harness’s value isn’t that it does something impossible — it’s that it codifies best practices (especially the gVisor sandboxing and multi-agent verification) that you’d otherwise have to figure out yourself.

On SAST comparison: The thread drew sharp comparisons to Snyk, Semgrep, and SonarQube. The consensus: traditional SAST catches known patterns; the AI harness finds logic vulnerabilities and novel bug classes that pattern-matching tools miss. The tradeoff is cost, speed, and false positive rates.

Where It Fits in the Agent Security Stack

The defending-code-reference-harness occupies a specific niche: it’s a reference implementation for operators who want to build or customize AI-powered vulnerability discovery.

It sits alongside several other approaches:

Claude Security (hosted) — For enterprises that want vuln scanning as a service. Subscription pricing, lower ops burden.
Traditional SAST (Snyk, Semgrep, SonarQube) — Fast, cheap, deterministic. Catches known patterns but misses novel logic bugs.
Microsoft MDASH — Multi-model agentic vulnerability discovery with 100+ specialized agents. Enterprise-scale but not open-source.
OpenAnt — Community-built open-source alternative that emerged when the community decided no one else wanted to compete with Anthropic.
Custom harnesses — Roll your own with Claude’s API. The defending-code-reference-harness is essentially a well-documented version of what many security teams are already building.

A recent security audit of 22,511 AI coding agent skills found 140,963 issues — a reminder that the tools we use to write code need security attention too. The harness’s skills are no exception: review them before trusting them with your codebase.

Operator recommendation: Start with the interactive skills (/threat-model, /vuln-scan, /triage) on a single repo to understand what the harness finds and how it reasons. Run the autonomous pipeline on a test project with known vulnerabilities to calibrate your expectations. Only then scale to production repos — and budget for the token costs before committing.
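
For the calibration run, it helps to score the pipeline's output against the vulnerabilities you already know are in the test project. The sketch below is one simple way to do that; the identifiers are made up, and matching findings by ID is an assumption (in practice you may need to match on file and line instead).

```python
def calibration(known, reported):
    """Compare reported findings against a set of known, seeded vulnerabilities."""
    known, reported = set(known), set(reported)
    hits = known & reported
    return {
        "recall": len(hits) / len(known) if known else 0.0,
        "precision": len(hits) / len(reported) if reported else 0.0,
        "missed": sorted(known - reported),
        "unexpected": sorted(reported - known),
    }


# Hypothetical identifiers for seeded bugs and for what the scan reported.
known_bugs = {"bug-1", "bug-2", "bug-3", "bug-4"}
reported_bugs = {"bug-1", "bug-2", "extra-1"}

scores = calibration(known_bugs, reported_bugs)
print(scores)
```

Entries under "unexpected" are not automatically false positives: on a real codebase they may be genuine bugs you did not know about, so review them before counting them against the tool.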

What This Means for Operators

The defending-code-reference-harness is a signal. Anthropic is saying: here’s how we build AI-powered security tooling, and here’s the pattern for you to build your own.

The practical takeaway is straightforward:

If you need a quarterly security audit, clone the harness, run the interactive skills, review the findings. Cost: a few hundred dollars in tokens.
If you need continuous scanning, evaluate Claude Security (hosted) against the token cost of running the harness yourself. The hosted product may well be cheaper at scale.
If you’re building custom security tooling, the harness is a production-grade reference implementation. The gVisor sandboxing pattern and multi-agent verification loop are worth studying even if you don’t use the rest.

The harness won’t replace your SAST pipeline. It will find the bugs your SAST pipeline misses: the logic vulnerabilities, the novel bug classes, the cross-module data flow issues that pattern matching can’t catch. At roughly 12K tokens per minute per agent (10K input plus 2K output), that capability has a price. Whether it’s worth it depends on what you’re defending.