Don't Run That Skill Yet - AgentConn Blog

Don't Run That Skill Yet — security scanning and runtime firewalling for AI agent skills

The agent skills gold rush has a security problem. GitHub Trending is wall-to-wall SKILL.md files. ClawHub crossed 50,000 published skills. Every major harness — Claude Code, Cursor, Gemini CLI, Codex — now has a skill ecosystem that teams are git clone-ing into production daily.

And the attackers noticed. Snyk’s ToxicSkills research found that 13.4% of 3,984 surveyed skills contain critical security issues. 1,467 malicious payloads. 76 confirmed malware samples. The ClawHavoc campaign traced 335+ malicious skills to a single threat actor — “hightower6eu” — distributing an AMOS stealer variant through seemingly helpful skills, earning CVE-2026-25253 in the process.

This isn’t hypothetical risk. It’s the OWASP Agentic Skills Top 10 in production. And until last month, there was no purpose-built tooling to catch it.

That changed when two tools shipped in the same week. NVIDIA open-sourced SkillSpector, a static security scanner that analyzes agent skills before installation. Deno released Claw Patrol, a runtime firewall that sits between agents and the systems they interact with. Together, they form the first two-layer defense for the agent skills supply chain: scan before you install, firewall when you run.

We’ve covered the config file attack surface (which files execute as code), sandboxing agent-generated code (containment at runtime), and using agents offensively for vulnerability discovery (detection). This piece covers what sits between the first two: vetting the skills themselves before they touch your systems, and firewalling them when they do.

The Threat Landscape by the Numbers

The numbers paint a clear picture of why agent skill security can’t wait:

13.4% failure rate: Snyk’s ToxicSkills study found critical issues in roughly 1 in 7 skills surveyed across major marketplaces. The most common vectors: hidden shell execution, data exfiltration via environment variable access, and prompt injection through embedded instructions.
335+ coordinated malicious skills: The ClawHavoc campaign wasn’t script kiddies — it was a sophisticated single-actor operation distributing credential-stealing malware through skills with legitimate-looking descriptions, real README files, and functional (but backdoored) code.
76 confirmed malware samples: Not “potentially unwanted” or “overly permissive”, but actual malware that exfiltrates credentials, installs persistence mechanisms, and phones home to C2 infrastructure.
Zero install-time gates: Before SkillSpector, there was no equivalent of npm audit for agent skills. Teams were vetting SKILL.md files by reading them, a manual process that doesn’t scale and misses obfuscated payloads entirely.

The OWASP Agentic Skills Top 10 codifies these attack patterns: prompt injection via skill instructions, excessive agency grants, tool poisoning, data exfiltration through MCP channels, and supply chain compromise of skill dependencies. It’s the same taxonomy SkillSpector uses for its 16 detection categories.

SkillSpector: Scan Before You Install

NVIDIA/SkillSpector is the first open-source security scanner purpose-built for agent skills. Where traditional SAST tools look at source code for SQL injection and buffer overflows, SkillSpector looks at SKILL.md files, MCP server manifests, and agent configuration artifacts for agent-specific attack surfaces.

What It Catches

SkillSpector covers 64 vulnerability patterns across 16 categories, including:

Prompt injection: Hidden instructions in skill descriptions that override agent behavior, including Unicode tricks and zero-width character obfuscation
Data exfiltration: Skills that read environment variables, access credentials, or transmit data to external endpoints not declared in their manifest
Privilege escalation: Skills requesting permissions beyond their stated purpose — a “formatting helper” that wants shell access, a “search tool” that requests filesystem write
Supply chain attacks: Vulnerable dependencies (via OSV.dev lookups), known-bad package patterns, and dependency confusion vectors
Excessive agency: Skills that grant themselves broader tool access than their task requires — the principle of least privilege applied to agent capabilities
Tool poisoning: MCP tool descriptions that contain hidden instructions, a vector first demonstrated in 2025
Trigger abuse: Skills that register overly broad activation patterns, hijacking agent behavior for unrelated requests
Dangerous code patterns: AST analysis of embedded scripts for shell injection, file system manipulation, and process spawning
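To make the zero-width obfuscation vector concrete, here is a minimal Python sketch of that kind of check. It is not SkillSpector's implementation; the function name is hypothetical, and the heuristic (flag every Unicode "format" character, general category Cf) is an assumption chosen for brevity. It will also flag legitimate uses such as emoji joiners, so treat hits as prompts for review rather than verdicts.

```python
import unicodedata

def find_hidden_characters(text):
    """Return (line, column, character name) for each invisible format character.

    Heuristic (an assumption of this sketch): Unicode category "Cf" covers
    zero-width spaces and joiners, the byte-order mark, and bidirectional
    override controls, all of which render as nothing in most editors.
    """
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                name = unicodedata.name(ch, f"U+{ord(ch):04X}")
                findings.append((line_no, col, name))
    return findings
```

Running it over the text of a SKILL.md file returns one tuple per hidden character, which is enough to point a reviewer at the exact line and column.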

How It Works

SkillSpector accepts Git repositories, URLs, zip files, directories, and single files. By default it runs fast static checks — pattern matching, AST analysis, dependency lookup. For subtler issues that require understanding intent (is this shell command legitimate for the skill’s purpose, or is it hiding something?), an optional LLM semantic analysis mode compares a skill’s declared purpose against its actual behavior.
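As a rough illustration of what "AST analysis" means for an embedded Python script, the sketch below walks the syntax tree and reports calls that spawn processes or evaluate strings. The list of flagged calls and the function names are assumptions made for this example, not SkillSpector's rule set, and a check this simple is easy to evade with import aliases or dynamic attribute lookups.

```python
import ast

# Hypothetical deny-list for this sketch; a real scanner would track far more.
DANGEROUS_CALLS = {
    "os.system", "os.popen",
    "subprocess.run", "subprocess.Popen",
    "subprocess.call", "subprocess.check_output",
    "eval", "exec",
}

def dotted_name(node):
    """Rebuild a dotted name such as 'subprocess.run' from a call target, or None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None

def find_dangerous_calls(source):
    """Return sorted (line number, call name) pairs for flagged calls in source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in DANGEROUS_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)
```

Because the check works on the parsed tree rather than raw text, a string or comment that merely mentions os.system does not trigger it.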

Output is a 0-100 risk score with severity labels (Critical, High, Medium, Low, Info) and actionable recommendations. Gating CI/CD on it is straightforward: fail the pipeline if the score exceeds your threshold.

# Scan a skill from GitHub
skillspector scan https://github.com/example/my-agent-skill

# Scan a local directory
skillspector scan ./skills/data-formatter

# Run with LLM semantic analysis for deeper inspection
skillspector scan --llm-analysis ./skills/suspicious-tool
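The CI gate described above could look something like the following Python sketch. It assumes the scan result has been saved as JSON with a top-level "score" and a "findings" list whose entries carry a "severity" field. That report shape is hypothetical, invented for this example, so check the scanner's actual output format before relying on it. The extra rule that any Critical finding fails the gate is likewise an assumption of the sketch.

```python
import json

def gate(report_json, max_score=40):
    """Decide whether a scan report passes; returns (passed, reason).

    The report layout ("score", "findings", "severity") is a hypothetical
    format used only for this illustration.
    """
    report = json.loads(report_json)
    score = report["score"]
    criticals = [
        f for f in report.get("findings", [])
        if f.get("severity") == "Critical"
    ]
    if criticals:
        return False, f"{len(criticals)} critical finding(s)"
    if score > max_score:
        return False, f"risk score {score} exceeds threshold {max_score}"
    return True, f"risk score {score} within threshold {max_score}"
```

In a pipeline step, the caller would print the reason and exit non-zero when the first element is False, which is what makes the job fail.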

The Verified Agent Skills Program

SkillSpector is part of NVIDIA’s broader Verified Agent Skills ecosystem — a public catalog of 162 signed skills spanning 16 product families. Each skill undergoes automated scanning with SkillSpector, human review, cryptographic signing, and documentation via machine-readable skill cards. It’s the npm verified publisher model applied to agent skills.

The program matters beyond NVIDIA’s own ecosystem because it establishes the pattern: automated scanning + human review + signing + machine-readable metadata. Whether or not you use NVIDIA’s catalog, those four layers are what a trusted skill supply chain looks like.

Claw Patrol: Firewall at Runtime

Static scanning catches what it can see. But a skill that passes every static check can still behave badly at runtime — calling APIs it shouldn’t, running SQL it didn’t declare, accessing Kubernetes resources outside its namespace. This is where Claw Patrol enters.

Built by the Deno team and released under MIT license, Claw Patrol is an open-source security firewall that sits between AI agents and the production systems they interact with.

It doesn’t scan code — it inspects traffic. Every request an agent makes passes through Claw Patrol, which parses it at the wire level and evaluates it against declarative security rules.

What It Inspects

Claw Patrol parses three protocol families at the wire level:

SQL: Extracts verbs (SELECT, INSERT, UPDATE, DELETE, DROP, ALTER), table names, and WHERE clause presence. A rule like “block DELETE without WHERE on production tables” is a single HCL block.
Kubernetes API: Extracts resource types, verbs, and namespaces. “Allow GET/LIST on pods in staging, require human approval for DELETE in production” — that’s the kind of policy it enforces.
HTTP: Extracts methods, paths, and headers. Rate limiting, endpoint allowlisting, and header validation at the proxy layer.
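To show what "extracting the verb and WHERE clause presence" involves for SQL, here is a deliberately naive Python sketch. It is not how Claw Patrol parses the wire protocol; the function name is hypothetical, it skips table-name extraction, and it ignores SQL comments and multi-statement input. A production firewall would use a real SQL parser.

```python
import re

SQL_VERBS = ("SELECT", "INSERT", "UPDATE", "DELETE", "DROP", "ALTER")

def summarize_sql(statement):
    """Return the leading verb and whether a WHERE keyword appears.

    Quoted string literals are blanked first so that a value such as
    'where' is not mistaken for the keyword.
    """
    stripped = re.sub(r"'(?:[^']|'')*'", "''", statement)
    tokens = re.findall(r"[A-Za-z_]+", stripped.upper())
    verb = tokens[0] if tokens and tokens[0] in SQL_VERBS else None
    return {"verb": verb, "has_where": "WHERE" in tokens}
```

The resulting dictionary has the same two fields, verb and has_where, that a rule about naked DELETE statements needs to match on.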

Rules in HCL

Claw Patrol rules are written in HashiCorp Configuration Language (HCL) — the same language used for Terraform, Nomad, and Vault policies. This is a deliberate choice: platform teams already know HCL, and existing policy-as-code workflows (version control, PR review, CI validation) apply directly.

# Block destructive SQL without WHERE clauses
rule "no-naked-deletes" {
  protocol = "sql"
  match {
    verb = "DELETE"
    has_where = false
  }
  action = "deny"
  message = "DELETE without WHERE clause is not allowed"
}

# Require human approval for production pod deletion
rule "prod-pod-delete-approval" {
  protocol = "kubernetes"
  match {
    resource  = "pods"
    verb      = "delete"
    namespace = "production"
  }
  action = "approve"
  approver = "human"
}
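To see how rules like these turn into a decision, the Python sketch below mirrors the two HCL rules as plain dictionaries and evaluates a parsed request against them. The evaluation semantics here (first matching rule wins, unmatched traffic is allowed) are assumptions made for illustration; the article does not describe Claw Patrol's actual ordering or default.

```python
# Python mirrors of the two HCL rules above, for illustration only.
RULES = [
    {
        "name": "no-naked-deletes",
        "protocol": "sql",
        "match": {"verb": "DELETE", "has_where": False},
        "action": "deny",
    },
    {
        "name": "prod-pod-delete-approval",
        "protocol": "kubernetes",
        "match": {"resource": "pods", "verb": "delete", "namespace": "production"},
        "action": "approve",
    },
]

def evaluate(request, rules=RULES, default="allow"):
    """Return (action, rule name) for the first rule whose fields all match."""
    for rule in rules:
        if rule["protocol"] != request.get("protocol"):
            continue
        if all(request.get(key) == value for key, value in rule["match"].items()):
            return rule["action"], rule["name"]
    return default, None
```

A request is just the fields the parser extracted, so a DELETE without a WHERE clause yields ("deny", "no-naked-deletes") and a pod deletion in staging falls through to the default.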

Three Deployment Modes

Claw Patrol supports three deployment modes for different operational models:

clawpatrol run — Wraps a single agent’s process tree. The simplest mode: prefix your agent launch command with clawpatrol run and all outbound traffic from that process gets inspected. Good for development and single-agent deployments.
clawpatrol join — Brings up a WireGuard tunnel routing the whole host through the firewall. Every agent on the machine gets inspected without modifying their launch commands. Good for multi-agent hosts.
clawpatrol gateway — A standalone proxy that loads HCL config and accepts tunneled clients via WireGuard or Tailscale. The production deployment mode: run the gateway on dedicated infrastructure, route agent traffic through it.

Credential Injection

A key security feature: Claw Patrol supports credential injection, meaning agents never see raw secrets. The agent requests “access to the production database” — Claw Patrol injects the actual credentials at the proxy layer, evaluates the query against its rules, and strips the credentials from any logs or error messages. The agent operates with the capability without holding the secret.
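The idea can be sketched in a few lines of Python. Everything here is hypothetical: the function names, the dictionary-based "vault", and the bearer-token header are stand-ins chosen for the example, not Claw Patrol's mechanism. The point is only the shape of the flow: the agent names a target, the proxy attaches the secret on the way out, and anything flowing back is scrubbed.

```python
def inject_credentials(request, vault):
    """Return a copy of the request with the real secret attached for upstream use.

    The agent's original request is left untouched, so the secret never
    appears in anything the agent holds.
    """
    secret = vault[request["target"]]
    upstream = dict(request)
    upstream["headers"] = {
        **request.get("headers", {}),
        "Authorization": f"Bearer {secret}",
    }
    return upstream

def redact(text, vault):
    """Replace any known secret value in log or error text with a marker."""
    for secret in vault.values():
        text = text.replace(secret, "[REDACTED]")
    return text
```

Keeping injection and redaction in the proxy means a compromised or prompt-injected agent has nothing to exfiltrate beyond the logical target name.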

Static + Runtime: Why You Need Both

SkillSpector and Claw Patrol aren’t competing tools — they’re complementary layers in a defense-in-depth model. Here’s why you need both:

SkillSpector catches what’s visible before installation:

Known-bad patterns in skill code and configuration
Vulnerable dependencies via OSV.dev
Permission requests that exceed stated purpose
Obfuscated payloads and hidden instructions

Claw Patrol catches what happens at runtime:

Skills that pass static analysis but behave differently in production
Legitimate skills that are exploited via prompt injection at runtime
Credential misuse and unauthorized API access
Scope creep — skills that gradually expand their access patterns over time

The Docker blog’s analysis of agent tool chain security makes the case for a third layer — container isolation — arguing that even firewalled agents should run in sandboxed environments. Combined with our earlier coverage of sandboxing at the code execution layer, the full stack looks like:

Install-time scanning (SkillSpector): Reject known-bad skills before they enter your environment
Runtime firewalling (Claw Patrol): Enforce least-privilege at the network level
Container isolation (Docker, Tailscale sandboxes): Limit blast radius if both layers are bypassed

No single layer is sufficient. The SkillSieve paper formalizes this as “multi-layer vetting” and demonstrates that combining static and dynamic analysis catches 3.2x more issues than either alone.

The Broader Ecosystem Response

SkillSpector and Claw Patrol aren’t operating in a vacuum. The agent security ecosystem is responding across multiple fronts:

Cisco AI Defense released their own skill scanner targeting enterprise deployments with compliance-oriented reporting
OWASP published the Agentic Skills Top 10, giving the industry a shared vulnerability taxonomy
Snyk open-sourced toxicskills-goof, an intentionally vulnerable skill collection for testing scanners — the “DVWA of agent skills”
The SkillSieve paper proposes a formal framework for automated multi-layer skill vetting at scale

The convergence is significant: in the span of three weeks, the agent skills ecosystem went from “no purpose-built security tooling” to “multiple competing approaches from major vendors.” The trust infrastructure is catching up to the skill-shipping velocity — but there’s still a gap, and teams adopting skills today need to act on what’s available now.

What to Do Monday Morning

If your team is installing third-party agent skills — and if you’re using Claude Code, Cursor, or any MCP-equipped harness, you almost certainly are — here’s the practical checklist:

Immediate (this week):

Run SkillSpector against every third-party skill in your current environment. The scan takes seconds and the risk score gives you triage priority.
Audit your installed MCP servers. Each one is a process that launches with your agent’s permissions. Do you know what every one does?
Review the OWASP Agentic Skills Top 10 — it’s the checklist your security team will ask about.

Short-term (this month):

Add SkillSpector to your CI/CD pipeline as a gate for any PR that adds or modifies agent skills.
Deploy Claw Patrol in run mode around your most sensitive agent workloads, starting with anything that touches production databases or Kubernetes clusters.
Establish a skill allowlist. Not every agent needs every skill. The principle of least privilege applies.
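One way to make an allowlist tamper-evident is to pin each approved skill to a content hash, so a skill that changes after review no longer matches. The Python sketch below shows that approach under stated assumptions: hash pinning is this example's technique rather than something either tool prescribes, and the function names and in-memory file mapping are hypothetical.

```python
import hashlib

def skill_digest(files):
    """Hash a skill's files (mapping of relative path -> bytes) in a stable order."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode("utf-8"))
        h.update(b"\0")          # separator between path and content
        h.update(files[path])
        h.update(b"\0")          # separator between entries
    return h.hexdigest()

def is_allowed(name, files, allowlist):
    """True only if the skill is listed and its content matches the pinned digest."""
    return allowlist.get(name) == skill_digest(files)
```

The allowlist itself is a mapping of skill name to approved digest that lives in version control, so changing it goes through the same PR review as any other policy.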

Medium-term (this quarter):

Move to Claw Patrol’s gateway mode for centralized policy enforcement across all agent traffic
Evaluate NVIDIA’s Verified Agent Skills catalog for skills that come pre-vetted
Build internal skill review processes modeled on the Verified Agent Skills pattern: automated scan + human review + signing + documentation

The skills gold rush isn’t slowing down. But the trust model is finally catching up. The tools exist. The question is whether your team adopts them before or after an incident forces the issue.

For more on agent security, see our coverage of the agent config supply chain attack surface, sandboxing untrusted agent code, and supply chain attacks on AI agent infrastructure.