Sandboxing Your Agents: Running Untrusted Agent Code Safely


Every agent you deploy against a real codebase is running code you did not write, in response to prompts you did not fully anticipate, with tools whose blast radius you have not fully mapped. That is not a failure of your architecture; it is the definition of what an agent is. The question is not whether your agent code is trusted. It is not. The question is how you contain the damage when the model hallucinates an rm -rf, exfiltrates a .env file, or hits an unintended API endpoint 300 times in four seconds.

Sandboxing has moved from “nice to have” to table stakes in 2026. Two distinct models have matured enough for production consideration: process and container isolation (microVMs, gVisor, managed sandbox services) and network-as-sandbox (Tailscale’s Aperture, WireGuard-backed identity governance). Neither model is universally better. They solve different problems — and the operators winning at this layer are combining them. This piece maps both models in detail, gives you a decision framework for choosing, and closes with a practical checklist for running agent swarms against real codebases without getting burned.

⚠️ A recent internal audit found that 41% of OpenClaw skills contained at least one vulnerability capable of exfiltrating data or executing unintended shell commands. This is not a niche problem. Uncontained agent execution is an exposure that compounds at every layer of your stack.

Why Agent Code Is Fundamentally Untrusted

The problem is structural, not incidental. When a coding agent generates a bash script, fetches a dependency, or writes a file path based on model output, you have introduced a code-execution surface controlled by an LLM whose behavior you cannot fully predict or formally verify. The OWASP Agentic Top 10 puts prompt injection and excessive agency at the top of the list precisely because they are not edge cases — they are default conditions in any agent that takes real-world actions.

Three threat classes matter most in operator practice:

Lateral movement. An agent with write access to one directory and network access to your internal tooling can pivot. If it is compromised via prompt injection (an attacker embeds a malicious instruction in a file the agent reads), it can use its legitimate credentials to reach services it was never meant to touch. The HN kernel-level sandboxing thread is full of operators who discovered this the hard way after shipping agents with broad filesystem permissions.

Exfiltration via tool call. Agents with access to a web-browsing tool, an email tool, or even a logging tool can be directed — through injection or model drift — to exfiltrate secrets by encoding them in outbound HTTP requests. Without network egress control, you have no way to detect or block this at the execution layer.
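At the tool-call layer, the simplest egress control is a hostname allowlist checked before any outbound request leaves the agent's HTTP tool. The sketch below assumes a hypothetical HTTP tool wrapper that calls `is_egress_allowed` first; the hostnames are placeholders. An in-process check like this is defense in depth only: an agent with raw socket access can bypass it, which is why network-layer enforcement still matters.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: enumerate hostnames, never raw IPs.
ALLOWED_HOSTS = frozenset({"api.github.com", "pypi.org", "files.pythonhosted.org"})


def is_egress_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only for HTTPS requests to an allowlisted host on the default port."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False  # refuse plaintext and other schemes (file://, gopher://, ...)
    try:
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return False
    if port not in (None, 443):
        return False
    # .hostname drops any "user@" prefix and lowercases; also strip a trailing dot.
    host = (parts.hostname or "").rstrip(".")
    return host in allowed_hosts
```

Note that `https://api.github.com@evil.example/` is correctly refused: everything before the `@` is userinfo, and the real host is `evil.example`. Denied calls are worth logging rather than silently dropping, since a burst of denials is itself a signal of injection.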

Runaway resource consumption. An agent stuck in a tool-call loop — or one directed by a malicious prompt to generate load — can consume compute, API quota, and money faster than any human operator can react. A BeyondScale enterprise sandboxing guide surveyed 200+ operator teams and found uncontrolled tool loops were the most common incident type, ahead of data exfiltration.
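One execution-layer guard against runaway loops is a per-agent sliding-window limit on tool calls, enforced in the tool dispatcher rather than requested in the prompt. A minimal sketch follows; the class name and thresholds are illustrative assumptions, and the injectable clock exists only to make the behavior testable.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class ToolCallRateLimiter:
    """Sliding window: at most `max_calls` tool calls per agent per `window_s` seconds."""

    def __init__(self, max_calls: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self._clock = clock
        self._calls: Dict[str, Deque[float]] = {}  # agent_id -> timestamps of allowed calls

    def allow(self, agent_id: str) -> bool:
        now = self._clock()
        window = self._calls.setdefault(agent_id, deque())
        # Evict timestamps that have aged out of the window.
        while window and now - window[0] >= self.window_s:
            window.popleft()
        if len(window) >= self.max_calls:
            return False  # over budget: deny (and, in practice, alert)
        window.append(now)
        return True
```

With `ToolCallRateLimiter(max_calls=50, window_s=4.0)`, an agent firing 300 requests within four seconds would get at most 50 through. Denied calls are not recorded, so they do not extend the window; whether repeated denials should escalate to killing the task is a policy choice this sketch leaves open.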

The Northflank sandbox guide and Firecrawl’s agent sandbox writeup both converge on the same baseline: containment must happen at the execution layer, not at the prompt layer. System prompts that say “do not exfiltrate secrets” are not a security control. They are a politeness request.

For more on what happens to stacks that skip this layer, see our AI agent supply chain attacks piece.

Model 1: Process and Container Sandboxes

The first model puts a hard boundary around the agent’s execution environment itself. The agent gets a filesystem it can destroy, a process tree it can fork, and a network surface it cannot exceed — all without touching the host.

LangSmith Sandboxes

LangSmith Sandboxes represent the current state of the art in managed, agent-native sandbox services. Each sandbox is a microVM with genuine kernel-level isolation: the agent process runs as root inside the VM, but that root cannot escape the VM boundary. This matters because container-based isolation (plain Docker) is not kernel isolation. Every container shares the host kernel, so a compromised container can exploit a kernel vulnerability to escape to the host. A microVM runs its own guest kernel; the blast radius of a kernel exploit is contained within the VM.

The performance numbers are operator-grade. LangSmith Sandboxes achieve p50 spawn latency under 0.98 seconds — cold start included. For agents that need to spin up a fresh environment per task (the correct pattern for untrusted code execution), sub-second spawn means sandboxing does not introduce meaningful UX latency. The sandbox provides a full Ubuntu environment, persistent filesystem (scoped to the session), and pre-installed language runtimes. Agents get the execution surface they need without the host exposure you cannot afford.

ℹ️ Key LangSmith Sandbox properties:

microVM with kernel-level isolation (agent runs as root inside the VM, cannot escape)

p50 cold-start latency under 0.98 seconds

Full Ubuntu environment with pre-installed runtimes

Native integration with LangGraph and LangChain agent workflows

E2B

E2B takes a similar microVM approach but positions itself as a provider-agnostic sandbox primitive. Any agent framework can call the E2B API to get a sandboxed execution environment. E2B’s sandboxes support custom Dockerfiles for environment setup while still running inside a VM boundary, which gives you the flexibility of container tooling with the isolation guarantees of hypervisor-level separation. The coding agent sandbox comparison gist by GitHub engineer wincent gives a rigorous side-by-side of E2B, Modal, and several alternatives — worth reading before you commit to a provider.

Firecracker

Firecracker is the open-source microVM hypervisor that powers both AWS Lambda and AWS Fargate. It is the substrate most managed sandbox services are built on, and you can run it directly if you want to operate your own sandbox infrastructure. The numbers are well-established in production: ~125ms boot time and under 5 MiB of memory overhead per VM. At those numbers you can afford to give every agent task its own microVM with no pooling, which is the correct threat model — shared environments between tasks create implicit trust that negates your sandbox investment.
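To make "one microVM per task" concrete, the sketch below builds a Firecracker configuration per task: a shared, read-only root image plus a per-task writable workspace drive and a per-task tap device. Paths, sizes, and the naming scheme are assumptions; the top-level keys follow Firecracker's JSON configuration-file format, but check them against the Firecracker version you deploy. The guest image is assumed to boot from a read-only root and mount the workspace drive itself.

```python
import hashlib
import json


def firecracker_task_config(task_id: str, kernel_path: str, base_rootfs: str,
                            task_dir: str, vcpus: int = 1, mem_mib: int = 512) -> dict:
    """Per-task Firecracker config: shared read-only root, private writable workspace."""
    # Short, stable tag per task; "tap-" + 8 hex chars stays under Linux's 15-char ifname limit.
    tag = hashlib.sha256(task_id.encode("utf-8")).hexdigest()[:8]
    return {
        "boot-source": {
            "kernel_image_path": kernel_path,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        },
        "drives": [
            # The same base image for every task, attached read-only.
            {"drive_id": "rootfs", "path_on_host": base_rootfs,
             "is_root_device": True, "is_read_only": True},
            # The only writable disk; deleted together with the VM after the task.
            {"drive_id": "workspace", "path_on_host": f"{task_dir}/workspace-{tag}.ext4",
             "is_root_device": False, "is_read_only": False},
        ],
        "machine-config": {"vcpu_count": vcpus, "mem_size_mib": mem_mib},
        "network-interfaces": [{"iface_id": "eth0", "host_dev_name": f"tap-{tag}"}],
    }


if __name__ == "__main__":
    cfg = firecracker_task_config("build-pr-1402", "/srv/fc/vmlinux",
                                  "/srv/fc/base-rootfs.ext4", "/srv/fc/tasks")
    print(json.dumps(cfg, indent=2))
```

You would write the result to a file and start the VM with something like `firecracker --api-sock /tmp/fc-<tag>.sock --config-file vm-<tag>.json`, after creating the tap device and copying a blank workspace image into place. Because the root drive is shared and read-only, the only state a task can leave behind lives in its own workspace image.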

Firecracker’s minimal attack surface is a deliberate design choice. It exposes only the devices agents actually need (block storage, network interface, serial console) and omits everything else. The hypervisor itself is written in Rust and has been extensively audited. The HN kernel-enforced sandbox thread has a useful discussion of Firecracker’s threat model versus alternatives like QEMU.

gVisor

gVisor takes a different approach: instead of a separate VM kernel, it interposes a user-space kernel (Sentry) between the container process and the host kernel. System calls from the container are intercepted and handled by Sentry rather than reaching the host. The tradeoff is performance: gVisor carries a 10–30% I/O overhead on filesystem-heavy workloads, with minimal compute overhead for CPU-bound tasks. For agents doing heavy file I/O (building software, running test suites), that overhead is real and must be benchmarked against your workload. For agents doing primarily API calls and light code execution, gVisor’s overhead is negligible and its operational simplicity (it runs as a container runtime, no hypervisor required) is a meaningful advantage.

Manveer’s sandbox guide on Substack has a practical walkthrough of gVisor configuration for agent workloads, including the RuntimeClass configuration for Kubernetes.
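The Kubernetes wiring amounts to a RuntimeClass plus a pod that names it. A minimal sketch, generated from Python so the manifests can be unit-tested; it assumes nodes whose container runtime already has a gVisor handler registered as `runsc` (the name used in the gVisor docs), and the pod name, image, and limits are placeholders.

```python
import json


def gvisor_manifests(pod_name: str, image: str) -> dict:
    """A RuntimeClass for gVisor and a single-task agent pod that runs under it."""
    runtime_class = {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": "gvisor"},
        # Must match the handler name configured in containerd/CRI-O on the nodes.
        "handler": "runsc",
    }
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name, "labels": {"role": "agent-task"}},
        "spec": {
            "runtimeClassName": runtime_class["metadata"]["name"],
            "restartPolicy": "Never",               # one task, then gone
            "automountServiceAccountToken": False,  # no cluster credentials inside
            "containers": [{
                "name": "agent",
                "image": image,
                "resources": {"limits": {"cpu": "1", "memory": "1Gi"}},
                "securityContext": {"allowPrivilegeEscalation": False,
                                    "readOnlyRootFilesystem": True},
            }],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [runtime_class, pod]}


if __name__ == "__main__":
    print(json.dumps(gvisor_manifests("agent-task-1402",
                                      "registry.example.com/agent-runner:latest"), indent=2))
```

Piping the output to `kubectl apply -f -` would create both objects. The design point is that isolation is selected per pod: agent pods opt into gVisor via `runtimeClassName`, while the rest of the cluster keeps the default runtime.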

Model 2: Network-as-Sandbox (Tailscale Aperture)

The second model takes a fundamentally different view of the problem. Instead of isolating the process, it governs what the process can reach. The agent may run in a normal container or even on a host machine — but every network connection it makes is mediated by cryptographic identity and policy.

This is not less secure by default. It is a different threat model. Process isolation stops a compromised agent from touching the host filesystem. Network isolation stops a compromised agent from reaching services it was never authorized to contact. In practice, network exfiltration is the more common attack vector for agent compromise in production — which is why the network-as-sandbox model deserves serious operator attention.

Tailscale Aperture

Tailscale Aperture is Tailscale’s private-alpha product for AI agent network governance. It builds on Tailscale’s existing WireGuard mesh (which already gives every node a cryptographic identity) and adds a layer of policy controls specifically designed for agentic workloads.

The core concept is that every agent (every running instance of your agent code) gets a WireGuard identity. Network access is not controlled by IP address or VLAN: IP addresses can be spoofed or reassigned, and VLAN boundaries are easy to misconfigure. It is controlled by who the agent is, meaning its identity, its task context, and the policy rules your team has written. An agent authorized to reach your internal GitHub Enterprise cannot reach your production database, even if it somehow learns the database’s IP address.
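Expressed in Tailscale's existing ACL policy file (the base mesh, not Aperture, whose own policy surface is not shown here), the GitHub-but-not-database rule might look like the sketch below. Tag names and group membership are hypothetical, and `is_allowed` is a deliberately tiny model of deny-by-default evaluation for exact tag and port matches, not Tailscale's policy engine (it ignores wildcards in `src`, autogroups, port ranges, and grants).

```python
import json
from typing import Any, Dict


def build_policy() -> Dict[str, Any]:
    """Coding agents may reach GitHub Enterprise on 443; nothing else is granted."""
    return {
        "groups": {"group:platform": ["platform-team@example.com"]},
        # Only the platform group may apply these tags to machines.
        "tagOwners": {
            "tag:agent-coder": ["group:platform"],
            "tag:github-enterprise": ["group:platform"],
            "tag:prod-db": ["group:platform"],
        },
        "acls": [
            {"action": "accept", "src": ["tag:agent-coder"], "dst": ["tag:github-enterprise:443"]},
            # No rule mentions tag:prod-db, so under deny-by-default it is unreachable.
        ],
    }


def is_allowed(policy: Dict[str, Any], src_tag: str, dst_tag: str, port: int) -> bool:
    """Toy deny-by-default check: exact tag match plus '*' or a comma-separated port list."""
    for rule in policy.get("acls", []):
        if rule.get("action") != "accept" or src_tag not in rule.get("src", []):
            continue
        for dst in rule.get("dst", []):
            target, _, ports = dst.rpartition(":")  # "tag:x:443" -> ("tag:x", ":", "443")
            if target == dst_tag and (ports == "*" or str(port) in ports.split(",")):
                return True
    return False  # nothing matched: denied


if __name__ == "__main__":
    print(json.dumps(build_policy(), indent=2))
```

The output is plain JSON, which is valid input for Tailscale's HuJSON policy file. The important line is the one that is not there: the database is protected by the absence of any rule naming it, not by a rule that blocks it.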

ℹ️ Tailscale Aperture’s three governance pillars:

WireGuard-based identity: every agent gets a cryptographic identity tied to its deployment; no IP-based trust

Centralized policy controls: rules are written once and enforced at the network layer across all agent instances

Audit-ready session histories: every connection attempt — successful or blocked — is logged with full agent identity context

The Tailscale “Network as Sandbox” talk lays out the architectural reasoning in detail. The key insight is that the network perimeter is the most durable enforcement point for agent policy: you can change your agent code, your model, your framework, and your container runtime without changing your network policy. The policy layer is decoupled from the implementation layer.

The audit trail matters most for compliance-sensitive operators. When your SOC2 auditor asks which services your agent touched during a specific task run, Aperture gives you a log with full agent identity context: not just “traffic from 10.0.0.42” but “agent instance forge-coder-7 running task build-pr-1402 connected to github.example.com at 14:23:07 UTC.”

The Highflame + Tailscale Integration

The Highflame and Tailscale partnership is the clearest production case study of this model. Highflame deploys agent fleets against enterprise codebases and uses Tailscale’s network layer to enforce what those agents can reach. The XDA Developers coverage of the partnership provides a readable summary of the operational model: agents are provisioned with Tailscale identities at deploy time, policy rules are written at the organization level (not per-agent), and the audit trail is continuous.

The practical implication for operators: if you are already running Tailscale for your internal network (which you should be), adding Aperture to your agent deployment adds network-layer governance without replacing any of your existing infrastructure.

Comparison: Which Model When

These two models are not in competition — they address different attack surfaces. The choice is not “process sandbox OR network sandbox” but “which combination does my threat model require?”

Dimension	Process/Container Sandbox	Network-as-Sandbox (Aperture)
Primary threat blocked	Host escape, filesystem damage, runaway processes	Lateral movement, data exfiltration via network, unauthorized service access
Enforcement point	Kernel/hypervisor boundary	Network layer (WireGuard identity)
Cold-start cost	~125ms boot (Firecracker), p50 under 0.98s including cold start (LangSmith), negligible (gVisor)	No additional cold start
I/O overhead	Low (microVM), 10–30% (gVisor, filesystem-heavy)	Negligible for authorized connections; unauthorized connections blocked outright
Auditability	Process-level logs inside the sandbox	Full network session history with agent identity
Best for	Agents generating and executing code, running shell commands, arbitrary code eval	Agents calling internal APIs, accessing sensitive services, multi-tenant deployments
Operational complexity	Requires VM/container infrastructure management	Requires Tailscale deployment (low if already using it)
Compliance story	Isolation boundary for SOC2	Audit log with identity context for SOC2, ISO 27001

Use process isolation when: your agents generate and execute code (coding agents, test runners, data-processing agents). Anything that runs exec(), shells out, or writes arbitrary files needs a hard kernel boundary. The NVIDIA mandatory controls guidance specifies three non-negotiable controls for this class: network egress allowlists, workspace write restrictions, and config file protection. A process sandbox lets you enforce all three at its boundary, outside the agent’s reach.

Use network-as-sandbox when: your agents call internal APIs, access databases, or operate in multi-tenant environments where lateral movement between tenants is a regulatory concern. This is also the right model for agents that are not executing arbitrary code but are making orchestrated API calls — the threat is exfiltration and privilege escalation, not host escape.

Use both when: you are running coding agents in an enterprise or regulated environment. Process isolation prevents the agent from damaging the host; network governance prevents the agent (or an attacker who has compromised it) from reaching unauthorized services. The two layers compose cleanly — a Firecracker microVM can run with a Tailscale sidecar for network identity. LangSmith Sandboxes expose an egress configuration for exactly this composition.
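These decision rules are simple enough to encode, which is useful for a CI check that refuses to deploy an agent whose declared capabilities outrun its provisioned isolation. This is a hypothetical sketch: the profile fields are a simplification of the criteria in this section, not a standard schema.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class AgentProfile:
    executes_code: bool        # shells out, runs exec()/eval, or writes arbitrary files
    calls_internal_apis: bool  # reaches internal APIs, databases, or other sensitive services
    multi_tenant: bool         # lateral movement between tenants is a concern
    regulated: bool            # enterprise or regulated environment


def required_controls(profile: AgentProfile) -> FrozenSet[str]:
    """Map an agent's execution class to the isolation layers it needs."""
    controls = set()
    if profile.executes_code:
        controls.add("process-isolation")  # hard kernel/hypervisor boundary
    if profile.calls_internal_apis or profile.multi_tenant:
        controls.add("network-identity")   # identity-based egress governance
    if profile.executes_code and profile.regulated:
        controls.add("network-identity")   # coding agents in regulated settings get both
    return frozenset(controls)
```

A deploy gate would compare `required_controls(profile)` against what the target environment actually provides and fail the rollout on any gap.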

For operators building validation layers on top of their sandboxed agents, see our agent judge layer guide.

The Operator Checklist: Running Swarms Against Real Codebases

Research across the BeyondScale enterprise guide, Northflank’s sandbox guide, and Firecrawl’s production notes converges on a consistent set of controls. Operators who implement all of them see roughly a 90% reduction in security incidents compared with uncontained agent deployments.

⚠️ This checklist is not aspirational. Every item corresponds to a real incident class documented in the sources linked in this article. If you are running agents against production codebases and cannot check every box, you are accepting known risk.

Pre-Deployment

Classify your agent’s execution class. Code-executing agents (shell, eval, file write) require microVM isolation. API-calling agents require network identity governance. Hybrid agents require both. Do this classification before choosing infrastructure.
Provision one sandbox per task, not one sandbox per agent instance. Task isolation prevents cross-task contamination. A shared sandbox between Task A and Task B means Task A’s filesystem artifacts (including any injected payloads) are visible to Task B. Firecracker’s 125ms boot time makes per-task VMs operationally viable.
Write your egress allowlist before your agent runs for the first time. Do not run the agent in permissive mode and then tighten afterward. You will miss things. The allowlist should enumerate exactly which external hosts and internal services the agent is authorized to reach, by hostname (not IP).
Restrict write scope to a specific working directory. Agents do not need to write to /etc, /usr, or any directory outside their working tree. Apply filesystem read-only mounts for everything except the designated workspace. Config file protection (one of NVIDIA’s three mandatory controls) means your agent cannot modify its own runtime configuration, sandbox policy, or credentials files.
Set resource limits before deployment, not after your first incident. Cap CPU time, wall-clock time, memory, and outbound connection count. An agent stuck in a tool loop should time out automatically. An agent spawning hundreds of subprocesses should hit a cgroup limit. These limits are not performance constraints; they are blast-radius controls.
Scan agent-generated code before executing it. For coding agents specifically: the model’s output is untrusted code. Run static analysis (even a basic linter pass) on generated shell scripts and Python files before executing them inside the sandbox. This catches the most obvious injection payloads.
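For a tool that shells out, several of these limits can be applied per command with POSIX rlimits plus a wall-clock timeout. The sketch below is Linux/macOS only and complements, rather than replaces, cgroup limits on the sandbox as a whole: rlimits apply per process, so they do not cap the combined usage of a process tree. The function name and default limits are assumptions.

```python
import resource
import subprocess
from typing import List


def _limit(kind: int, value: int) -> None:
    # Never ask for more than the current hard limit (an unprivileged process cannot raise it).
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def run_limited(argv: List[str], *, wall_clock_s: float = 30.0,
                cpu_s: int = 10, mem_mib: int = 1024) -> dict:
    """Run one tool command under CPU-time, address-space, and wall-clock limits."""

    def apply_limits() -> None:
        # Runs in the child after fork and before exec, so only the tool is limited.
        _limit(resource.RLIMIT_CPU, cpu_s)                  # killed after cpu_s CPU-seconds
        _limit(resource.RLIMIT_AS, mem_mib * 1024 * 1024)  # cap virtual memory

    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=wall_clock_s, preexec_fn=apply_limits)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising.
        return {"status": "timeout", "returncode": None, "stdout": ""}
    status = "ok" if proc.returncode == 0 else "failed"
    return {"status": status, "returncode": proc.returncode, "stdout": proc.stdout}
```

The wall-clock timeout catches tools that block without burning CPU (a hung network call, a `sleep`), while the CPU limit catches tight loops; you generally want both.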

During Deployment

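Log every tool call with its full input at the execution layer, not just the model layer. Model-layer logging can be bypassed by prompt injection. Execution-layer logging (at the syscall or tool-function level) is far harder to bypass, provided the records are shipped to a sink outside the sandbox as they are written: an agent running as root inside the VM can rewrite anything stored there. Your audit trail must be tamper-resistant relative to the agent.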
Monitor for anomalous tool-call volume in real time. An agent that makes 300 HTTP requests in 4 seconds is not operating normally. Set per-agent rate limits on external API calls and alert on sustained deviation. The agent judge layer pattern can provide runtime validation against expected call patterns.
Treat every credential inside the sandbox as revocable. Provide time-limited tokens rather than permanent API keys. If the sandbox is compromised, the token expires within the task window. Rotate tokens between tasks, not between deployments.
Enforce agent identity at the network layer for multi-agent workflows. In swarms where agents call each other, each agent should authenticate to its peers with a cryptographic identity (Tailscale’s model). Agent-to-agent trust based on IP address or shared API key creates lateral movement risk within your own swarm.
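One way to make an execution-layer tool-call log tamper-evident is to hash-chain it: each record commits to the previous record's hash, so editing, reordering, or deleting an earlier entry breaks verification. The sketch below uses a hypothetical record schema. It only helps if each record (or at least the latest hash) reaches a collector outside the sandbox as it is written; an attacker who controls the whole log can recompute the chain, and dropping entries from the tail is only detectable by a collector that remembers the last hash it saw.

```python
import hashlib
import json
import time
from typing import Any, Dict, List, Optional

GENESIS = "0" * 64  # the "previous hash" of the first record


def _digest(body: Dict[str, Any]) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class ToolCallLog:
    """Append-only, hash-chained tool-call records: tamper-evident, not tamper-proof."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    def append(self, agent_id: str, task_id: str, tool: str,
               tool_input: Dict[str, Any], ts: Optional[float] = None) -> Dict[str, Any]:
        body = {
            "agent_id": agent_id,
            "task_id": task_id,
            "tool": tool,
            "input": tool_input,
            "ts": time.time() if ts is None else ts,
            "prev": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry  # ship this to a collector outside the sandbox immediately


def verify_chain(entries: List[Dict[str, Any]]) -> bool:
    """True if no record was edited, reordered, or removed (except from the tail)."""
    prev = GENESIS
    for entry in entries:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body.get("prev") != prev or _digest(body) != entry.get("hash"):
            return False
        prev = entry["hash"]
    return True
```

Each record carries the agent and task identity alongside the tool input, so the same stream serves both tamper detection and the per-agent anomaly monitoring described above.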

Post-Task

Snapshot and destroy the sandbox after every task. Do not reuse sandbox state between tasks. Even if you trust the previous task’s output, the filesystem may contain artifacts (cached credentials, downloaded dependencies, injected files) that affect the next task’s behavior.
Diff the workspace against its pre-task state before accepting output. A coding agent should have modified files in a predictable set. If the diff includes files outside the expected working directory, that is a finding — not just a cleanup task.
Run generated code in a second-stage sandbox before committing to any environment. For coding agents producing pull requests: run the generated test suite in a fresh sandbox before the PR reaches your CI. If the test suite contains malicious code (an increasingly documented attack vector), it runs in the sandbox, not in your CI runner.
Review agent session histories on a sample basis, not just on incident. Aperture-style session logs are only useful if someone reads them. A weekly review of a random sample of agent sessions catches slow-burn issues (agents gradually expanding their access patterns) that per-task alerting misses.
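The workspace diff can be as simple as fingerprinting every file before and after the task and flagging changes outside the expected directories. A sketch, assuming the workspace is a directory you can read from outside the sandbox once the task ends; the function names are hypothetical, and it reads whole files into memory, which suits source trees but not large build artifacts.

```python
import hashlib
import os
from pathlib import Path
from typing import Dict, Iterable, List


def snapshot(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to a content fingerprint."""
    fingerprints: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root).as_posix()
        if path.is_symlink():
            # Record where links point: a new link out of the tree is a finding too.
            fingerprints[rel] = "symlink:" + os.readlink(path)
        elif path.is_file():
            fingerprints[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return fingerprints


def diff_snapshots(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Every path that was added, removed, or modified between two snapshots."""
    changed = (before.keys() ^ after.keys()) | {
        p for p in before.keys() & after.keys() if before[p] != after[p]
    }
    return sorted(changed)


def unexpected_changes(changed: Iterable[str], allowed_dirs: Iterable[str]) -> List[str]:
    """Changed paths outside the directories the task was expected to touch."""
    prefixes = [d.rstrip("/") + "/" for d in allowed_dirs]
    return [p for p in changed if not any(p.startswith(pre) for pre in prefixes)]
```

A task scoped to `src/` that also writes `.github/workflows/ci.yml` would surface that workflow file in `unexpected_changes`, which is exactly the kind of finding that should block the output rather than be cleaned up quietly.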

For Kubernetes-Native Deployments

Use the kubernetes-sigs/agent-sandbox CRD if you are on Kubernetes. The Sandbox CRD provides a Kubernetes-native abstraction over the sandbox lifecycle — provisioning, task execution, and teardown — without requiring custom operator code for each agent type. The agent-sandbox/agent-sandbox reference implementation is a useful starting point for the operator scaffolding.
Apply network policies at the pod level, not just the cluster level. Kubernetes NetworkPolicy is a coarse tool: it matches on pod labels, namespaces, IP blocks, and ports, and cannot express hostname-based egress rules. Apply fine-grained egress rules per agent pod using a service mesh or Aperture-style identity governance. The Northflank guide’s Kubernetes section has worked examples.
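For contrast, here is what a per-pod egress rule looks like in plain Kubernetes NetworkPolicy. It confines a labeled agent pod to DNS plus HTTPS to specific CIDR ranges, but it cannot name a hostname, which is the coarseness described above. The namespace, labels, and CIDR are placeholders (203.0.113.0/24 is a documentation range), the `k8s-app: kube-dns` label assumes a typical CoreDNS deployment, and the policy has no effect unless your CNI plugin enforces NetworkPolicy.

```python
import json
from typing import Iterable


def agent_egress_policy(namespace: str, agent: str, allowed_cidrs: Iterable[str],
                        port: int = 443) -> dict:
    """Deny all egress from pods labeled agent=<agent> except DNS and HTTPS to allowed_cidrs."""
    cidrs = list(allowed_cidrs)
    if not cidrs:
        # An empty "to" list would match ALL destinations, so refuse to build it.
        raise ValueError("allowed_cidrs must not be empty")
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{agent}-egress", "namespace": namespace},
        "spec": {
            # Selects this agent's pods only, not the whole namespace.
            "podSelector": {"matchLabels": {"agent": agent}},
            # With Egress listed, any egress not matched below is denied for these pods.
            "policyTypes": ["Egress"],
            "egress": [
                {   # DNS, restricted to the cluster DNS pods.
                    "to": [{
                        "namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": "kube-system"}},
                        "podSelector": {"matchLabels": {"k8s-app": "kube-dns"}},
                    }],
                    "ports": [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}],
                },
                {   # HTTPS to IP ranges only: NetworkPolicy has no notion of hostnames.
                    "to": [{"ipBlock": {"cidr": cidr}} for cidr in cidrs],
                    "ports": [{"protocol": "TCP", "port": port}],
                },
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(agent_egress_policy("agents", "forge-coder", ["203.0.113.0/24"]), indent=2))
```

The output can be piped to `kubectl apply -f -`. The empty-list guard matters: in NetworkPolicy an empty `to` means "any destination", so a bug that drops every CIDR would silently open egress instead of closing it.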

What’s Next: Kubernetes-Native Sandboxing and the Convergence

The direction of the ecosystem is clear: both models are converging toward a unified agent execution fabric where process isolation and network identity governance are provisioned together, as a primitive, at the Kubernetes layer.

The kubernetes-sigs/agent-sandbox project formalizes this direction. The Sandbox Custom Resource Definition gives platform teams a standard API for declaring sandbox requirements — isolation level, egress policy, resource limits, identity configuration — and lets agents request sandbox resources the same way they request compute resources. This is the right abstraction. Operators should not be writing custom Helm charts for sandbox provisioning; they should be declaring sandbox requirements in the same spec as their agent deployment.

The Tailscale Aperture private alpha is similarly moving toward a Kubernetes-native deployment model where agent pods receive Tailscale identities automatically at scheduling time, without any per-agent configuration. The Highflame partnership is an early example of what this looks like in practice: identity governance that is invisible to the agent developer and automatic for the platform operator.

ℹ️ The 2026 sandbox stack for serious operators:

Firecracker or LangSmith Sandboxes for process isolation on code-executing agents

Tailscale Aperture for network identity governance on all agents

kubernetes-sigs/agent-sandbox CRD for Kubernetes-native lifecycle management

Runtime validation layer (agent judge) for behavioral monitoring against expected patterns

The deeper convergence is between sandboxing and observability. Sandboxes that produce rich execution traces — every syscall, every network connection, every file write — can feed runtime validation pipelines that detect behavioral drift before it becomes a security incident. LangSmith’s sandbox-to-trace integration is early evidence of this; the agent judge layer pattern is what operators are building on top of it.

The teams that will operate agent swarms at scale without getting burned are not the ones with the most sophisticated LLMs. They are the ones who treat every agent as an untrusted process, provision isolation before deployment, and instrument the execution layer deeply enough to know when something has gone wrong before the blast radius becomes unmanageable. Sandboxing is not the whole answer. But it is the foundation that everything else — evals, runtime validation, incident response — depends on.

For a complete picture of the operational risks that sandboxing is designed to contain, see our AI agent psychosis audit — the 9-question review that maps where your stack is currently flying blind.