Loop Engineering: The Week the Industry Stopped Prompting
In June 2026 the industry coined 'loop engineering' — designing the autonomous loop that prompts your agent instead of prompting it yourself. Here's what it actually is, where it came from, how to build one, and where it breaks.

In early June 2026, a sentence from inside Anthropic went around the industry. Boris Cherny, who built Claude Code, told an interviewer: “I don’t prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops.” The line was striking enough that The New Stack ran it as a headline: the person who built one of the most-used coding agents on earth had stopped using it the way the rest of us do.
Two days later, developer Peter Steinberger compressed the same idea into a post that drew roughly 6.5 million views:
“you shouldn’t be prompting coding agents anymore. You should be designing loops that prompt your agents.”
By June 7, Google engineer Addy Osmani had given the pattern a name and a shape in a post titled, simply, Loop Engineering. Within the same week, Latent Space was cataloguing it (“Loopcraft: The Art of Stacking Loops”), Firecrawl had published a skeptic’s rebuttal, and the term had crossed into mainstream coverage via Tech Times. A discipline — or at least a very loud hashtag — had been born.
It’s worth being precise about what is actually new here, because the honest answer is: less than the volume suggests, and more than the skeptics admit.
The loop was always the agent
Strip away the June 2026 branding and “loop engineering” describes something the field has understood for over a year. Back in December 2024, Anthropic’s foundational post Building Effective Agents defined the entire category in one sentence: agents “are typically just LLMs using tools based on environmental feedback in a loop.” The 763-point Hacker News thread that followed spent most of its energy arguing about exactly that — where a “workflow” ends and a “loop,” i.e. a real agent, begins.
A few months later, Anthropic’s Barry Zhang reduced the idea to a few lines of pseudocode in an AI Engineer talk that still circulates. An agent, stripped to the metal, is this:
```python
env = Environment()
tools = Tools(env)
system_prompt = "Goals, constraints, and how to act"
while True:
    action = llm.run(system_prompt + env.state)
    env.state = tools.run(action)
```
Observe, act, repeat. That’s the whole machine.
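To make Zhang's pseudocode concrete, here is a minimal runnable sketch. The `fake_llm` policy, the counter environment, and the `MAX_STEPS` cap are hypothetical stand-ins: a real loop would call a model API where `fake_llm` sits, and the cap is added because a literal `while True` never exits. Unlike the pseudocode, the tool here mutates the environment directly rather than returning a new state.

```python
from dataclasses import dataclass, field

MAX_STEPS = 10  # hypothetical safety cap; the original pseudocode loops forever


@dataclass
class Environment:
    """Toy environment: the goal is to drive `value` up to `target`."""
    value: int = 0
    target: int = 3
    log: list = field(default_factory=list)

    @property
    def state(self) -> str:
        return f"value={self.value} target={self.target}"


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: reads the state, picks an action."""
    value = int(prompt.split("value=")[1].split()[0])
    target = int(prompt.split("target=")[1].split()[0])
    return "done" if value >= target else "increment"


def run_tool(env: Environment, action: str) -> None:
    """The 'tools.run' step: execute the chosen action against the environment."""
    env.log.append(action)
    if action == "increment":
        env.value += 1


def agent_loop(env: Environment, system_prompt: str) -> list:
    for _ in range(MAX_STEPS):
        action = fake_llm(system_prompt + "\n" + env.state)  # observe, then decide
        if action == "done":  # the model signals the goal is reached
            break
        run_tool(env, action)  # act; the environment changes
    return env.log


env = Environment()
print(agent_loop(env, "Goals, constraints, and how to act"))
# ['increment', 'increment', 'increment']
```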
The same realization arrived from every direction at once. In April 2025, Thorsten Ball built a working coding agent in about 400 lines and titled the essay How to Build an Agent, with a thesis that fit in a breath: “It’s an LLM, a loop, and enough tokens.” Simon Willison kept repeating the compact version — “Agents are models using tools in a loop” — until it became the closest thing the field has to an agreed definition. Jeremy Howard of fast.ai pushed it further, arguing we should retire the marketing word entirely:
“can we now all agree to stop saying ‘agent’ and say ‘tool loop’ instead? It’s the same # of syllables, and is much clearer.”
By August 2025, Braintrust had written the explainer with the bluntest possible title — The canonical agent architecture: A while loop with tools. So when people say loop engineering is “just” putting an LLM in a while loop, they’re right. That part isn’t new. What’s new is the claim that the loop itself is now the primary object of engineering — the thing you design, tune, test, and own — rather than an implementation detail behind a chat box.
The lineage: prompt → context → harness → loop
The clearest way to understand loop engineering is as the newest layer in a stack that has been building for three years. No layer replaced the one beneath it; each one wrapped it.

Prompt engineering (2022–2024) was about wording: coaxing a single response out of a single call. Context engineering arrived in mid-2025, when Andrej Karpathy threw his weight behind a name that stuck:
“+1 for ‘context engineering’ over ‘prompt engineering’ … context engineering is the delicate art and science of filling the context window with just the right information.”
The next layer up was harness engineering — the scaffolding around a single run: the tools, the verification, the context plumbing that lets one agent invocation do serious work. Latent Space documented its most extreme form in Extreme Harness Engineering for Token Billionaires, profiling teams running a billion tokens a day with effectively no human code review.
Loop engineering sits one level higher still. The cleanest distinction in the discourse: the harness equips one run; the loop is what keeps poking the agent on a schedule and feeds itself. A loop discovers work, dispatches it, verifies the result against a rubric, persists state outside the context window, and decides whether to keep going or stop. The human moves from operator to designer — from typing prompts to architecting the system that types them.
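The five verbs in that description (discover, dispatch, verify, persist, decide) fit into one small function. This is a sketch under loud assumptions: `discover_work`, `dispatch`, and `passes_rubric` are hypothetical placeholders for an issue tracker, an agent call, and a verifier, and the state file name is arbitrary. What it shows is the shape: state lives on disk rather than in a context window, and the loop itself decides when to stop.

```python
import json
from pathlib import Path


def discover_work(state: dict) -> list[str]:
    """Hypothetical discovery step: return backlog items not yet completed."""
    backlog = ["fix-flaky-test", "update-docs", "bump-dependency"]
    return [t for t in backlog if t not in state["done"]]


def dispatch(task: str) -> str:
    """Hypothetical agent call; a real loop would invoke a coding agent here."""
    return f"patch for {task}"


def passes_rubric(task: str, result: str) -> bool:
    """Hypothetical verifier; in practice a test suite or rubric check."""
    return result.startswith("patch for") and task != "bump-dependency"


def run_loop(state_path: Path, max_iterations: int = 10) -> dict:
    # Resume from state persisted outside any context window.
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"done": [], "failed": []}
    for _ in range(max_iterations):
        pending = [t for t in discover_work(state) if t not in state["failed"]]
        if not pending:  # decide: nothing left, so stop
            break
        task = pending[0]
        result = dispatch(task)
        if passes_rubric(task, result):
            state["done"].append(task)
        else:
            state["failed"].append(task)  # park it instead of retrying forever
        state_path.write_text(json.dumps(state))  # checkpoint after every step
    return state
```

Calling `run_loop` twice on the same file does no new work the second time, because the record of what was done survived on disk.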
The anatomy of a loop
Osmani’s contribution wasn’t the concept — it was the parts list. His post breaks a production loop into five composable pieces, and they map almost one-to-one onto patterns practitioners had already been building:
- Automations — scheduled triggers that discover work (a cron tick, a new issue, a failing test) and kick off a run without a human in the chair.
- Worktrees — isolated, parallel execution environments so multiple agents can work at once without colliding on the same files.
- Skills — codified, reusable project knowledge the agent loads on demand instead of re-deriving every run.
- Plugins / connectors — integration with real tools and external systems, typically over MCP.
- Sub-agents — decomposition of a big goal, and crucially the separation of the agent that does the work from the agent that checks it.
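Two of those pieces, worktrees and sub-agents, can be sketched together: parallel workers in isolated workspaces, followed by a separate checker. As an assumption, temporary directories stand in for git worktrees (in a real repository each would come from something like `git worktree add -b <branch> <path>`), and `worker` and `checker` are hypothetical placeholders for agent calls.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def worker(task: str, workspace: Path) -> Path:
    """Hypothetical 'doing' sub-agent: writes only inside its own workspace."""
    out = workspace / f"{task}.txt"
    out.write_text(f"implementation of {task}\n")
    return out


def checker(task: str, artifact: Path) -> bool:
    """Hypothetical 'checking' sub-agent, separate from the worker that produced the artifact."""
    return artifact.exists() and task in artifact.read_text()


def run_parallel(tasks: list[str]) -> dict[str, bool]:
    with tempfile.TemporaryDirectory() as root:
        # One isolated workspace per task, so parallel workers never share files.
        workspaces = {t: Path(root) / t for t in tasks}
        for ws in workspaces.values():
            ws.mkdir()
        with ThreadPoolExecutor(max_workers=4) as pool:
            futures = {t: pool.submit(worker, t, workspaces[t]) for t in tasks}
            artifacts = {t: f.result() for t, f in futures.items()}
        # Verification happens after the work, by a different function.
        return {t: checker(t, artifacts[t]) for t in tasks}


print(run_parallel(["auth", "billing", "search"]))
# {'auth': True, 'billing': True, 'search': True}
```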
The throughline of all five is control. That is exactly the argument of 12-Factor Agents, Dex Horthy’s framework that has collected over 23,000 GitHub stars. Its most-cited principle, Factor 8 — “Own your control flow” — is the whole loop-engineering thesis stated a year early: don’t hand a model a bag of tools and a vague goal and hope; engineer the loop yourself, step by step, until it reaches a state you define as “done.” A wave of open-source harness builders now exists to make exactly that wiring repeatable.
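One way to read Factor 8 in code, under our own assumptions rather than anything 12-Factor Agents prescribes: the model only proposes the next step as structured data, while ordinary code owns the branching, the step limit, and the definition of "done". The step names and the `propose_next_step` stub are hypothetical.

```python
from enum import Enum


class Step(Enum):
    WRITE_CODE = "write_code"
    RUN_TESTS = "run_tests"
    DONE = "done"


def propose_next_step(history: list[str]) -> Step:
    """Hypothetical model call: returns a structured proposal, not free-form text."""
    if "tests_passed" in history:
        return Step.DONE
    if history and history[-1] == "code_written":
        return Step.RUN_TESTS
    return Step.WRITE_CODE


def is_done(history: list[str]) -> bool:
    # "Done" is defined by our code, not by the model's say-so.
    return "tests_passed" in history


def controlled_loop(max_steps: int = 8) -> list[str]:
    history: list[str] = []
    attempts = 0
    for _ in range(max_steps):
        step = propose_next_step(history)
        if step is Step.DONE:
            if is_done(history):
                break
            history.append("rejected_premature_done")  # model claimed done; code disagrees
            continue
        if step is Step.WRITE_CODE:
            history.append("code_written")
        elif step is Step.RUN_TESTS:
            attempts += 1
            # Simulated test run: fails on the first attempt, passes on the second.
            history.append("tests_passed" if attempts >= 2 else "tests_failed")
    return history
```

With the default limit this ends with `['code_written', 'tests_failed', 'code_written', 'tests_passed']`; lower `max_steps` and the loop stops wherever the code says, not where the model would like.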
The most sophisticated loops add an adversary. In Anthropic’s workshop on building agents that run for hours, the engineers describe a planner-generator-evaluator harness (one component proposes, one executes, and a third adversarially verifies) and report that it dramatically outperforms a single agent looping on its own. Verification stops being an afterthought and becomes a first-class part of the control structure.
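A toy version of the three-role structure, assuming nothing about Anthropic's actual harness: `plan`, `generate`, and `evaluate` are hypothetical stubs, and the task (sort a list) is deliberately trivial so the evaluator's verdict is checked by code rather than taken on the generator's word.

```python
import random


def plan(goal: list[int]) -> str:
    """Planner: turns the goal into an instruction for the generator."""
    return f"sort these {len(goal)} numbers ascending"


def generate(goal: list[int], instruction: str, rng: random.Random) -> list[int]:
    """Generator: an unreliable worker (ignores the instruction in this stub)."""
    candidate = sorted(goal)
    if rng.random() < 0.5:
        rng.shuffle(candidate)  # simulate a plausible-looking but wrong output
    return candidate


def evaluate(goal: list[int], candidate: list[int]) -> bool:
    """Evaluator: checks the output itself instead of trusting the generator."""
    same_items = sorted(candidate) == sorted(goal)
    ordered = all(a <= b for a, b in zip(candidate, candidate[1:]))
    return same_items and ordered


def pge_loop(goal: list[int], max_rounds: int = 20, seed: int = 0) -> tuple[list[int], int]:
    rng = random.Random(seed)
    instruction = plan(goal)
    for round_no in range(1, max_rounds + 1):
        candidate = generate(goal, instruction, rng)
        if evaluate(goal, candidate):
            return candidate, round_no
    raise RuntimeError("evaluator never accepted a candidate")
```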
Why it works now
If the loop is so old, why did it suddenly work well enough to inspire a movement in 2026? Three reasons.
Models got good enough to brute-force. Simon Willison’s framing in Designing agentic loops is the key insight: “One way to think about coding agents is that they are brute force tools for finding solutions to coding problems.” When each iteration has a decent chance of making progress and the loop can run unattended, brute force becomes a viable strategy. The loop tries, observes the failure, and tries again — cheaply, overnight, while you sleep.
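A back-of-the-envelope view of why brute force pays, under the strong simplifying assumption that each iteration succeeds independently with the same probability p:

```latex
P(\text{at least one success in } n \text{ iterations}) = 1 - (1 - p)^{n}

p = 0.2,\; n = 10:\quad 1 - 0.8^{10} \approx 1 - 0.107 = 0.893
```

The same assumption gives an expected 1/p = 5 iterations before the first success. Real attempts can fail for shared reasons (a missing dependency, a misleading test), so treat these figures as optimistic.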
Fresh context beats accumulated context. The most reductive loop in the wild is Geoffrey Huntley’s “Ralph,” which is, almost unbelievably, a bash one-liner:
```bash
while :; do cat PROMPT.md | claude-code ; done
```
Huntley’s post “everything is a ralph loop” explains the trick: each iteration spins up a fresh context window and runs exactly one task, deliberately throwing away the accumulated state that would otherwise rot. Counterintuitively, forgetting on purpose makes the loop more reliable.
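A Python rendering of the Ralph idea, with labeled assumptions: the `TODO.md` checklist and the `run_agent` stand-in for piping a prompt into a coding-agent CLI are hypothetical. The point is that nothing is carried between iterations in memory; every prompt is rebuilt from files on disk and names exactly one task.

```python
from pathlib import Path
from typing import Optional


def run_agent(prompt: str) -> None:
    """Hypothetical stand-in for one fresh agent invocation (e.g. a CLI subprocess)."""
    pass


def next_open_task(todo: Path) -> Optional[str]:
    for line in todo.read_text().splitlines():
        if line.startswith("- [ ] "):
            return line[len("- [ ] "):]
    return None


def mark_done(todo: Path, task: str) -> None:
    text = todo.read_text()
    todo.write_text(text.replace(f"- [ ] {task}", f"- [x] {task}", 1))


def ralph(prompt_file: Path, todo: Path, max_iterations: int = 50) -> int:
    iterations = 0
    while iterations < max_iterations:
        task = next_open_task(todo)
        if task is None:
            break
        # Build the context from scratch every time: base prompt plus exactly one task.
        prompt = prompt_file.read_text() + f"\n\nYour only task this run: {task}\n"
        run_agent(prompt)
        mark_done(todo, task)  # a real loop would do this only after verification
        iterations += 1
    return iterations
```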
Stacking loops compounds leverage. Latent Space’s Loopcraft frames the emerging craft as knowing when to descend a loop (for reliability) versus ascending to higher-level loops (for leverage) as models improve — captured in the slogan “UP for leverage, DOWN for reliability.” A loop that supervises loops is how one engineer starts to operate like a team, and it’s why agent orchestration tooling has become its own category.
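A minimal sketch of a loop that supervises loops, in our own reading of the slogan: each inner loop retries one subtask a few times (the reliability side), and the outer loop fans out across many subtasks and escalates whatever the inner loops could not finish (the leverage side). `attempt` is a hypothetical stub whose success depends on a made-up difficulty score.

```python
def attempt(difficulty: int, try_no: int) -> bool:
    """Hypothetical single agent attempt: harder subtasks need more tries."""
    return try_no >= difficulty


def inner_loop(difficulty: int, max_tries: int = 3) -> bool:
    # DOWN for reliability: a tight loop focused on one subtask.
    return any(attempt(difficulty, n) for n in range(1, max_tries + 1))


def outer_loop(subtasks: dict[str, int]) -> dict[str, list[str]]:
    # UP for leverage: one supervisor drives many inner loops.
    report: dict[str, list[str]] = {"done": [], "escalate": []}
    for name, difficulty in subtasks.items():
        bucket = "done" if inner_loop(difficulty) else "escalate"
        report[bucket].append(name)
    return report


print(outer_loop({"rename-var": 1, "add-test": 2, "redesign-api": 5}))
# {'done': ['rename-var', 'add-test'], 'escalate': ['redesign-api']}
```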
Where it breaks
This is the section the hype cycle skips, and it’s the one that matters most. Loops don’t just amplify your throughput — they amplify everything, including your mistakes.
Context rots. Dex Horthy has described a “dumb zone”: push past roughly 40% of the context window and an agent’s signal-to-noise degrades, broken assumptions contaminate later reasoning, and recovery gets hard. It’s the technical reason Ralph throws context away every iteration.
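A sketch of a context-budget guard, with two labeled assumptions: tokens are approximated as characters divided by four (a rough heuristic, not a real tokenizer), and the window size is a placeholder. The 40% figure is Horthy's rule of thumb turned into a constant. Once the conversation crosses it, the guard keeps only the system prompt and a short summary.

```python
CONTEXT_WINDOW_TOKENS = 200_000  # hypothetical model limit
DUMB_ZONE_FRACTION = 0.40        # Horthy's rough threshold, as a constant


def approx_tokens(text: str) -> int:
    # Crude heuristic: about 4 characters per token. Use a real tokenizer in practice.
    return len(text) // 4


def context_fraction(messages: list[str]) -> float:
    return sum(approx_tokens(m) for m in messages) / CONTEXT_WINDOW_TOKENS


def maybe_reset(messages: list[str], summary: str) -> list[str]:
    """Reset to [system prompt, summary] once estimated usage crosses the threshold."""
    if context_fraction(messages) > DUMB_ZONE_FRACTION:
        return [messages[0], f"Summary of prior work: {summary}"]
    return messages
```

Ralph takes the extreme position on this trade-off: it resets on every iteration, regardless of usage.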
Cost runs away. This is the failure mode people are actually getting burned by. Firecrawl’s critique puts it sharply: an unguarded loop “isn’t a loop. It’s an open invoice.” Their reporting notes that Uber capped its engineers at $1,500 per person, per tool, per month after burning through its annual AI budget in four months, with developers trading screenshots of loops that “quietly chewed through hundreds of dollars overnight.” A loop with no budget ceiling is a financial liability, not a productivity tool — which is why usage observability and budget controls have become non-optional.
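A minimal budget guard, sketched under stated assumptions: the per-token prices are placeholders rather than any vendor's real rates, and `BudgetExceeded` is a hypothetical name. The design choice that matters is that the check runs before every model call, so the loop halts at the ceiling instead of the bill being discovered the next morning.

```python
class BudgetExceeded(RuntimeError):
    pass


class BudgetGuard:
    # Placeholder prices in USD per million tokens; substitute your provider's real rates.
    INPUT_PRICE_PER_M = 3.00
    OUTPUT_PRICE_PER_M = 15.00

    def __init__(self, ceiling_usd: float):
        self.ceiling_usd = ceiling_usd
        self.spent_usd = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        self.spent_usd += (input_tokens * self.INPUT_PRICE_PER_M
                           + output_tokens * self.OUTPUT_PRICE_PER_M) / 1_000_000

    def check(self) -> None:
        # Call before every model request. Spend can overshoot by at most one call's cost.
        if self.spent_usd >= self.ceiling_usd:
            raise BudgetExceeded(f"spent ${self.spent_usd:.2f} of ${self.ceiling_usd:.2f}")


guard = BudgetGuard(ceiling_usd=5.00)
calls = 0
try:
    while True:
        guard.check()
        guard.record(input_tokens=50_000, output_tokens=5_000)  # simulated usage per call
        calls += 1
except BudgetExceeded as err:
    print(f"stopped after {calls} calls: {err}")
```

At the placeholder prices each simulated call costs $0.225, so this loop stops after 23 calls instead of running until someone notices.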
Autonomy is a security surface. Simon Willison has been the loudest responsible voice here, and the Hacker News thread on his agentic-loops piece is blunt: “Prompt injected agents WILL be able to escape containers 100%. VMs are the way.” An agent that runs unattended, reads untrusted input, and has tool access is a prompt-injection and data-exfiltration risk that scales with its autonomy — one more reason to take the security risks of autonomous agents seriously before you let a loop off the leash.
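One small piece of least-privilege design, sketched with hypothetical tool names: the loop dispatches only tools on an explicit allowlist, and when a `tainted` flag is set (the caller sets it once untrusted input such as a web page or issue body has entered the context), tools that can send data off the machine are refused. This narrows the exfiltration path; it is not a substitute for running the whole loop in a disposable VM.

```python
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": lambda arg: f"contents of {arg}",
    "run_tests": lambda arg: "3 passed",
    "http_post": lambda arg: f"posted {arg}",  # can exfiltrate data
}
ALLOWLIST = {"read_file", "run_tests", "http_post"}
EGRESS_TOOLS = {"http_post"}  # tools that send data off the machine


class ToolDenied(PermissionError):
    pass


def call_tool(name: str, arg: str, tainted: bool) -> str:
    if name not in ALLOWLIST:
        raise ToolDenied(f"{name} is not on the allowlist")
    if tainted and name in EGRESS_TOOLS:
        # Untrusted text may carry injected instructions; block outbound channels.
        raise ToolDenied(f"{name} blocked: context contains untrusted input")
    return TOOLS[name](arg)
```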
Comprehension debt accrues. Osmani’s own term for the subtlest cost: the faster a loop ships code you didn’t write, the wider the gap between what’s in your repo and what you actually understand. As Firecrawl puts it, “the loop lets you be wrong at machine speed.”
And then there’s the sharpest skeptic line, the one that went around Reddit and into Firecrawl’s piece: “it’s a cron job wearing a hat.” It deserves a real answer. The rebuttal is that a true loop is state-conditioned — it observes a result, judges it against a rubric, and decides whether to continue or stop — whereas cron blindly runs a fixed path on a timer. That distinction is real. But it’s also exactly the part most “loops” skip, which is why the jab lands: a scheduler with no evaluation step really is just cron in a costume.
The honest verdict
So is loop engineering a discipline or a buzzword? Both, depending on which claim you’re evaluating.
The practice is real and accelerating. Owning your control flow, separating generation from verification, resetting context, capping budgets, running agents on a schedule — these are concrete, battle-tested techniques, and the best teams were doing them before the term existed. Tooling is already crystallizing around the name, down to CLIs that audit loop cost.
The term’s durability is unproven. The substance predates the label by a year (Braintrust was writing “a while loop with tools” in August 2025), and a chunk of June 2026’s volume came from SEO content farms racing to rank for a hot keyword. “Loop engineering” may well settle in as the accepted name for this layer — or it may fade as the next framing arrives.
And the frontier is still brittle. This is the uncomfortable part: agents remain least reliable on exactly the long-horizon, economically valuable tasks that loops are meant to conquer. As we covered in why AI agents still fail the majority of real jobs, even loop advocates concede that on the hardest tier of real work, success rates can still round to zero. Loop engineering is a genuine answer to “how do I get more out of a capable agent?” It is not yet an answer to “how do I make an unreliable agent reliable?” — and pretending otherwise is how you end up with an open invoice and a pile of comprehension debt.
Building a loop without getting burned
If you want to actually do this, the sources converge on a short, opinionated checklist:
- Cap the budget first. Set a hard ceiling per loop before you set a goal. An autonomous loop without a spend limit is the single most expensive mistake in this space.
- Sandbox it. Run loops in a disposable VM or container with least-privilege tool access, especially if they touch untrusted input. Assume prompt injection will happen.
- Separate doing from checking. Use a distinct verifier — a sub-agent, a test suite, a rubric — because, in Firecrawl’s words, “‘Done’ is a claim, not a proof.”
- Keep context fresh. One task per iteration, reset aggressively, persist real state to disk rather than letting it pile up in the window. Forgetting is a feature.
- Keep a human in the loop, by design. 12-Factor Agents’ Factor 7 — “Contact humans with tool calls” — exists precisely so the loop can escalate when it’s unsure instead of confidently doing the wrong thing 40 times.
- Read the diffs. The fastest way to pay down comprehension debt is to refuse to accrue it.
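Factor 7 treats contacting a human as just another tool call. A minimal sketch of that idea, with hypothetical names throughout (`request_human_input`, `RISKY_TOOLS`, the `"approve"` answer): the model can ask for help directly, the loop also forces a pause before risky actions, and in both cases the loop stops and waits instead of guessing.

```python
from dataclasses import dataclass, field
from typing import Optional

RISKY_TOOLS = {"deploy", "delete_branch"}  # hypothetical: always need a human's OK


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


def handle(call: ToolCall, human_answer: Optional[str] = None) -> str:
    # The model can ask for help directly, just as it would call any other tool.
    if call.name == "request_human_input":
        if human_answer is None:
            # Persist the question and exit; an external channel resumes the loop later.
            return f"paused: {call.args['question']}"
        return f"human said: {human_answer}"  # returned to the model as the tool result
    # The loop also forces escalation for risky actions, whatever the model proposes.
    if call.name in RISKY_TOOLS and human_answer != "approve":
        return f"paused: approve {call.name} {call.args}?"
    return f"executed {call.name}"
```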
The deeper point underneath all of it is the one Osmani closes on, and it’s the line worth keeping: “The loop changes the work, it does not delete you from it.” The engineer who stops prompting and starts designing loops hasn’t automated themselves out of a job. They’ve taken a promotion — from typing the instructions to architecting the system that does. Whether that system ships value or quietly burns money overnight comes down to the same thing it always has: the quality of the engineering you put into the loop.
Sources & further reading: Anthropic — Building Effective Agents · 12-Factor Agents · Thorsten Ball — How to Build an Agent · Geoffrey Huntley — everything is a ralph loop · Simon Willison — Designing agentic loops · Braintrust — A while loop with tools · Addy Osmani — Loop Engineering · The New Stack — Loop Engineering · Latent Space — Loopcraft · Firecrawl — Loop Engineering: Should You Stop Prompting Agents?