Designing Loops Is Your Next Leverage

A warm-toned, semi-transparent person slumped in front of a screen glowing cool blue, the screen full of code and line after line of errors, one hand propping up his forehead as if endlessly feeding the mistakes back in — babysitting the machine

We think we’re “using” AI, but most of the time we’re just feeding its mistakes back to it, over and over, babysitting it. The thing that really deserves redesigning isn’t your prompt — it’s the whole business of you sitting there watching it fix things.

Twenty Minutes, and All I Did Was Copy-Paste

I write a prompt, watch the AI spit out a result, see it’s wrong, copy the error back, watch it try again, still wrong, paste it back again… By the time I snap out of it, twenty minutes are gone.

The irony is, I build AI agent products for a living and deal with models all day. And look at what I just spent twenty minutes doing — I wasn’t using AI, I was babysitting it. That pile of repetitive grunt work I meant to hand off, I just did myself, start to finish, only now with a few rounds of copy-paste in the middle.

If you’ve ever had that moment, congratulations: you’re standing right at the doorstep of this year’s hottest term — Loop Engineering.

How This Term Blew Up Overnight

On June 8 this year, a developer named Peter Steinberger (the one who built OpenClaw and has since gone to OpenAI) posted on X something that boiled down to: stop prompting your coding agent; go design the loop that prompts the agent for you. Two sentences, no image, no link, and it pulled in over six million views and set the whole AI coding world arguing for a week.

Right after, Boris Cherny, who runs Claude Code over at Anthropic, said almost the same thing: he doesn’t prompt Claude himself anymore; he keeps a pile of loops running that prompt Claude for him. “My job is writing loops.”

Two heavyweights, saying the same thing at almost the same moment. So even if you never write a line of code, this is worth understanding, because it’s changing the very shape of what “humans and AI working together” looks like.

So What Is Loop Engineering, Exactly

In one sentence:

Loop Engineering is designing a loop that lets the AI correct itself. You pin down just three things: what to do, what standard decides whether it got it right, and when to stop. Then you hand the whole do → self-check → revise cycle to the AI and let it run on its own until the work passes.
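To make the three things concrete, here is a minimal Python sketch. It assumes the agent and the judge are plain functions you supply; `run_loop`, `do`, `judge`, and `max_rounds` are hypothetical names chosen for illustration, not the API of Claude Code or any other tool. The toy agent and judge at the bottom stand in for a real model call and a real test run.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class LoopResult:
    output: Optional[str]
    passed: bool
    rounds: int
    feedback_log: list[str] = field(default_factory=list)

def run_loop(
    do: Callable[[Optional[str]], str],        # what to do: make an attempt, given last round's feedback
    judge: Callable[[str], tuple[bool, str]],  # the standard: returns (passed, feedback)
    max_rounds: int = 5,                       # when to stop, even if it never passes
) -> LoopResult:
    feedback: Optional[str] = None
    log: list[str] = []
    output: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        output = do(feedback)             # do
        passed, feedback = judge(output)  # self-check
        log.append(feedback)
        if passed:
            return LoopResult(output, True, round_no, log)
        # revise: the next call to do() receives this round's feedback
    return LoopResult(output, False, max_rounds, log)

# Toy stand-ins for a model and a test run: the "agent" appends one
# character per round, and the judge passes once the draft is 3 long.
state = {"draft": ""}

def toy_do(feedback: Optional[str]) -> str:
    state["draft"] += "x"
    return state["draft"]

def toy_judge(out: str) -> tuple[bool, str]:
    ok = len(out) >= 3
    return ok, "ok" if ok else f"too short: length {len(out)}"

result = run_loop(toy_do, toy_judge, max_rounds=5)
print(result.passed, result.rounds)  # → True 3
```

The design choice worth noticing: `max_rounds` is not optional. A loop that only stops on success has no answer to “when to stop” if the judge never says yes.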

Here’s a comparison I love for explaining it:

  • Prompt engineering optimizes “how do I phrase one sentence to the AI well.”
  • Loop engineering optimizes “what shape of loop do I set the AI running in, over and over.”

Here’s a rough but useful analogy. A prompt is like briefing a subordinate on a task: the clearer the better, but you still have to watch his every step. A loop is like setting him a rule — “keep at it until all the tests pass, and don’t come back to me until they do” — and then you can go get a coffee. With the first, you’re still in the room. With the second, you’ve gone home.

The seed of this loop isn’t actually new. ReAct, back in 2022 (have the model think a step, act a step, look at the result, think again), already had this shape. Last July someone cooked up a thing called Ralph, essentially a one-line bash loop, and used it to crank out a whole programming language for about 297 dollars. So the loop itself is old. What’s new is that we’ve dropped a thinking large model into it and made that model the one calling the shots.
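For the ReAct shape described above (think a step, act a step, look at the result), a stripped-down sketch might look like this. The `policy` function stands in for the model and is scripted here so the example runs offline; the `add` tool and all names are hypothetical, not taken from the ReAct paper or any framework.

```python
from typing import Callable, Optional

# Hypothetical tool registry: the actions the model is allowed to take.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}

# A policy reads the history and returns (thought, action, argument).
Policy = Callable[[list[str]], tuple[str, str, str]]

def react_loop(policy: Policy, question: str, max_steps: int = 5) -> Optional[str]:
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = policy(history)         # think a step
        history.append(f"Thought: {thought}")
        if action == "finish":
            return arg                                 # the model decides it is done
        observation = TOOLS[action](arg)               # act a step
        history.append(f"Observation: {observation}")  # look at the result, then loop
    return None  # step limit reached without an answer

def scripted_policy(history: list[str]) -> tuple[str, str, str]:
    # Stand-in for a model: call the tool once, then finish with its result.
    last = history[-1]
    if last.startswith("Observation: "):
        return ("I have the sum.", "finish", last.removeprefix("Observation: "))
    return ("I should add the numbers.", "add", "2+3")

print(react_loop(scripted_policy, "What is 2+3?"))  # → 5
```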

On one side, a person standing behind a subordinate watching every word he writes; on the other, a person kicked back with a coffee while a loop — a sticky note reading "until all tests pass" attached to it — turns on its own

What It’s Good For: The Leverage Point Just Moved Again

For the last few years, your leverage was “being good at writing prompts.” Before that, it was “being able to write code.”

Now the leverage point has moved one notch further: to “being good at designing loops.” Once you’ve defined a job as a self-correcting loop, in theory you can oversee several of these loops, even a whole batch, working for you at once, instead of nursing one at a time.

But I have to pump the brakes here, because too many people are hyping this.

It’s not a silver bullet. A few sobering numbers:

  • One production survey covering 306 frontline practitioners found that 68% of production agents ran fewer than 10 steps before a human stepped in. The systems that actually work are mostly “small and supervised,” not the sci-fi swarms of hundreds or thousands running autonomously.
  • A company the size of Uber reportedly burned through its entire annual AI budget in four months, and ended up having to cap each person at 1,500 dollars per tool per month. A loop with no brakes can burn money fast enough to give you a heart attack.
  • Even Addy Osmani, one of the people who first put this concept on the map, is cautious about it. His words, roughly: “I’m skeptical; you absolutely have to be very careful about token cost.”

So “one person commanding 100 AIs” — the direction may be right, but today it’s a story about the ceiling, not the present. Let’s not get swept up in the hype.
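The survey figure and the budget story both point at the same habit: give every loop a step limit and a spending limit, and hand control back to a human when either one trips. A minimal sketch, assuming each round can report its own cost; `Budget`, `run_with_brakes`, and the dollar figures are illustrative, not any vendor's billing API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Budget:
    max_steps: int = 10        # hand back to a human after this many rounds
    max_cost_usd: float = 5.0  # hard spending cap for one run of the loop

def run_with_brakes(step: Callable[[], tuple[bool, float]], budget: Budget) -> tuple[str, int, float]:
    """`step()` runs one round and reports (done, cost_usd for that round)."""
    spent = 0.0
    for n in range(1, budget.max_steps + 1):
        done, cost = step()
        spent += cost
        if done:
            return ("done", n, spent)
        if spent >= budget.max_cost_usd:
            return ("stopped: budget", n, spent)  # the brake, before the bill gets scary
    return ("stopped: needs human", budget.max_steps, spent)

def never_finishes() -> tuple[bool, float]:
    return (False, 0.75)  # each round costs $0.75 and the task never passes

print(run_with_brakes(never_finishes, Budget(max_steps=10, max_cost_usd=2.0)))
# → ('stopped: budget', 3, 2.25)
```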

How to Use It: Two Judgment Calls, Plus a Little Trick

Okay, down to the practical part. You’ve got a job you want to hand to an AI loop — before you rush in, ask yourself two things. And at the end I’ll throw in a little trick I use every day.

Call One: Is This Job Even Suited to a Loop

I’ve boiled it down to four rough screening questions, in plain words:

  1. Can a machine judge right from wrong? Is there an objective standard that can say “this version works” or “this one doesn’t”? Code has one (just run the tests), but for “is this copy compelling enough” or “is this strategy right,” a machine can’t judge.
  2. Is judging it cheap, and fast? This “judge right from wrong” step has to be far cheaper than redoing the work yourself, and runnable every single round. A unit test takes milliseconds, dirt cheap. But if every round needs an expert to review it by hand, the loop seizes up.
  3. Can a mistake be undone? If a bad result comes out mid-way, can you roll it back cheaply? Write wrong code on a branch, no harm done. But mass-sending emails, wiring money, dropping a production database: that’s spilled water you can’t gather back.
  4. Can it be broken into small pieces? Can the job be sliced into small steps, each small enough for the AI to think through clearly?

Four “yes”es, and the job is a loop’s paradise. But don’t just loosely say “writing code.” To be precise, it’s the part of programming that has tests to back it up: fix a bug, finish a refactor, run the tests, and you know on the spot whether it worked. Same fingers, same keyboard, but “how should this architecture be designed” gets no such luck; there’s no judge to score it then and there. Flip it around: the moment even one of the four comes up “no,” watch out.
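The four questions work as a literal checklist. Here is a minimal Python sketch with hypothetical names; the yes/no answers still come from your own judgment, and the answers given for the architecture example are illustrative, not a rule.

```python
from dataclasses import dataclass

# Label shown for each question that comes up "no".
MISS_LABELS = {
    "machine_judgeable": "no machine judge",
    "cheap_to_judge": "judging is costly or slow",
    "reversible": "mistakes are irreversible",
    "decomposable": "can't be split into small steps",
}

@dataclass
class LoopFitness:
    machine_judgeable: bool  # 1. Is there an objective "works / doesn't" standard?
    cheap_to_judge: bool     # 2. Is that check far cheaper than redoing the work, every round?
    reversible: bool         # 3. Can a bad intermediate result be rolled back cheaply?
    decomposable: bool       # 4. Can the job be sliced into small, clear steps?

    def failing(self) -> list[str]:
        return [label for name, label in MISS_LABELS.items() if not getattr(self, name)]

    def verdict(self) -> str:
        misses = self.failing()
        return "good fit for a loop" if not misses else "watch out: " + ", ".join(misses)

fix_bug_with_tests = LoopFitness(True, True, True, True)
design_architecture = LoopFitness(machine_judgeable=False, cheap_to_judge=False,
                                  reversible=True, decomposable=True)
print(fix_bug_with_tests.verdict())   # → good fit for a loop
print(design_architecture.verdict())  # → watch out: no machine judge, judging is costly or slow
```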

A warm-toned, semi-transparent person, head bowed, examining a cool-blue glowing loop cradled in his hands, as if weighing whether this job can be handed off to run on its own

Call Two: Hold the Line — Don’t Fully Automate Work That Has No “Judge”

This is the one I want to stress, because it’s the easiest to misread. A lot of people think a loop is some universal accelerator. It actually has a hard boundary:

A loop is only as good as its “judge.” A loop without a reliable judge is just a machine for producing garbage faster, and more confidently.

Here’s an example of my own. On the side, I’m building an open-source galgame (visual-novel) content generator. Looping the AI to write code and hunt bugs goes smoothly, because “does it run” is a clear judge. But when I loop it to write a stretch of “good story,” it stalls, because “is it any good” has no cheap, reliable machine judge at all. The AI can copy the recipe, but it can’t cook up the “sauce.”

This isn’t the AI’s fault; the task is just born missing a judge that can score it. Anything that lands in the “can’t be auto-verified” box — pure creativity, setting direction, calling strategy — you can use AI to generate a draft, but never hand it off on full autopilot.
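On the code side, “does it run” really is cheap to check. Here is a minimal sketch of such a judge for Python snippets, assuming the candidate is a standalone script: it runs the code in a fresh interpreter, passes if the exit code is 0, and returns the tail of the error output as feedback for the next round. `runs_cleanly` is a hypothetical name, and a real setup should run model-written code in a sandbox, which this sketch skips.

```python
import subprocess
import sys

def runs_cleanly(code: str, timeout_s: float = 10.0) -> tuple[bool, str]:
    """A cheap machine judge: does this Python snippet execute without error?"""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"
    if proc.returncode == 0:
        return True, "ran cleanly"
    # Hand the tail of the traceback back as the revision hint.
    return False, proc.stderr.strip()[-500:]

print(runs_cleanly("print(1 + 1)"))  # → (True, 'ran cleanly')
passed, hint = runs_cleanly("print(1 / 0)")
print(passed)                        # → False
```

There is no comparable ten-line function for “is this story any good,” which is the whole point of this section.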

A cool-blue loop spinning at high speed in empty air, flinging out loose, dim fragments — with no one standing by to check it

Finally: A Little Trick I Use Every Day

One last, genuinely useful one. When you want the AI to do a job for you but can’t, in the moment, articulate “what exactly do I want” —

Don’t let it turn around and pepper you with questions. Have it generate a draft directly first, and mark every spot where it’s unsure or making a call on your behalf as an “assumption.” Then you just fix the few it got wrong.

Why is this better? Because picking holes in a concrete draft is a hundred times easier than thinking it all through from scratch against a blank page. The human brain is good at “recognizing,” bad at “conjuring from nothing.” A good tool should lean into that — taking the burden of “figuring it out” off your shoulders, instead of handing you a questionnaire.
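A minimal sketch of the trick in Python, assuming the model follows a simple line marker: the prompt asks for a full draft with no questions, and a small parser pulls out the flagged lines so you review only those. The `ASSUMPTION:` marker, the prompt wording, and the sample reply are all hypothetical; any convention the model reliably follows would do.

```python
import re

DRAFT_INSTRUCTIONS = (
    "Do not ask me clarifying questions. Write a complete first draft now.\n"
    "Wherever you are unsure, or are making a decision on my behalf, keep going\n"
    "and mark it on its own line as: ASSUMPTION: <what you assumed>\n"
)

def build_prompt(task: str) -> str:
    return f"{DRAFT_INSTRUCTIONS}\nTask: {task}"

# One assumption per line; [ \t]* (not \s*) so a match never spills onto the next line.
ASSUMPTION_RE = re.compile(r"^[ \t]*ASSUMPTION:[ \t]*(.+)$", re.MULTILINE)

def extract_assumptions(draft: str) -> list[str]:
    """Pull out the flagged lines so you only have to check those."""
    return [m.strip() for m in ASSUMPTION_RE.findall(draft)]

# A made-up model reply, for illustration only.
sample_reply = """Subject: Q3 planning sync
ASSUMPTION: the sync is 30 minutes
Hi all, let's meet to lock the Q3 plan.
  ASSUMPTION: Friday afternoon works for everyone
"""
print(extract_assumptions(sample_reply))
# → ['the sync is 30 minutes', 'Friday afternoon works for everyone']
```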

Wrapping Up

So my read on Loop Engineering is: the bar is high, but the payoff is big.

The bar is high because “building a judge for the task that’s cheap, reliable, and carries its own stopping criterion” is often genuinely hard. The payoff is big for jobs that are “easy to judge, but used to eat your attention alive,” such as mountains of code review, bulk data cleaning, and compliance checks: once they’re looped, they can lift you out of the grind entirely.

So why do some loops confidently burn money and churn out garbage, and why does the standard you set always get gamed? Put bluntly, it’s because Loop Engineering is more like reinforcement learning in a new set of clothes: the “standard” you set is the reward you hand the model, and it’ll fight tooth and nail to score high — even by underhanded means. I’ll take that mechanism apart in the next piece — it goes deeper, but it’s also more hardcore, and you’ll want a bit of algorithm and model background.
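A toy illustration of how a standard gets gamed, assuming the judge only sees a test summary: if “pass” means “nothing failed,” deleting the failing tests scores just as well as fixing the code. `SuiteReport` and both judges are hypothetical names, and the guard shown closes only this one hole; a model pushing for a high score can still, say, weaken assertions.

```python
from dataclasses import dataclass

@dataclass
class SuiteReport:
    total: int   # how many tests ran
    failed: int  # how many of them failed

def naive_judge(report: SuiteReport) -> bool:
    # "Pass" means nothing failed, which deleting the failing tests also achieves.
    return report.failed == 0

def guarded_judge(report: SuiteReport, baseline_total: int) -> bool:
    # Also require that no tests disappeared compared with the starting suite.
    return report.failed == 0 and report.total >= baseline_total

baseline = SuiteReport(total=40, failed=3)
honest_fix = SuiteReport(total=40, failed=0)
gamed = SuiteReport(total=37, failed=0)  # the 3 failing tests were simply deleted

print(naive_judge(gamed), guarded_judge(gamed, baseline.total))            # → True False
print(naive_judge(honest_fix), guarded_judge(honest_fix, baseline.total))  # → True True
```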

So if you remember one line from this piece: does the job in your hands have a cheap, reliable judge? If it does, the loop is your leverage. If it doesn’t, the prettiest loop is just producing garbage faster. Stop grinding away at how to phrase one perfect prompt, and start taking stock of the jobs in front of you — which ones can be handed to a loop that corrects itself.

One last bit of self-promotion: I’ve taken this whole judgment — what can be looped, how to build it a reliable judge — and crystallized it into an open-source skill you load into Claude / Codex, one that makes the AI an honest advisor willing to say “you can’t build a judge for this, so don’t fully automate it.” Anyone curious can grab it (and a star would be appreciated): qingqingpi/loop-engineering-skill. I’ll walk through it in detail in the next piece.


I’m Sun Xin, an AI product manager working on agents. This series records some honest thinking from my fumbling through the AI-native era. Come find me at sunxin.xin.