Five Billion Tokens, Thirty Agents, One Engineer: What OpenClaw Reveals About Real Constraints in Agent Infrastructure

Based on conversations with Vincent Koc, core OpenClaw maintainer, at the AWS OpenClaw Hack on March 25, 2026, and recent public statements by key core maintainers


The Anomaly

One engineer. Thirty agents. Five billion tokens consumed in a single day.

Not a benchmark. Not a controlled experiment. Not a research lab flex. A volunteer maintainer of an open source project, sitting at a laptop, orchestrating thirty concurrent agents through a codebase refactor that touched 80% of 1.5 million lines of code. Some agents shipped fixes. Others ran tests. Others cleaned up the mess left behind by the first wave. A factory line, as the engineer described it. Not a metaphor. An operational reality.

The instinct is to focus on scale. Five billion tokens is a staggering number. Thirty agents running simultaneously sounds like a coordination nightmare that should collapse under its own complexity. The natural question is: how did the system not fall apart?

But that question misses the point. The system did not fall apart because it was never designed around capability. It was designed around coherence. And the distance between those two design philosophies is where the real story lives.

Vincent Koc & colleagues


The Architecture That Made It Possible

Vincent Koc, lead AI research engineer at Comet.com & one of a small group of volunteer core maintainers on OpenClaw, had a day. 🦄 Peter Steinberger, who originally architected the project less than six months ago, now quietly works at OpenAI while stewarding the system he built.

The relationship matters because it frames everything that follows: this is not a funded team with a roadmap. This is a handful of engineers shipping infrastructure at a pace that would embarrass most funded teams, driven by necessity and the feedback loops of a growing open source community.

The release that preceded the hackathon took seven days to ship. In that window, the team rewrote roughly a million lines of code and restructured the entire codebase into a plugin architecture. The previous system, as Koc described it to the hackathon audience, was a “vibe-coded spaghetti mess.” Plugins injected code wherever they wanted. There was no single entry point. No isolation. No contracts governing how external code interacted with the core.

That architecture worked when one person ran one agent. It stopped working the moment anyone tried to scale.

The refactor was not a feature addition. It was a containment strategy. Every plugin now enters the codebase through a single path, a single interface. Every provider must conform to a contract that specifies exactly how it interacts with the system. The team verified those contracts by testing the top five or six plugins against the new architecture, maintaining a compatibility layer for older behaviors while signaling eventual deprecation.
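To make the shape of that containment concrete, here is a minimal Python sketch of a single registration path with a compatibility layer for older plugins. It is an illustration under stated assumptions, not OpenClaw's actual code: the names `Provider`, `LegacyAdapter`, `PluginRegistry`, and `EchoPlugin` are hypothetical, and the real contract may look quite different.

```python
import warnings
from typing import Callable, Dict, Optional, Protocol, runtime_checkable


@runtime_checkable
class Provider(Protocol):
    """Hypothetical contract: every plugin exposes a name and a handler."""

    name: str

    def handle(self, request: str) -> str: ...


class LegacyAdapter:
    """Compatibility layer: wraps an old-style bare function as a Provider."""

    def __init__(self, name: str, fn: Callable[[str], str]) -> None:
        self.name = name
        self._fn = fn

    def handle(self, request: str) -> str:
        return self._fn(request)


class PluginRegistry:
    """The single path through which external code enters the core."""

    def __init__(self) -> None:
        self._providers: Dict[str, Provider] = {}

    def register(self, plugin: object, name: Optional[str] = None) -> None:
        if isinstance(plugin, Provider):
            provider = plugin
        elif callable(plugin) and name is not None:
            # Older behavior still works, but signals eventual deprecation.
            warnings.warn(
                f"{name}: bare-function plugins are deprecated",
                DeprecationWarning,
            )
            provider = LegacyAdapter(name, plugin)
        else:
            raise TypeError("plugin does not satisfy the Provider contract")
        if provider.name in self._providers:
            raise ValueError(f"duplicate plugin name: {provider.name}")
        self._providers[provider.name] = provider

    def dispatch(self, name: str, request: str) -> str:
        return self._providers[name].handle(request)


class EchoPlugin:
    """Example plugin that conforms to the contract."""

    name = "echo"

    def handle(self, request: str) -> str:
        return request.upper()


registry = PluginRegistry()
registry.register(EchoPlugin())
```

The point of the sketch is that nothing reaches `dispatch` without passing through `register`, so the core always knows what it is hosting.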

The language Koc used on stage matters here. He did not describe the refactor as “cleaning up tech debt” or “improving developer experience.” He described it as creating contracts. The distinction is architectural. Contracts are not about making code prettier. Contracts establish boundaries that the system itself can reason about. When an agent operates within a plugin architecture governed by explicit contracts, the agent knows what it can and cannot touch. Its scope is visible not just to the engineer but to the system.

This is the infrastructure insight that most teams building multi-agent systems have not internalized: boundaries are not constraints on agent capability. Boundaries are the mechanism through which agents achieve coordination. Remove them, and agents become mob behavior. A crowd of capable processes with no way to avoid stepping on each other.

The refactor also introduced something subtler. By structuring the codebase into plugins with explicit entry points, the team created a surface area that agents could inspect. An agent operating inside OpenClaw can now see where it is, what it has access to, what it does not have access to. This transparency is not a debugging tool. It is an operating condition. Agents that can read their own constraints behave differently from agents that cannot.

Koc’s experience during the five-billion-token day illustrates this directly. The thirty agents were not running against a monolithic codebase hoping for the best. Each agent had a configured area of responsibility. Some focused on proxy-related issues. Some updated documentation. Some ran as what Koc called “dumb” agents, performing a single repetitive task with high reliability. The diversity of agent roles was not a sign of sophisticated orchestration. It was a sign of sophisticated scoping. Each agent knew its boundary. The engineer’s job was not to coordinate them. It was to assign them to boundaries and let the infrastructure handle isolation.

When asked how he avoided conflicts across 40 to 50 concurrent branches, Koc’s answer was disarmingly simple: “They just figure it out. I don’t give a shit.” This is not recklessness. It is confidence in the containment architecture. When agents operate within well-defined boundaries on distinct areas of the codebase, the conflict surface shrinks to a manageable size. The few conflicts that do emerge are trivial compared to the throughput gained.

The factory metaphor kept recurring throughout the conversation. Not because Koc was reaching for a compelling image but because the operational pattern genuinely resembled a production line. Some stations shipped code. Others tested it. Others cleaned up failures. The engineer sat on top, not directing individual agents but managing the factory layout itself. Deciding which stations existed, what each station’s scope covered, and what happened when a station produced defective output.

This is a fundamentally different relationship between human and agent than the prevailing model of “human in the loop.” The human is not reviewing every decision. The human is designing the factory. The agents are the stations. The infrastructure is the conveyor belt.


The Moment the Real Constraint Became Visible

During the hackathon’s Q&A session, someone asked a question that shifted the entire conversation: “Can you talk about how OpenClaw can be used to recursively improve code?”

Koc’s answer did not go where most people expected. He did not describe a loop where an agent writes code, tests it, finds errors, and rewrites. That kind of recursion is mechanical. What Koc described was something stranger and more significant.

The agent, he explained, is aware that it is inside of a harness. And it has the ability to change that harness.

This means the agent can inspect its own configuration. It can modify its own memory. It can spin off sub-agents. It can rerun its own code. If the agent encounters a task it does not know how to perform, it can go build a skill to handle that task, then use the skill it just built. The recursion is not happening at the code level. It is happening at the infrastructure level. The agent is not just operating within constraints. It is negotiating with constraints.
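A rough Python sketch of that loop, assuming a harness that exposes its skills and memory to the agent. The `Harness` class and the `build_skill` and `run_task` functions are hypothetical names chosen for illustration, and `build_skill` is a stub standing in for whatever the agent actually does when it authors a skill.

```python
from typing import Callable, Dict, List

Skill = Callable[[str], str]


class Harness:
    """Hypothetical harness the agent can both read and modify."""

    def __init__(self) -> None:
        self.skills: Dict[str, Skill] = {}
        self.memory: List[str] = []

    def describe(self) -> dict:
        """What the agent sees when it inspects its own environment."""
        return {
            "skills": sorted(self.skills),
            "memory_entries": len(self.memory),
        }


def build_skill(task: str) -> Skill:
    """Stub for the agent authoring a new skill.

    A real agent would generate and test code here; this stand-in just
    returns a labelled handler so the control flow is visible.
    """
    return lambda payload: f"[{task}] handled {payload}"


def run_task(harness: Harness, task: str, payload: str) -> str:
    """Use an existing skill, or extend the harness first if none exists."""
    if task not in harness.skills:
        harness.skills[task] = build_skill(task)
        harness.memory.append(f"built skill: {task}")
    return harness.skills[task](payload)


harness = Harness()
first_result = run_task(harness, "proxy-audit", "item-1")
```

The second call to `run_task` with the same task reuses the skill rather than rebuilding it, which is the sense in which the recursion lives in the harness rather than in the code being edited.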

Koc’s description of his own workflow made this concrete. He described a moment where a proxy-related bug surfaced. Rather than fixing the single issue, he instructed his agents to pull every PR and issue related to proxy handling. The agents found 30 to 40 related items. From there, Koc formulated a strategy: centralize all proxy logic, update every plugin that depended on it, and ship the whole thing as a unified change. The agents executed the strategy. But the strategy itself emerged from the agents’ ability to survey the problem space, which only worked because the infrastructure made that problem space legible.

This is the pattern that gets lost in conversations about “agentic AI.” The interesting behavior is not the agent’s reasoning capability. It is the agent’s ability to read and respond to the infrastructure it operates within. An agent that cannot see its own constraints cannot negotiate with them. An agent that can see its constraints can do something far more powerful than follow instructions. It can adapt its approach based on what the infrastructure reveals about the problem.

Koc was explicit about the limits of this pattern. “People are overcomplicating it,” he said. “The harnesses are really good. The models are really good. They just take time.” The implication is that the bottleneck is not agent intelligence or model capability. The bottleneck is the time required for agents to work through complex problem spaces. And the factor that most reduces that time is not a faster model. It is a more legible infrastructure.

Then a second question surfaced the deeper constraint.

Someone asked about the relationship between workplace agents and personal agents. If an engineer runs OpenClaw at work and also runs a personal instance at home, how do those agents communicate? How do they share context safely? How does trust work across different permission boundaries?

Koc’s answer was direct: “We haven’t solved that yet.”

He pointed to the Agent Communication Protocol as a potential direction, comparing it to a phone system where agents can call each other through defined channels. But the problem, he acknowledged, goes beyond communication plumbing. It is a trust problem. When agents operate within a single boundary, you can design contracts and enforce isolation. When agents need to communicate across boundaries, with different permission models, different risk profiles, different data sensitivity levels, contracts alone are not sufficient. Something needs to verify that the agent on the other side of the boundary is operating under conditions you can trust.

This is where the conversation stopped being about OpenClaw and became about the field.


Capability Is Not the Constraint. Coherence Is.

Most teams building multi-agent systems optimize for capability first. They select the most powerful models, give agents access to the broadest possible tool sets, and architect for maximum flexibility. Coherence, when it gets attention at all, is treated as an emergent property. If the agents are smart enough and the tools are good enough, coordination will happen.

OpenClaw optimized in the opposite direction. Coherence first. Capability flows from coherence.

The plugin architecture is a coherence mechanism. The contract system is a coherence mechanism. The session isolation that gives each agent its own namespace is a coherence mechanism. The factory line workflow, where agents have distinct stations with distinct scopes, is a coherence mechanism. None of these features make agents smarter. All of them make agents safer to run at scale.

The five-billion-token day did not happen because the models got better. It happened because the infrastructure made it possible for one engineer to trust thirty agents operating simultaneously. Trust, in this context, is not a feeling. It is an architectural property. The engineer trusted the system because the system made its boundaries visible and enforceable.

This inversion maps onto a pattern visible across every domain where autonomous systems scale. Air traffic control does not work because planes are capable. It works because the coordination infrastructure makes conflicts structurally unlikely. Financial clearing systems do not succeed because transactions are smart. They succeed because settlement protocols eliminate categories of failure before they can occur.

Agent infrastructure is entering the same phase. The question is no longer “can the agent do X?” The question is “can the infrastructure prevent agents from breaking each other while they do X?” And the answer depends entirely on whether the system was designed for coherence or designed for capability and hoping coherence would follow.


What Gets Exposed

There is a pattern among operators who reach the scale Koc operates at. They stop talking about models. They stop talking about benchmarks. They start talking about infrastructure as the product.

Koc’s description of maintaining 1.5 million lines of code carried a revealing aside: “It works. There’s some bugs every now and then. But it works.” When pressed about whether the codebase needed refactoring, his response was pragmatic. Refactoring was not driven by code quality concerns. It was driven by the volume of external contributions. The pull request queue had grown to the point where the old architecture could not absorb changes safely. The refactor happened because the system needed to scale its ability to accept contributions without breaking.

Call this the absorption gap, in microcosm: the distance between what a system can do and how much change it can safely take in. The constraint was never what the codebase could do. The constraint was how much change the codebase could safely absorb. And the solution was not to make the codebase more capable. The solution was to make the codebase more coherent, more bounded, more explicit about what could change and what could not.

The same pattern holds for agent systems at scale. The constraint is not what agents can accomplish. The constraint is how much autonomous activity the infrastructure can absorb without losing coherence. And the teams that solve for absorption capacity, rather than agent capability, are the teams that reach the five-billion-token day.

Operators rarely articulate this. Koc did not frame his hackathon talk as a thesis about coherence architecture. He described what he built, what broke, and what he fixed. But the pattern underneath those descriptions is consistent: every significant improvement in throughput came from improving boundaries, not improving agents.


Technical Appendix: Reflexivity Patterns for Agent Architects

For builders designing multi-agent systems, the OpenClaw architecture surfaces several patterns worth examining in detail.

Constraint-Aware Reflection. The agent’s ability to inspect its own harness is not a debugging feature. It is a design principle. When an agent can read its own configuration, memory, and skill set, it can make decisions about its approach that account for what it does and does not have access to. This changes the failure mode from “agent tries something it cannot do and produces garbage” to “agent recognizes it cannot do X and either builds a skill for X or escalates.” The infrastructure requirement is transparency: the agent’s operating environment must be legible to the agent itself.
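One way to picture the changed failure mode is a small decision function that reads a snapshot of the harness before acting. This Python sketch is an assumption-laden illustration: `HarnessView`, its fields, and `choose_action` are hypothetical and do not describe OpenClaw's real configuration format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Tuple


class Action(Enum):
    EXECUTE = "execute"
    BUILD_SKILL = "build_skill"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class HarnessView:
    """Read-only snapshot of what the agent can see about itself."""

    skills: FrozenSet[str]
    writable_paths: Tuple[str, ...]
    may_build_skills: bool


def choose_action(view: HarnessView, needed_skill: str, target_path: str) -> Action:
    """Decide what to do based on legible constraints, before trying anything."""
    # Outside the agent's scope: building a skill would not make this allowed.
    if not any(target_path.startswith(prefix) for prefix in view.writable_paths):
        return Action.ESCALATE
    if needed_skill in view.skills:
        return Action.EXECUTE
    # In scope but missing the skill: build it if permitted, else hand off.
    return Action.BUILD_SKILL if view.may_build_skills else Action.ESCALATE


view = HarnessView(
    skills=frozenset({"edit-docs"}),
    writable_paths=("docs/",),
    may_build_skills=True,
)
```

The agent never reaches the "tries something it cannot do" branch, because the check happens against the visible configuration rather than against the outcome of a failed attempt.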

Session Isolation as Coordination Primitive. OpenClaw’s namespace system gives each agent its own session context. This is not just a data hygiene measure. It is the mechanism that enables multi-agent coordination without explicit communication between agents. When each agent has its own namespace, agents cannot corrupt each other’s state. This means the coordination tax drops to near zero for agents working on independent areas of a problem space. The engineer’s job is to define areas correctly, not to manage inter-agent communication.
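A minimal Python sketch of namespace-scoped state, assuming an in-memory store. `SessionStore` and `Session` are hypothetical names and the agent IDs and branch names are made up; the sketch only shows why two agents writing the same key do not collide.

```python
from typing import Any, Dict


class Session:
    """One agent's private view of the store."""

    def __init__(self, bucket: Dict[str, Any]) -> None:
        self._bucket = bucket

    def set(self, key: str, value: Any) -> None:
        self._bucket[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self._bucket.get(key, default)


class SessionStore:
    """Hypothetical store where every key lives inside one agent's namespace."""

    def __init__(self) -> None:
        self._namespaces: Dict[str, Dict[str, Any]] = {}

    def session(self, agent_id: str) -> Session:
        # Each agent ID maps to its own bucket; no API reaches across buckets.
        bucket = self._namespaces.setdefault(agent_id, {})
        return Session(bucket)


store = SessionStore()
docs_agent = store.session("docs-agent")
proxy_agent = store.session("proxy-agent")

# Both agents use the same key without any coordination between them.
docs_agent.set("current_branch", "docs/update")
proxy_agent.set("current_branch", "fix/proxy")
```

Because a `Session` only ever holds a reference to its own bucket, isolation is a property of the interface rather than a convention the agents have to follow.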

Contract-Driven Plugin Boundaries. The provider contract system establishes a verifiable interface between plugins and the core. This means the system can test plugin compliance programmatically, reducing the manual review burden as the plugin ecosystem grows. For multi-agent architectures, the implication is that agent capabilities should be exposed through contract-compliant interfaces, not through open-ended tool access. Contracts constrain what agents can do, but they also guarantee that what agents do will be compatible with the rest of the system.
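Programmatic compliance checking can be as simple as comparing a plugin's methods against a declared contract. The Python sketch below uses the standard `inspect` module; the contract itself (`REQUIRED_METHODS`, with `handle` and `health`) and the example plugin classes are assumptions made for illustration, not OpenClaw's provider contract.

```python
import inspect
from typing import Dict, List

# Hypothetical contract: method name -> expected parameter names (excluding self).
REQUIRED_METHODS: Dict[str, List[str]] = {
    "handle": ["request"],
    "health": [],
}


def contract_violations(plugin: object) -> List[str]:
    """Return a list of human-readable violations; empty means compliant."""
    problems: List[str] = []
    for method, expected in REQUIRED_METHODS.items():
        fn = getattr(plugin, method, None)
        if not callable(fn):
            problems.append(f"missing method: {method}")
            continue
        # For bound methods, inspect.signature already omits `self`.
        actual = list(inspect.signature(fn).parameters)
        if actual != expected:
            problems.append(f"{method}: expected params {expected}, got {actual}")
    return problems


class GoodPlugin:
    def handle(self, request: str) -> str:
        return request

    def health(self) -> bool:
        return True


class BadPlugin:
    def handle(self) -> str:  # wrong signature, and no health() at all
        return "oops"
```

A check like this can run in CI against every plugin, which is what lets review effort stay roughly flat as the ecosystem grows.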

Factory Line Decomposition. Koc’s operational pattern of assigning agents to distinct stations with distinct scopes is a decomposition strategy, not an orchestration strategy. The difference matters. Orchestration implies a central coordinator managing agent interactions. Decomposition implies a structure where agents do not need to interact because their scopes are designed to be independent. The coordination happens at the design layer, before agents start running.
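If coordination happens at the design layer, one thing that can be checked before any agent starts is whether station scopes are actually independent. The Python sketch below assumes scopes are expressed as path prefixes, which is a simplification; the station names and paths are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def overlaps(a: str, b: str) -> bool:
    """Two path-prefix scopes overlap if one contains the other."""
    a = a.rstrip("/") + "/"
    b = b.rstrip("/") + "/"
    return a.startswith(b) or b.startswith(a)


def scope_conflicts(stations: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return pairs of stations whose scopes overlap; empty means independent."""
    conflicts: List[Tuple[str, str]] = []
    for (name_a, scopes_a), (name_b, scopes_b) in combinations(stations.items(), 2):
        if any(overlaps(x, y) for x in scopes_a for y in scopes_b):
            conflicts.append((name_a, name_b))
    return conflicts


# Illustrative layout: the "tests" station reaches into the "proxy" station's area.
layout = {
    "proxy": ["src/proxy"],
    "docs": ["docs"],
    "tests": ["src/proxy/tests"],
}
```

An empty result from `scope_conflicts` means the agents have no shared surface to coordinate over; a non-empty one is a layout problem to fix before launch, not something to resolve at runtime.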

The Tolerance Threshold. Koc described a practical heuristic: “What’s the slowest station? Get it out.” In the factory line model, agents that slow down the overall pipeline are removed or reconfigured rather than optimized. This treats agent performance as a system property rather than an individual property. The question is not “is this agent good enough?” The question is “does this agent’s throughput match the pipeline’s needs?”
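The heuristic can be written down in a few lines of Python. This sketch assumes each station reports a throughput figure in tasks per hour; the function names and the numbers are illustrative, not measurements from the talk.

```python
from typing import Dict, List


def slowest_station(rates: Dict[str, float]) -> str:
    """The station with the lowest throughput bounds the whole line."""
    return min(rates, key=rates.__getitem__)


def stations_to_pull(rates: Dict[str, float], required: float) -> List[str]:
    """Stations whose throughput falls below what the pipeline needs."""
    return sorted(name for name, rate in rates.items() if rate < required)


# Illustrative numbers only (tasks per hour).
rates = {"ship-fixes": 40.0, "run-tests": 55.0, "cleanup": 12.0}
```

Note that neither function asks whether a station's output is good in isolation; both compare it only against the line as a whole.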

Original photo by Anna Chope, Graphic Designer, FriendliAI (used with permission) – background image by author


The Open Question

Everything described above operates within a single boundary. One engineer. One codebase. One permission model. One trust framework.

The moment someone introduced a second boundary, the unsolved problem became visible.

If a workplace agent and a personal agent need to share context, what verifies that the sharing is safe? If two agents governed by different permission models need to coordinate, who decides what each agent can see? If an agent inside a corporate boundary produces a result that an agent outside that boundary needs, what protocol governs the transfer?

These are not technical questions in the narrow sense. They are trust questions. And trust at the protocol level requires infrastructure that does not yet exist. The Agent Communication Protocol is a direction, not a solution. Communication between agents across boundaries requires more than a channel. It requires a verification layer that can confirm each agent is operating under conditions the other agent’s boundary considers acceptable.

OpenClaw demonstrated that within a single boundary, coherence can scale dramatically. One engineer, thirty agents, five billion tokens. The infrastructure held. The question now is what happens when the boundary itself becomes multiple. When agents inside different organizations, different permission models, different risk profiles need to work together.

The field has solved the capability problem. The coherence problem, within a boundary, has a working solution. The trust problem, across boundaries, does not.

That is where the next infrastructure gets built.


P.S. The tone of this piece, grounded in operator discovery & constraint-aware systems, reflects the spirit in which Peter Steinberger has consistently framed OpenClaw: as a project meant to be 🪀🛠️ fun to build. The infrastructure tensions, the 5B token day, the recursive agent patterns… all of these are explored in that spirit. None of this is meant to denigrate the profound accomplishments of Vincent Koc, Peter & the core maintainers, nor the contributions of the broader OpenClaw community, sponsors, & contributors who’ve made the project possible.

P.P.S. Gratitude 🙏🏼 to the event producers & sponsors who made this conversation happen: HackerSquad’s Adam Chan, Contextual AI, Redis, Civic, Apify, & FriendliAI. The infrastructure that enables these kinds of real-time discoveries, where operators can sit down & articulate what they’re learning, matters. And to that 🪄 ✨ magical hobnob with Vincent Koc + Philipp Berner, Steven Echtman & Felipe Salinas Rangel.

bsky.app/profile/schwentker.bsky.social/post/3mhzjpkws7k2m

twitter.com/schwentker/status/2037422879181344799
