RLMs, Context Rot, and Recursive Orchestrators

Part of the Agent Engineering Playbook.
There is a limit to how far you can get by shoving more text into a model and hoping the reasoning stays sharp.
Vendors keep making context windows larger, and that absolutely helps. But anyone who has worked long enough with very large sessions knows the deeper problem: even when the model can technically fit the context, its performance still starts to decay. Relevant details get ignored. Local salience wins over global structure. The session becomes heavy and vaguely stale.
Alex Zhang calls this "context rot," and it is the right term for the phenomenon.
What an RLM Is Trying to Change
In Zhang's blog post and the official RLM repository, a Recursive Language Model is presented as a thin wrapper around a normal model call. Instead of always sending the full prompt and context directly into one completion, the system places the context into an environment and lets the model inspect it, manipulate it, and recursively call itself over subsets of that context.
The official repo describes this as replacing llm.completion(prompt, model) with rlm.completion(prompt, model). That is the important conceptual shift. The model no longer has to swallow the whole world at once; it can treat the context as an external environment and recurse over it.
In the authors' implementation, the environment is a REPL. The root model can write code, inspect variables, partition large context, and launch sub-calls against smaller pieces. The context becomes something the model works over, not something it must always fully ingest.
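To make that shape concrete, here is a minimal sketch, with loud caveats: llm_completion, ContextEnv, and rlm_completion are names invented for this illustration (the repo's actual interface is simply rlm.completion in place of llm.completion), and the hard-coded chunk-and-recurse strategy stands in for decisions the real system leaves to the model inside the REPL.

```python
# Illustrative sketch only, not the authors' implementation. The partitioning
# here is fixed; in the real RLM setup the root model itself decides how to
# inspect and split the context from inside a REPL environment.

def llm_completion(prompt: str, model: str = "base-model") -> str:
    """Stand-in for a normal single-shot model call."""
    raise NotImplementedError("wire up a real provider here")


class ContextEnv:
    """Holds the large context outside the root prompt."""

    def __init__(self, text: str, chunk_size: int = 4_000):
        self.chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    def peek(self, i: int, n: int = 200) -> str:
        """The kind of cheap inspection a REPL environment can expose."""
        return self.chunks[i][:n]


def rlm_completion(prompt: str, env: ContextEnv, model: str = "base-model", depth: int = 2) -> str:
    """Recurse over slices of the context instead of ingesting it all at once."""
    if depth == 0 or len(env.chunks) <= 1:
        # Base case: the remaining slice is small enough to answer over directly.
        return llm_completion(f"{prompt}\n\nContext:\n{''.join(env.chunks)}", model)

    # Recursive case: answer per slice, then synthesize the partial answers.
    partials = [
        rlm_completion(prompt, ContextEnv(chunk), model, depth - 1)
        for chunk in env.chunks
    ]
    return llm_completion(
        f"{prompt}\n\nPartial answers from context slices:\n" + "\n---\n".join(partials),
        model,
    )
```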
That is a much bigger idea than it first appears.
Why This Matters for Agent Systems
Most orchestration discussions focus on task decomposition across agents. RLMs point to a second axis: context decomposition inside inference itself.
Those are related, but not the same.
A planner can break a software task into subproblems. An RLM-style system can break a massive context into subqueries and route the reasoning over smaller slices. The first problem is organizational. The second is cognitive.
That distinction matters because many agent failures that look like planning problems are really context problems. The agent is not always "bad at orchestration." Sometimes it is just thinking through a polluted or overstuffed window.
If You Came Looking for an "RLM Orchestrator"
The useful thing to copy is the architecture, not the label.
I did not find a single canonical system officially named rlm-orchestrator that you can treat as a standard product. What does exist, and what is worth studying, is the RLM approach itself: store large context outside the main prompt, let the model query or transform it through an environment, and make recursive calls part of the inference strategy.
That means a practical "RLM orchestrator" is less likely to be a branded framework than a design pattern (sketched in code after this list):
- keep massive context out of the root model's hot path
- give the model structured ways to inspect and partition that context
- let it recurse on smaller slices
- record the trajectory so the process is debuggable
That is the part worth building toward.
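To make that pattern concrete, here is a rough first cut. Everything in it is hypothetical scaffolding: the ContextStore class, the grep_context / read_slice / recurse tools, and the trajectory format are my own illustration of the four bullets above, not any framework's API.

```python
# Hypothetical scaffolding for the design pattern above; names and formats
# are invented for illustration, not taken from any existing framework.

import json
import re
from dataclasses import dataclass, field


@dataclass
class ContextStore:
    """Keeps the massive context out of the root model's hot path."""

    text: str
    trajectory: list = field(default_factory=list)

    def _log(self, op: str, **kwargs) -> None:
        # Record every interaction so the whole run can be replayed and debugged.
        self.trajectory.append({"op": op, **kwargs})

    def grep_context(self, pattern: str, window: int = 80) -> list:
        """Structured inspection: small snippets around matches, never the whole text."""
        hits = [
            self.text[max(m.start() - window, 0): m.end() + window]
            for m in re.finditer(pattern, self.text)
        ]
        self._log("grep", pattern=pattern, n_hits=len(hits))
        return hits[:10]

    def read_slice(self, start: int, end: int) -> str:
        """Partitioning: hand back one bounded slice at a time."""
        self._log("read_slice", start=start, end=end)
        return self.text[start:end]

    def recurse(self, prompt: str, start: int, end: int, call_model) -> str:
        """Recursion hook: run a sub-call over a single slice only."""
        self._log("recurse", start=start, end=end)
        return call_model(f"{prompt}\n\nContext slice:\n{self.text[start:end]}")

    def dump_trajectory(self) -> str:
        """Debuggability: the full history of how the context was navigated."""
        return json.dumps(self.trajectory, indent=2)
```

The specific tools matter less than the property they share: every inspection, partition, and sub-call is bounded and logged, which is what makes the process debuggable later.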
Why This Is More Than Retrieval
It is tempting to hear all this and think: fine, so this is just fancy retrieval.
Not quite.
Retrieval systems decide what small set of context to pull into a model. RLMs give the model more freedom to decide how to inspect, partition, and recurse over the context using an environment. Zhang's write-up is explicit about this difference. The model is not just handed chunks; it can manipulate the context and call sub-queries recursively.
That matters for tasks where the decomposition is not obvious ahead of time.
If you already know exactly which documents matter, retrieval may be enough. If the model has to discover the relevant structure while solving the task, a recursive environment can become much more powerful.
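The difference shows up clearly in pseudocode. In the sketch below, retriever, env, and llm are assumed interfaces rather than real libraries: the retrieval version fixes its slices before the model sees anything, while the recursive version lets the model keep choosing which slice to open next and decide when it has enough.

```python
# Hypothetical interfaces: retriever.top_k, env.outline / env.read_slice, and
# llm.complete / llm.next_action are assumptions made for this comparison.

def answer_with_retrieval(question: str, retriever, llm) -> str:
    # The decomposition is decided before the model sees anything.
    chunks = retriever.top_k(question, k=5)
    return llm.complete(question + "\n\nContext:\n" + "\n".join(chunks))


def answer_with_recursive_env(question: str, env, llm, max_steps: int = 8) -> str:
    # The model discovers the decomposition while solving the task: at each
    # step it can open another slice of the context or stop and answer.
    notes = []
    for _ in range(max_steps):
        action = llm.next_action(question, notes, env.outline())
        if action["type"] == "answer":
            return action["text"]
        notes.append(env.read_slice(action["start"], action["end"]))
    return llm.complete(question + "\n\nGathered notes:\n" + "\n".join(notes))
```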
Where I Think This Leads
Today, most coding agents still behave like wide-context chat systems with tool use bolted on. Some of them add subagents. Some add ledgers. Some add queues. But the inference model underneath is usually still "keep the thread going and compact when you have to."
I do not think that will be the long-term shape.
I think the systems that win on deep, long-horizon work will start combining two ideas:
First, explicit orchestration of tasks, roles, and handoffs. Second, explicit orchestration of context itself. The model should not merely receive context. It should navigate it.
That is why RLMs feel important even in their early form. They are not just another benchmark trick. They suggest a different substrate for long-running agent systems.
A Useful Standard of Skepticism
At the same time, it is worth staying honest. The RLM work is early. The results are exciting, but they come from a particular research setup, and the engineering tradeoffs are still real. Recursive systems add latency, tracing complexity, and new failure modes. Environments need to be safe. Recursive calls need budgets. Debugging can get harder before it gets easier.
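Even a naive budget guard, sketched here with made-up limits, hints at the plumbing a recursive setup needs before it is safe to run unattended:

```python
# Illustrative only: the RecursionBudget class and its default limits are
# invented here to show the kind of guardrail recursive calls require.

import time


class RecursionBudget:
    def __init__(self, max_calls: int = 50, max_depth: int = 3, max_seconds: float = 120.0):
        self.max_calls = max_calls
        self.max_depth = max_depth
        self.deadline = time.monotonic() + max_seconds
        self.calls = 0

    def charge(self, depth: int) -> None:
        """Call this before every sub-call; raises once any limit is exceeded."""
        self.calls += 1
        if self.calls > self.max_calls:
            raise RuntimeError("recursive call budget exhausted")
        if depth > self.max_depth:
            raise RuntimeError("recursion depth limit exceeded")
        if time.monotonic() > self.deadline:
            raise RuntimeError("wall-clock budget exhausted")
```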
So I would not read RLMs as "the answer is here." I would read them as a strong clue about the direction of the answer.
If you want to see what happens when the orchestration side of this story gets pushed very far in practice, read What Gas Town Is Really Building. Gas Town does not implement RLMs, but it is one of the clearest examples of somebody trying to industrialize multi-agent work instead of merely talking about it.