Commissioned, Curated and Published by Russ. Researched and written with AI.


The Darwin Gödel Machine proved that open-ended self-improvement wasn’t just theoretical. Agents could generate modified versions of themselves, evaluate whether those modifications improved performance, and iterate. The catch was that the approach was domain-specific: DGM worked because both task performance and self-modification were coding problems. Write code to evaluate code, generate code to fix code. The alignment was built in by construction. Move to robotics, or academic paper review, or math grading – and that alignment disappears.

Meta AI Research’s HyperAgents paper (arXiv:2603.19461, submitted 19 March 2026) addresses this directly with a single architectural change: make the meta-level modification procedure itself editable.

The Architecture

In DGM, the meta agent – the component responsible for generating improvements – was fixed. Handcrafted. You could improve the task agent all you wanted, but the improvement mechanism itself didn’t change. That’s a ceiling. HyperAgents removes it by merging the task agent and the meta agent into a single editable program.

The paper calls this metacognitive self-modification. The agent doesn’t just search for better task solutions; it can rewrite how it searches for them. In the abstract’s words, such agents “do not merely search for better solutions, but continually improve their search for how to improve.”

This eliminates what the researchers call the domain-specific alignment assumption. In DGM, the meta agent needed to be competent at the same kind of task as the object-level agent – both had to be coding. In HyperAgents, that requirement is gone. Any computable program that includes foundation model calls and external tools can serve as the target system. The meta agent and task agent are the same code.

The obvious question is: who modifies the modifier? This is the classic infinite regress problem – add a meta-meta-agent and you’ve only shifted the boundary upward. HyperAgents sidesteps it by collapsing the hierarchy. There’s no separate meta-meta layer because the meta agent is part of the editable program. It can modify itself.
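The collapsed hierarchy is easier to see in code than in prose. The sketch below is purely illustrative – none of these names (`AGENT_SOURCE`, `propose_modification`, `solve_task`) come from the HyperAgents codebase. The point is structural: the function that proposes modifications is stored in the same editable source it operates on, so an accepted candidate can have rewritten the improver itself.

```python
# Toy sketch of a single editable program containing both the task
# function and the modification-proposing function. All names are
# hypothetical; a real system would call a foundation model where noted.

AGENT_SOURCE = {
    "solve_task": "def solve_task(x):\n    return x * 2\n",
    "propose_modification": (
        "def propose_modification(source):\n"
        "    # A real system would call a foundation model here.\n"
        "    new = dict(source)\n"
        "    new['solve_task'] = 'def solve_task(x):\\n    return x * 3\\n'\n"
        "    return new\n"
    ),
}

def run(source, fn_name, *args):
    """Execute one named function out of the agent's own source text."""
    namespace = {}
    exec(source[fn_name], namespace)
    return namespace[fn_name](*args)

def evolve(source, steps):
    for _ in range(steps):
        candidate = run(source, "propose_modification", source)
        # Keep the candidate only if it improves task performance.
        # Crucially, the candidate may also have rewritten
        # 'propose_modification' itself - there is no frozen meta layer.
        if run(candidate, "solve_task", 10) > run(source, "solve_task", 10):
            source = candidate
    return source

evolved = evolve(AGENT_SOURCE, steps=3)
print(run(evolved, "solve_task", 10))  # 30 after the toy modification
```

Because `propose_modification` lives inside `AGENT_SOURCE`, there is nothing above it to regress to – which is exactly the sidestep the paper describes.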

What the Results Actually Show

The paper tests the DGM-Hyperagent (DGM-H) framework across four domains: coding, paper review, robotics reward design, and Olympiad-level math grading.

Robotics reward design is the interesting case. Agents had to design Python reward functions to train a quadruped robot in a physics simulator. During training they optimised for walking forward; at test time they had to generate reward functions for maximising torso height – a held-out task they hadn’t seen. DGM-H improved from an initial score of 0.060 to 0.372. More significant than the number: the system discovered jumping as a strategy rather than settling into the local optimum of simply standing tall. That’s a qualitative difference in solution type, not just a score increment.
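Why jumping beats standing is easy to see with a toy reward. Everything below is hypothetical – the simulator interface (a per-timestep torso z-coordinate) is an assumption for illustration, not the paper’s API:

```python
# Illustrative only: a reward function of the kind the held-out task asks
# for. The state interface (per-timestep torso height) is assumed.

def torso_height_reward(torso_z):
    """Per-timestep reward: simply the torso's height above the ground."""
    return torso_z

def episode_return(torso_trajectory):
    """Sum of per-timestep rewards over an episode."""
    return sum(torso_height_reward(z) for z in torso_trajectory)

# The 'stand tall' local optimum holds a steady but bounded height.
standing = [0.55] * 4
# Jumping trades low crouch/landing frames for a peak above standing height.
jumping = [0.3, 0.9, 1.1, 0.4]

assert episode_return(jumping) > episode_return(standing)
```

Under this reward, the peak frames of a jump can outweigh the steady return of standing – the kind of qualitatively different solution the paper reports.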

In paper review, performance improved from 0.0 to 0.710. The system developed multi-stage evaluation pipelines with explicit checklists and decision rules – moving away from surface-level behavioural instructions toward something more structured – without that structure ever being specified.

The transfer results are the strongest evidence for the paper’s central claim. Hyperagents optimised on paper review and robotics were transferred to Olympiad-level math grading. The benchmark used was imp@50 – performance gain achieved over 50 modification steps with a fixed meta agent. Human-customised DGM runs failed to generate improvements in the new domain (imp@50 = 0.0). The transferred DGM-H hyperagents reached imp@50 = 0.630.
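Read literally, the metric as described – performance gain achieved over 50 modification steps – is a one-liner. This is the plain reading of that description; the paper’s exact normalisation may differ:

```python
def imp_at_k(scores, k=50):
    """Performance gain over the first k modification steps.

    scores[i] is the task score after i modification steps;
    scores[0] is the starting score before any modification.
    """
    return max(scores[: k + 1]) - scores[0]

# A run that never finds an improvement scores 0.0, like the
# human-customised DGM baselines on the transferred domain:
imp_at_k([0.0] * 51)           # 0.0
# A toy improving trajectory (illustrative numbers, not the paper's):
imp_at_k([0.1, 0.2, 0.5, 0.4])  # 0.4
```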

The improvement strategy itself transferred, not just task-specific knowledge. That’s a different category of result. The paper doesn’t just show a system that got better at individual tasks; it shows that the process of improving became portable across domains.

What Emerged Without Being Asked

Three things the system developed unprompted stand out:

Performance tracking: classes that log metrics across generations, identifying which modifications led to sustained gains versus regressions.

Persistent memory: timestamped storage for synthesised insights and causal hypotheses, so later generations could build on earlier discoveries rather than rediscovering them.

Compute-aware planning: logic to adjust modification strategy based on remaining experiment budget, prioritising architectural changes early and conservative refinements late.
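To make the three behaviours concrete, here is a minimal hypothetical sketch of each. None of these class or function names come from the HyperAgents codebase; they only illustrate the kind of infrastructure described:

```python
import json
import time

class GenerationTracker:
    """Performance tracking: log metrics across generations and flag
    sustained gains versus regressions."""
    def __init__(self):
        self.history = []  # list of (generation, score)

    def record(self, generation, score):
        self.history.append((generation, score))

    def sustained_gain(self, window=3):
        """True if each of the last `window` scores beats its predecessor."""
        tail = [score for _, score in self.history[-(window + 1):]]
        return len(tail) > window and all(b > a for a, b in zip(tail, tail[1:]))

class InsightMemory:
    """Persistent memory: timestamped storage for synthesised insights,
    so later generations build on earlier discoveries."""
    def __init__(self):
        self.entries = []

    def add(self, insight):
        self.entries.append({"t": time.time(), "insight": insight})

    def dump(self):
        return json.dumps(self.entries)

def plan_modification(remaining_budget, total_budget):
    """Compute-aware planning: architectural changes early,
    conservative refinements late in the experiment budget."""
    return "architectural" if remaining_budget / total_budget > 0.5 else "refinement"
```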

None of this was in the specification. The system determined these engineering practices were useful to its own improvement process and implemented them. That’s a different kind of behaviour from a system that simply gets better at a task with more compute: HyperAgents is building infrastructure for its own future operation.

This is the “improving the improver” architecture in practice. Meta-level improvements accumulate across runs and transfer across domains. A system that developed better self-modification strategies in one domain retained those strategies when moved to an entirely different one.

The Safety Question

The GitHub repository (https://github.com/facebookresearch/HyperAgents) includes this in its README: “This repository involves executing untrusted, model-generated code. We strongly advise users to be aware of the associated safety risks.”

That’s understated. At the task level, code execution risks are familiar: sandboxing, resource limits, timeouts. The engineering toolbox for constraining code execution is mature. At the meta level, the risks are structurally different. The system isn’t generating code to solve a problem – it’s generating code that rewrites how it generates code. A modification that introduces a subtle error in the improvement loop doesn’t just produce one bad output; it changes the trajectory of all subsequent iterations.
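The mature task-level layer is easy to sketch: isolate untrusted generated code in a separate process with a hard timeout. The snippet below is a generic containment pattern, not the repository’s actual harness – and note what it cannot do: nothing here can tell you whether a meta-level edit to the improvement loop was sound.

```python
import os
import subprocess
import sys
import tempfile

def run_untrusted(code, timeout=5):
    """Run model-generated code in a subprocess with a hard timeout.
    Returns (returncode, stdout); returncode is None on timeout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, "-I", path],  # -I: isolated mode, no user site dirs
            capture_output=True, text=True, timeout=timeout,
        )
        return proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        return None, ""  # process killed: resource limit hit
    finally:
        os.unlink(path)

rc, out = run_untrusted("print(2 + 2)")
```

This catches runaway loops and captures output, which is exactly the familiar toolbox – it says nothing about whether the code it just ran has quietly degraded the system’s future modification strategy.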

This is the core failure mode the field hasn’t fully worked out. You can evaluate task performance fairly directly – does the robot walk? Did the paper review match expert judgement? But evaluating whether a meta-level modification to the improvement procedure is sound is harder. The feedback loop is longer, effects compound across generations, and because the meta agent can modify itself, there’s no stable baseline to compare against.
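A toy calculation shows why compounding matters: a per-generation bias too small to notice in any single evaluation dominates a 50-step trajectory. The numbers are illustrative, not from the paper:

```python
def compounded_drift(bias_per_step, steps):
    """Multiplicative drift from a small, consistent per-step bias
    in the improvement loop."""
    drift = 1.0
    for _ in range(steps):
        drift *= 1.0 + bias_per_step
    return drift

compounded_drift(0.02, 1)   # ~1.02: invisible in one evaluation
compounded_drift(0.02, 50)  # ~2.69: dominates the 50-step trajectory
```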

Executing model-generated code that then modifies the system doing the executing is a meaningful escalation from standard agentic risk. The established safety guidance for agentic systems – minimal footprint, reversible actions, human oversight for consequential steps – becomes harder to apply when the system is rewriting its own modification logic between iterations. The action isn’t “the agent did X task”; it’s “the agent changed how it will do all future tasks.”

“Any Computable Task” – What the Claim Actually Means

The paper’s framing around “any computable task” is doing some work that deserves scrutiny.

Turing-completeness of the substrate doesn’t mean the system will actually improve on arbitrary tasks. What the results demonstrate is that DGM-H improves over baselines across four diverse domains, and that meta-level improvements transfer between them. That’s significant. But “any computable task” is a theoretical bound on the architecture, not a demonstrated result across the full space.

What the claim actually means in practice is closer to: the architecture doesn’t impose the domain-specific alignment constraint that DGM required. It’s not claiming universal convergence to optimal self-improvement. It’s claiming the approach doesn’t artificially restrict itself to coding-adjacent domains.

That’s still a meaningful step. The prior constraint – that you needed domain alignment between task and meta – was a genuine engineering limitation with real consequences. HyperAgents removes it. Whether it generalises to every computable task in practice is an open question that four benchmarks can’t settle.

Where This Goes

The paper is theoretical infrastructure. Results in controlled benchmarks – coding, robotics reward design, paper review, math grading – are promising, but the gap between controlled and deployed is large, especially for a system that modifies its own modification logic.

The practical question for engineering teams isn’t whether to deploy this today. It’s whether meta-level self-modification stability can be verified. DGM-H shows that improvements accumulate and transfer, which is what you want. It also means that errors accumulate and transfer. The system that developed persistent memory and compute-aware planning could equally develop modification heuristics that systematically bias toward superficially impressive but fragile solutions – and that drift might not be visible at task-evaluation time.

The next thing to build is evaluation infrastructure for the meta-level. Not just: did the system improve on task X? But: is the modification strategy the system developed actually sound? Right now there isn’t a clean answer to that question. The research team acknowledges the safety risks. The field needs tooling to make those risks measurable before this architecture moves closer to deployment.

Paper: arXiv:2603.19461
Code: github.com/facebookresearch/HyperAgents