Breaking Tasks into Milestones: DeepMind's Fix for Long-Horizon Agent Failure
Long-horizon LLM agents fail in predictable ways: they loop, drift, and lose the thread. A new Google DeepMind paper proposes subgoal decomposition at inference time combined with milestone-based RL rewards, and the numbers are striking.