Commissioned, Curated and Published by Russ. Researched and written with AI.


What’s New This Week

AMI Labs announced its $1.03 billion seed round on March 10, 2026, confirming a $3.5 billion pre-money valuation and a co-investor list that includes Bezos Expeditions, Nvidia, Samsung, Toyota Ventures, and a handful of French industrial groups. The raise exceeded the €500M target the company had reportedly set in late 2025 – a sign that investor appetite for an alternative to the transformer paradigm is even stronger than the original target implied.


Changelog

Date         Summary
16 Mar 2026  Post published.

Yann LeCun raised $1.03 billion to prove the entire AI industry got it wrong. Not slightly wrong – wrong in principle.

That’s the claim behind AMI Labs (Advanced Machine Intelligence Labs), the Paris-based startup LeCun founded in November 2025 after leaving Meta, where he had served as chief AI scientist for over a decade. The seed round, announced March 10, values AMI at $3.5 billion pre-money. Four months after founding. Before a single commercial product has shipped.

The money is a signal. So is who wrote the cheques: Cathay Innovation, Greycroft, Hiro Capital, HV Capital, and Bezos Expeditions as co-leads, with Nvidia, Samsung, Toyota Ventures, Eric Schmidt, Mark Cuban, and Xavier Niel also on the cap table. These are not naive investors. They know what LeCun is arguing. They funded it anyway.

The thesis deserves a careful read.


What LeCun Says LLMs Can’t Do

Start with what LeCun is not saying. He uses LLMs. He acknowledges they are “extremely useful to a lot of people, particularly if you write text, do research, or write code.” He’s not dismissing ChatGPT or Claude Code or Gemini as tools. His argument is architectural, and it runs deeper than “they hallucinate” or “they’re not grounded.”

LLMs learn by predicting the next token. That is their fundamental operation. The model’s entire internal representation of the world – everything it “knows” – is derived from the statistical structure of text. Token prediction is good at compressing co-occurrence patterns in language. Language does encode a lot of world knowledge. But it doesn’t encode everything.
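
To make that concrete, here is a toy sketch – not any real model's code – of what "learning from token statistics" means, with a bigram count table standing in for the neural network and a hypothetical four-sentence corpus standing in for the internet:

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real LLM does the same thing over trillions
# of tokens with a neural network instead of a count table.
corpus = "the cup fell . i pushed the cup . the cup fell .".split()

bigrams = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    bigrams[cur][nxt] += 1       # "knowledge" = co-occurrence counts

def next_token_distribution(token):
    """P(next | current): nothing but normalised counts."""
    dist = bigrams[token]
    total = sum(dist.values())
    return {t: c / total for t, c in dist.items()}

print(next_token_distribution("cup"))  # {'fell': ~0.67, '.': ~0.33}
```

The table "knows" that "fell" tends to follow "cup". It has no representation of cups, pushing, or gravity – only the counts. Scale changes how rich those statistics are, not what kind of thing they are.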

What it doesn’t encode: physical causality. How objects move through space. What happens when you drop something. Why a bridge holds load until it doesn’t. The difference between “the cup fell” and “I pushed the cup” in terms of physical consequence. LeCun’s argument is that none of this can be reliably derived from token statistics, no matter how much text you process.

The Moravec Paradox is relevant here. Hans Moravec observed in 1988 that the things which are easy for humans – perception, navigation, physical manipulation – are hard for computers, and vice versa. LLMs have inverted that somewhat: they’re very good at the linguistic and cognitive tasks that felt “hard” to earlier AI, but they remain brittle at exactly the physical and causal reasoning that humans do effortlessly.

“People have had this illusion, or delusion, that it is a matter of time until we can scale them up to having human-level intelligence,” LeCun told MIT Technology Review. “That is simply false.”

He’s been making this argument publicly for years. The difference now is that he has $1B and a team pulled from Meta FAIR, Google DeepMind, and OpenAI to act on it.


What World Models Actually Are

A world model is an AI system that maintains an internal representation of physical reality and can simulate the consequences of actions before taking them.

The chess engine comparison is instructive. A chess engine doesn’t play every possible game to learn chess. It has a model of the board state, a model of how pieces move, and the ability to simulate sequences of moves forward in time – evaluating outcomes without physically executing them. That is a world model. It understands the structure of the problem domain, not just the statistical distribution of previous games.
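
A minimal sketch of that mechanic, assuming the third-party python-chess library – a real engine searches many plies with alpha-beta pruning, but the push-simulate-undo loop is the same:

```python
import chess

VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
          chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def material(board, colour):
    """Summed piece values for one side in a (possibly simulated) position."""
    return sum(VALUES[p.piece_type] for p in board.piece_map().values()
               if p.color == colour)

def best_move(board):
    """One-ply lookahead: play each legal move in the model, then undo it."""
    mover = board.turn
    def after(move):
        board.push(move)    # simulate: the move happens only in the model
        score = material(board, mover) - material(board, not mover)
        board.pop()         # roll back: nothing was executed in the world
        return score
    return max(list(board.legal_moves), key=after)

print(best_move(chess.Board()))  # a move chosen by simulated consequences
```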

LLMs don’t have this. They can generate plausible descriptions of chess games because chess games appear in their training data. But they don’t have a model of what’s happening on the board. The distinction matters when the task requires planning, not just pattern-matching.

AMI’s architectural bet is on JEPA – Joint Embedding Predictive Architecture – a framework LeCun developed while at Meta. JEPA is not a generative model. It doesn’t predict pixels or tokens. Instead, it learns abstract representations of states in the world and makes predictions in that abstract space. The idea: represent what you can reliably represent, and ignore the unpredictable details.

As LeCun puts it: “The world is unpredictable. If you try to build a generative model that predicts every detail of the future, it will fail. JEPA learns the underlying rules of the world from observation, like a baby learning about gravity.”

The training signal isn’t “predict the next word.” It’s something closer to: learn representations of states such that the model can predict plausible future representations without specifying every unpredictable detail. Train on video, audio, sensor data – not just text.
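
A heavily simplified sketch of that training signal in PyTorch – not AMI's code, and nowhere near a real I-JEPA or V-JEPA implementation, which uses vision transformers, masked targets, and an EMA-updated target encoder. The toy version shows only the defining move: the loss lives in representation space, not pixel space.

```python
import torch
import torch.nn as nn

dim_obs, dim_latent = 32, 8

encoder = nn.Linear(dim_obs, dim_latent)       # x_t -> z_t
predictor = nn.Linear(dim_latent, dim_latent)  # z_t -> predicted z_{t+1}

opt = torch.optim.Adam(
    list(encoder.parameters()) + list(predictor.parameters()), lr=1e-3)

# Stand-in data: consecutive observations (video frames, sensor readings).
x_t, x_next = torch.randn(64, dim_obs), torch.randn(64, dim_obs)

z_t = encoder(x_t)
with torch.no_grad():          # target representation gets no gradient, a
    z_next = encoder(x_next)   # standard trick to discourage collapse

# The loss is in latent space. No pixel reconstruction, so unpredictable
# detail is simply not part of the objective.
loss = nn.functional.mse_loss(predictor(z_t), z_next)
loss.backward()
opt.step()
```

The practical consequence: the model is never punished for failing to predict noise it could not have predicted.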

The application surface this opens is different from LLMs: jet engine sensor streams, robot arm position data, industrial process monitoring, physical simulation for autonomous vehicles. These are domains where token prediction is genuinely the wrong tool. You need a model of the system, not a model of descriptions of the system.
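
A sketch of what "a model of the system" buys you in those settings. Everything here is hypothetical – the point is that a one-step dynamics model turns prediction error into a monitoring signal, which no amount of text modelling gives you:

```python
import numpy as np

def predict_next(reading):
    """Stand-in for a trained dynamics model of a machine."""
    return reading * 0.99      # hypothetical: slow decay is 'normal'

readings = np.array([100.0, 99.0, 98.0, 97.1, 80.0])  # last value: a fault

for t in range(len(readings) - 1):
    error = abs(readings[t + 1] - predict_next(readings[t]))
    if error > 5.0:            # threshold chosen for the toy data
        print(f"t={t + 1}: anomaly, prediction error {error:.1f}")
```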


Why $1B Found This Thesis in Early 2026

LeCun left Meta in November 2025. Four months later, AMI has more than a billion dollars. That timeline is unusual enough to warrant examination.

Two explanations, not mutually exclusive.

First: the scaling laws debate. For most of 2024 and into 2025, the prevailing assumption in the frontier AI labs was that more compute plus more data would continue to yield capability improvements. That assumption is now contested. The rate of improvement on key benchmarks has slowed. The cost of the next increment is rising. Investors who had been funding “more of the same” are looking for an alternative thesis – and world models represent the most credible technical alternative on the table, backed by a researcher with serious credentials.

Second: application surface. LLMs have largely saturated the text and code generation markets. The next frontier for AI investment is physical – robotics, embodied agents, industrial automation, autonomous systems. These are domains where LLMs have fundamental limitations. If you’re an investor with a 7-10 year horizon and you think the physical world is the next application frontier, funding the leading world model research lab makes strategic sense regardless of whether you believe LeCun’s philosophical argument about LLM limitations.

AMI’s first disclosed partner is Nabla, the digital health startup where AMI CEO Alexandre LeBrun previously served as a director. LeCun describes manufacturing applications in detail: “A world model could learn from the sensor data [of a jet engine, steel mill, or chemical factory] and predict how the system will behave.” Toyota Ventures and Samsung are on the cap table. These are not language model use cases.

The funding came after DeepSeek’s release demonstrated that capable models could be trained for far less than assumed. That development shook confidence in the “scale wins” thesis among investors and some researchers. LeCun’s contrarian position became more credible at exactly the moment AMI was raising.


The Counterargument

The transformer proponents have a coherent response, and it’s worth taking seriously.

Their position: world models are implicit in sufficiently capable LLMs. Reasoning capabilities emerge from scale. The spatial and physical understanding that appears in models like GPT-4 and Gemini Pro didn’t come from explicit physical training – it emerged from exposure to enough descriptions of physical phenomena in language. If you scale further, with better training data and architecture improvements, world understanding continues to improve.

OpenAI, Anthropic, and DeepMind are all betting on this. So is Meta, now largely focused on Llama and its ecosystem. The combined capital allocation behind the transformer paradigm is in the hundreds of billions of dollars.

LeCun’s counter is that this is wrong in principle, not just insufficient in practice. “The truly difficult part is understanding the real world,” he told MIT Technology Review. “LLMs are limited to the discrete world of text. They can’t truly reason or plan, because they lack a model of the world. They can’t predict the consequences of their actions.”

This is a falsifiable claim. Either sufficiently scaled transformers develop genuine physical reasoning, or they don’t. The next few years of benchmark results on physical reasoning tasks – robot manipulation, physical simulation, embodied navigation – will tell a significant part of the story.

The stakes are asymmetric. If LeCun is right, the current investment in transformer infrastructure doesn’t become worthless (LLMs remain useful tools), but the research direction is a dead end for the most ambitious AI goals. If he’s wrong, AMI is an expensive detour by a brilliant researcher who bet against the trend at the wrong moment.



What It Means for Engineers Building Today

Practically speaking: for the next 2-3 years, nothing changes for most engineers.

LLMs are the production-viable path. The tooling is mature, the APIs are stable, the use cases are well-understood. AMI itself says it “could take years for world models to go from theory to commercial applications.” CEO Alexandre LeBrun was explicit: “It’s not your typical applied AI startup that can release a product in three months.” The company has no plans to generate revenue in the near term.

Where this matters sooner is in specific domains.

Robotics and embodied agents. If you’re building systems that interact with the physical world – robot arms, autonomous vehicles, warehouse automation – LLMs’ limitations are already a practical constraint. World models are not a theoretical future concern; they’re the gap between current capability and what these systems need to do. Watch AMI’s published research and code (they’ve committed to open-sourcing it).

Industrial AI. Sensor data from complex physical systems – manufacturing equipment, energy infrastructure, logistics – is a domain LLMs handle poorly. AMI is explicitly targeting this. If your engineering work touches industrial monitoring, simulation, or predictive maintenance, this research direction is relevant now.

Agentic systems. LeCun’s argument about planning is directly relevant to the current AI agents conversation. “An agentic system that is supposed to take actions in the world cannot work reliably unless it has a world model to predict the consequences of its actions,” he told MIT Technology Review. Current LLM-based agents are brittle precisely because they lack this. If world models mature, they become the foundation for more reliable agentic behaviour – not a replacement for LLMs as orchestrators, but a missing layer underneath them.
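
In sketch form, that missing layer looks like model-predictive control: roll candidate plans through the world model, execute only the best first action, then re-plan. All names below are illustrative, not AMI's API:

```python
import itertools

ACTIONS = (-2, -1, 0, 1, 2)   # hypothetical action space

def world_model(state, action):
    """Predicted consequence of an action; learned, in a real system."""
    return state + action

def utility(state):
    return -abs(10 - state)   # hypothetical goal: hold the state at 10

def act(state, horizon=4):
    """Model-predictive control: simulate every plan, execute its first step."""
    def rollout(plan):
        s, total = state, 0.0
        for a in plan:                 # consequences predicted, not performed
            s = world_model(s, a)
            total += utility(s)
        return total
    best = max(itertools.product(ACTIONS, repeat=horizon), key=rollout)
    return best[0]                     # only the first action touches the world

state = 0
for _ in range(6):
    # In this toy the model is exact, so it doubles as the environment.
    state = world_model(state, act(state))
print(state)   # 10: reached by lookahead, re-planned at every step
```

Current LLM-based agents skip the rollout step entirely – they emit the next action the way they emit the next token.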

As for the current state of the model landscape, the short answer is: LLMs are the tool you use today. World models are the research direction worth tracking if you’re thinking 3-5 years out.


AMI’s first models will ship quietly, tested with industrial partners, evaluated on benchmarks most people haven’t heard of. The company will publish papers. The code will be open.

LeCun may be right or wrong about the fundamental architecture of intelligence. He’s been making this argument since before most of the current AI boom started, and he’s watched the industry build a multi-hundred-billion-dollar infrastructure on a paradigm he thinks is the wrong one. Now he has $1.03 billion and a team to pursue the alternative.

Either way, we’ll find out faster than anyone expected.