Commissioned, Curated and Published by Russ. Researched and written with AI. This is the living version of this post. View versioned snapshots in the changelog below.

Opinions are mine. No affiliate links, no sponsored entries, no one asked to be included.

This is a living document. Labs rise and fall. Researchers go quiet or pivot. I’ll update it when the landscape meaningfully shifts. Check the changelog below for what’s changed since you last read it.


What’s New This Week

6 March 2026.

Two updates today that land directly on existing coverage in this post.

OpenAI released GPT-5.4. It replaces GPT-5.2 as their flagship model. The headline additions: native computer-use capabilities (the first general-purpose OpenAI model to include this natively), a 1M token context window, and significantly more token-efficient reasoning than 5.2 – fewer tokens to solve equivalent problems, which means lower cost and faster responses. Available in ChatGPT as “GPT-5.4 Thinking” (which can show its reasoning plan upfront and let you adjust mid-response), in Codex, and in the API. The computer-use angle is the most significant capability shift – GPT-5.4 can now operate computers and carry out multi-application workflows natively, which has been a competitive gap relative to some alternatives.

Anthropic and the Department of War. Dario Amodei published a formal public statement today. The DoW sent an official supply chain risk designation letter dated March 4, and Anthropic is challenging it in court. The scope is narrower than most coverage suggested: the designation applies only to Claude being used “as a direct part of contracts with the Department of War,” not to all customers who happen to have DoW contracts. The relevant statute (10 USC 3252) requires the DoW to use the least restrictive means necessary, and Anthropic is leaning on that in its legal challenge. Amodei also addressed the leaked internal memo: it was written within hours of three announcements landing simultaneously – Trump’s Truth Social post announcing Anthropic’s removal from federal systems, the Secretary of War’s designation announcement, and the OpenAI-Pentagon deal. He said Anthropic did not leak it and doesn’t want to escalate. The situation is still live.


Changelog

Date – Summary
6 Mar 2026 – GPT-5.4 released (computer-use, 1M tokens); Anthropic DoW formal letter, court challenge filed.
5 Mar 2026 – Alibaba Qwen leadership exodus: Justin Lin resigns, task force formed.
4 Mar 2026 – Meta/LeCun multimodal pretraining paper (arXiv:2603.03276).
3 Mar 2026 – OpenAI scale numbers updated; Anthropic/OpenAI Pentagon divergence; Willison agentic engineering book.
2 Mar 2026 – Initial publication.

Labs

This section covers the organisations actually pushing capability or shaping how the industry thinks. Not every well-funded lab is worth your attention. Some have great PR and mediocre models. The ones listed below have either done something technically significant recently, made an interesting strategic move, or taken a position worth understanding.


Anthropic

The most interesting safety/capability tension in the industry, and now the lab with the clearest set of principles in practice – not just on paper.

Claude Opus 4.6 and Sonnet 4.6 remain strong models for reasoning-heavy work. Claude Code has become a serious developer tool – more than a million developers are using agentic coding tools and Claude is a significant chunk of that. Cowork extends this toward multi-agent collaboration.

The honest version of Anthropic’s story has always been complicated. They quietly dropped their Responsible Scaling Policy commitments under competitive pressure – reported by TIME, not announced by Anthropic. That still matters. But a second data point has now landed: the US Department of War designated Anthropic a “supply chain risk” after the lab refused to enable mass domestic surveillance and autonomous weapons systems. They walked away from a contract rather than build that. Most labs would have taken the money.

OpenAI signed the Pentagon deal instead. That’s a meaningful fork in the road between two labs that are often compared as though they’re interchangeable. They aren’t, and this is the clearest evidence yet of why.

Watch Anthropic because the models are genuinely good, the principles are tested rather than stated, and the safety/commercial tension is the realest version of that tension in the industry. They’re not perfect – the RSP rollback is a genuine mark against them – but they’re more interesting than the alternatives for anyone who cares about where this is going.

Where: anthropic.com, their research blog, and the Constitutional AI and RSP documents are worth reading in full if you haven’t.


Google DeepMind

The lab that actually has the broadest portfolio and is currently winning on the benchmark that matters most to most engineers: price-adjusted performance.

Gemini 3.1 Pro sits at 57 points on the Intelligence Index, above Claude Opus 4.6, and costs roughly $892 less per million tokens than its nearest equivalent. If you are choosing a frontier model for production workloads and you haven’t re-evaluated in the last month, you’re probably overpaying.

Beyond the flagship model, DeepMind’s portfolio is genuinely wide. AlphaFold continues to compound – the structural biology community has essentially rebuilt around it. Robotics and quantum research are real programs, not PR. Lyria 3 is their music generation model, which matters if your use case touches audio. WebMCP is a Chrome standard they’re pushing for model-context-protocol browser integration, which could reshape how AI tools interact with the web.

The honest caveat: Google’s product integration track record is patchy. Good research doesn’t always become good products at Google. But the research output right now is strong enough that DeepMind earns a close watch regardless.

Where: deepmind.google, Google AI blog, their arXiv preprints.


OpenAI

The highest-profile lab, the most widely deployed, and as of March 2026, unambiguously the largest: 900 million weekly active users, 50 million consumer subscribers, 9 million paying businesses, 1.6 million weekly Codex developers (up 3x since January). $110B raised at a $730B valuation, with Amazon ($50B), SoftBank ($30B), and NVIDIA ($30B) among the headline investors. No other lab is operating at this scale. That matters.

GPT-5.4 is their current flagship, released 6 March 2026. The headline capability additions over GPT-5.2: native computer-use (the first general-purpose OpenAI model with this natively), a 1M token context window, and significantly more token-efficient reasoning. The product surface – DALL-E, Sora, the Realtime API, Codex – is genuinely broad. OpenAI remains the default for teams who haven’t thought carefully about alternatives, and the scale suggests that default has enormous staying power.

The things worth scrutinising: the $730B valuation includes circular participation from investors (Amazon, NVIDIA) who benefit from OpenAI’s growth regardless of whether the valuation reflects actual enterprise value. The surveillance architecture linking OpenAI infrastructure with Persona identity verification, found exposed via Shodan, is a security and governance story that hasn’t been fully told. And the Pentagon deal – which Anthropic declined specifically on the grounds of mass surveillance and autonomous weapons – OpenAI signed. What that entails in practice is not yet fully public.

None of this makes OpenAI irrelevant. At 900 million weekly users, it’s the infrastructure of the AI industry. But the scale and the contracts mean the decisions Altman makes in the next eighteen months have broader consequences than those of any other lab. Follow it with proportional attention and clear eyes.

Where: openai.com, platform.openai.com for developer news, Altman’s posts on X for strategic signals.


Meta AI

Meta’s position in the AI landscape is increasingly defined by what it was rather than what it is doing now.

The Llama open weights releases were genuinely important. Llama democratised access to capable base models and changed how the open source AI ecosystem thought about what was possible outside of proprietary labs. PyTorch remains the dominant training framework. FAIR has produced real research.

But the open weights story has moved on. Chinese labs are now releasing models that match or beat Llama on most benchmarks at competitive or lower cost, with more permissive licences. GLM-5 and Qwen3.5 are better practical choices for most open weights use cases right now. Meta’s response has been slower than you’d expect from a team with their resources.

Worth following for: open weights history, PyTorch development, and the FAIR research pipeline. Less worth following for: frontier model capability.

Where: ai.meta.com, the Llama GitHub repos, FAIR research pages.


Mistral

The European frontier lab. Smaller than the US giants, more focused, and genuinely important for certain use cases.

Mistral’s open weights releases have been consistently high quality. Le Chat is their consumer product. Their real value proposition, though, is for organisations that care about EU data sovereignty or want frontier-adjacent models they can actually run themselves. GDPR-compliant AI infrastructure is not a niche concern in Europe, and Mistral is currently the best option in that space.

They’re not winning benchmarks at the top of the table, and they’re not pretending to. The positioning is honest: a capable European lab with a commercial model that doesn’t require shipping your data to Virginia.

Where: mistral.ai, their GitHub for model releases.


Zhipu AI

Currently leading the open weights intelligence index with GLM-5, and deserving more attention than most Western engineers are giving it.

GLM-5 is a 744B parameter mixture-of-experts model with 40B active parameters per forward pass. MIT licensed. Priced at $1 per million input tokens and $3.20 per million output tokens. The benchmark numbers are real – this is not a paper model, and the MIT licence means you can actually use it without legal complications.
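To make that pricing concrete, here is a back-of-envelope cost helper using the per-million-token prices quoted above. The prices are as listed in this post; check current pricing before relying on the numbers, and the example workload sizes are arbitrary.

```python
# Back-of-envelope cost for a GLM-5 workload at the prices listed above:
# $1 per million input tokens, $3.20 per million output tokens.
# These are the figures quoted in this post, not live pricing.

GLM5_INPUT_PER_M = 1.00   # USD per 1M input tokens
GLM5_OUTPUT_PER_M = 3.20  # USD per 1M output tokens

def glm5_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single workload at the listed prices."""
    return (input_tokens / 1_000_000) * GLM5_INPUT_PER_M \
         + (output_tokens / 1_000_000) * GLM5_OUTPUT_PER_M

# Example: an agentic run consuming 2M input tokens and 500k output tokens.
print(f"${glm5_cost(2_000_000, 500_000):.2f}")  # → $3.60
```

At these rates, input tokens dominate only when your prompt-to-completion ratio exceeds roughly 3:1 – worth knowing before you stuff a full repository into context.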

The lab is Beijing-based, which matters for some use cases and not others. If you’re building for a deployment context where data residency or geopolitical supply chain risk is a concern, factor that in. If you’re evaluating capability per dollar for a standard workload, ignore the geography and look at the numbers.

Where: zhipuai.cn, their HuggingFace model pages.


Alibaba Cloud (Qwen)

For the specific use case of “I want the best open weights model I can run on hardware I already own,” Qwen3.5-35B-A3B is currently the answer.

35B parameters, mixture-of-experts architecture with 3B active parameters. Runs on a 32GB VRAM consumer GPU. Beats GPT-5-mini on standard benchmarks. 1M token context window. Apache 2.0 licence, which is genuinely permissive. This is not a compromise – it is a good model that happens to run locally.
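The arithmetic behind “runs on a 32GB VRAM consumer GPU” is worth making explicit. A rough sketch of weight memory at common quantisation levels – the bytes-per-parameter figures are generic rules of thumb, not Qwen-specific numbers, and this ignores KV cache, activations, and format overhead:

```python
# Rough weight-memory footprint for a 35B-parameter model at common
# quantisation levels. Bytes-per-parameter values are generic rules of
# thumb; real runtimes add KV cache, activations, and format overhead.

PARAMS = 35e9  # total parameters (the 3B *active* count doesn't shrink weight storage)

def weight_gb(bytes_per_param: float) -> float:
    """Approximate GB needed just to hold the weights."""
    return PARAMS * bytes_per_param / 1024**3

for name, bpp in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{name}: ~{weight_gb(bpp):.0f} GB")
```

At fp16 the weights alone (~65GB) overflow a 32GB card; at 4-bit (~16GB) they fit with headroom for KV cache and activations, which is how a 35B mixture-of-experts model ends up runnable on consumer hardware.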

For engineers who want to experiment with agentic workflows without API costs, or need to keep data on-premises, this is the most practically useful model available right now.

Note: as of 5 March 2026, Junyang Lin (Justin), the tech lead who built the Qwen model family, has stepped down along with two other key colleagues. Alibaba has formed a new task force to continue model development. The model is still good – but the team that built it is no longer intact.

Where: huggingface.co/Qwen, the Qwen GitHub repo.


DeepSeek

DeepSeek set the efficiency template that the Chinese open weights labs are now iterating on. Their mixture-of-experts approach and training efficiency innovations changed the conversation about what’s achievable without US-scale compute budgets.

The current story has a geopolitical edge. Reuters reported that DeepSeek is withholding v4 from US chipmakers – a deliberate strategic decision in response to export controls. This is the most explicitly geopolitically charged lab in the space, and the engineering decisions are increasingly inseparable from that context.

Still worth understanding because the efficiency innovations are real and influential. The constraint-driven engineering that came out of working around chip access produced genuinely novel approaches that are now being copied across the industry.

Where: deepseek.com, their research papers on arXiv.


xAI

Grok exists. The models are real but currently losing ground technically against both the US frontier labs and the Chinese open weights challengers.

The more interesting story at xAI right now is organisational. Seven of twelve co-founders have left in three years. The lab has merged with SpaceX, which means access to significant private compute infrastructure – potentially a genuine advantage as data centre costs become a constraint. Whether that compute advantage translates into model improvements depends on talent stability that isn’t currently evident.

Worth watching because the SpaceX compute angle could change the picture quickly. Not worth prioritising attention on right now because the current output doesn’t justify it.

Where: x.ai, Grok at x.com.


Researchers and Practitioners

Labs employ thousands of people. Most of them don’t have a useful public voice. These are the individuals worth following because they produce clear thinking you can actually use, not because they’re prominent.


Andrej Karpathy

The best explainer of complex AI concepts working today. If you want to understand how a thing actually works – not the press release version but the mechanistic version – Karpathy is usually the clearest path to that understanding.

Recent contributions: microgpt, a complete GPT implementation in around 200 lines of code. This is pedagogically important in the way that good textbooks are important – it strips away everything that isn’t essential and leaves you with the thing itself. He coined “vibe coding” as a term for AI-assisted development without full comprehension of the output, and “Claws” as a category name for Claude-based agentic tools. His December inflection observation – that we may have crossed a threshold in late 2025 where AI output routinely passes for competent human work – is the kind of observation that ages well.
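The appeal of microgpt is what it leaves out. As a flavour of that reductionist style, here is a single causal self-attention head in plain numpy – illustrative only, not Karpathy’s code, with made-up shapes and random weights:

```python
# One causal self-attention head in plain numpy. Illustrative of the
# "strip away everything inessential" style; not microgpt's actual code.
import numpy as np

def attention_head(x, Wq, Wk, Wv):
    """x: (T, d) token vectors; Wq/Wk/Wv: (d, d_head) projection matrices."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv              # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])       # scaled dot-product scores
    causal = np.triu(np.ones_like(scores), 1)     # 1s above the diagonal
    scores = np.where(causal == 1, -1e9, scores)  # block attention to the future
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                 # softmax over visible positions
    return w @ v                                  # weighted mix of value vectors

rng = np.random.default_rng(0)
T, d, dh = 4, 8, 4                                # tiny made-up shapes
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, dh)) for _ in range(3))
out = attention_head(x, Wq, Wk, Wv)
print(out.shape)  # (4, 4)
```

Everything a full GPT adds – multiple heads, layers, MLPs, embeddings, training – is scaffolding around this one operation, which is exactly the point a ~200-line implementation makes.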

Not prolific, but consistent quality. Every video is worth the time.

Where: @karpathy on X, his YouTube channel.


Simon Willison

The most honest practitioner voice in AI. Willison has the rare quality of being genuinely excited about the technology while maintaining rigorous skepticism about claims that aren’t supported by evidence.

His major project right now: a “not quite a book” on Agentic Engineering, publishing chapter by chapter on simonwillison.net. It’s the most systematic treatment of agentic patterns from someone who actually builds things rather than theorises about them. If you care about multi-step AI workflows in practice, this is the most important ongoing piece of writing in the space. Follow it as it publishes.

Earlier contributions that remain worth reading: the Agentic Engineering Patterns guide, his “cognitive debt” concept (the idea that AI assistance can leave you with code or decisions you can’t fully reason about), the “hoarding things you know how to do” piece, and the pelican-on-bicycle test for model personality. His Deep Blue framing, comparing current AI to the chess computer that beat Kasparov, is a useful conceptual anchor for discussions that tend to go either too dystopian or too dismissive.

Where: simonwillison.net – irregular posting schedule, everything is worth reading.


Gergely Orosz (The Pragmatic Engineer)

The best source for what AI is actually doing to engineering organisations. Not the capability story – the management, hiring, productivity, and organisational reality story.

Recent contributions: six predictions for software engineering that have aged well, the Pragmatic Summit, and an interview with Mitchell Hashimoto that is worth reading in full for the “preserving craft” conversation. Orosz writes from the perspective of someone who has worked in large engineering organisations and understands the gap between what technology can do and what organisations actually do with it.

The newsletter is paid and it’s worth the money. The free tier gives you a sense of the quality but the full analysis is behind the subscription.

Where: newsletter.pragmaticengineer.com


swyx (Shawn Wang)

The person best positioned to synthesise what’s happening in AI engineering for practitioners. Swyx has the unusual combination of technical credibility, community connections, and genuine enthusiasm that makes him a reliable first-pass filter for what matters.

Recent contributions: the Latent Space podcast remains one of the best sources for deep technical conversations with practitioners and researchers. The Claude Code Anniversary episode is a good example of the quality – specific, concrete, and informed by actual usage. He’s been actively promoting the “X Engineer” concept (the idea of a new kind of engineer who works primarily through AI systems), which is a framing worth thinking about whether you agree with it or not.

Where: latent.space for the podcast, @swyx on X.


Nolan Lawson

Not prolific. When he writes, it’s worth your time.

His piece “We Mourn Our Craft” got over 710 points on Hacker News, which is a reasonable proxy for resonance. It articulates the loss that skilled engineers feel as AI compresses the gap between novice and expert output – not as a complaint but as a genuine examination of what craft means when the craft becomes partially automated. This is the kind of writing that helps technical leaders have the conversations they need to have with their teams.

Lawson writes about the human side of technical transitions. That’s an underserved perspective in a discourse that tends toward either capability maximalism or pure economic analysis.

Where: nolanlawson.com


Mitchell Hashimoto

HashiCorp founder, now doing hands-on AI agent engineering as an individual practitioner. The value here is the combination: someone who has built serious production infrastructure and is now applying that judgment to AI tooling.

The “preserving craft” conversation on The Pragmatic Engineer podcast is one of the better examinations of what it means to maintain deep technical ability in an environment that increasingly rewards output over understanding. Hashimoto is building things and reporting honestly about what works.

Where: The Pragmatic Engineer podcast, @mitchellh on X.


Max Woolf

A senior data scientist who writes honest empirical accounts of what AI systems actually do in practice, as distinct from what the papers and press releases say they do.

His post converting from “AI Agent Coding Skeptic” to a more positive position is a good example of the quality. He doesn’t change his mind because of marketing – he changes his mind because he ran experiments and the results changed. That’s the right epistemic standard, and it’s rarer than it should be.

Where: minimaxir.com


Ben Thompson

Not an AI researcher. The best business and strategic analyst in tech.

His “SaaSmageddon” framing – the idea that AI agents will displace significant portions of SaaS revenue by performing tasks rather than requiring humans to use software – is one of the more important strategic predictions currently making its way through the industry. Whether you agree with the timeline or magnitude, it’s a framework that senior engineering leaders need to have a view on.

Stratechery is paid. If your role involves product, platform, or organisational strategy and you aren’t reading it, you’re missing a useful analytical lens.

Where: stratechery.com


Nicholas Carlini

Google DeepMind security researcher. Worth following if you care about the adversarial robustness and security side of AI systems, which you should.

His recent experiment using parallel Claude agents to write a C compiler – roughly $20,000 in API costs and 100,000 lines of Rust output – is one of the more interesting documented examples of what frontier models can produce on a long-horizon coding task. It’s also a useful data point for thinking about what “AI agent coding” means at scale: impressive, expensive, and raising questions about how you verify what was generated.

Where: nicholas.carlini.com


Newsletters and Podcasts

These are the publications worth having in your reading list. The list is short on purpose.


The Batch (deeplearning.ai)

Andrew Ng’s weekly newsletter. Broad coverage, reliable quality, good for staying current without going deep. Ng has been promoting the “X Engineer” concept – the idea that engineers who can effectively direct AI systems will become the dominant role. Worth reading for his framing even if you don’t fully agree with the conclusion.

Where: deeplearning.ai/the-batch


TLDR AI

Daily digest. Good signal-to-noise ratio for a daily publication, which is harder to achieve than it sounds. Good for catching announcements you might have missed. Not a substitute for deeper reading, but useful for not falling behind on headlines.

Where: tldr.tech/ai


Latent Space (swyx + Alessio)

The best podcast for deep technical conversations about AI engineering. The guests are well-chosen, the questions are informed, and the episodes don’t pad for time. Listen at 1.5x if you’re in a hurry.

Where: latent.space


The Changelog

Software development broadly, with increasing AI coverage. Good for maintaining perspective on AI as one part of a larger engineering practice rather than the whole practice. The Changelog hosts bring genuine engineering background to their interviews, which shows.

Where: changelog.com


Alpha Signal

An aggregator. Useful for catching research papers and model releases that don’t get mainstream coverage. The name sounds like a trading firm but it’s a legitimate AI news digest.

Where: alphasignal.ai


Simon Willison’s Weblog

Already listed in researchers above, but worth listing again here. Irregular posting, consistently high quality, and currently the home of the Agentic Engineering book-in-progress. Set up an RSS feed and read everything that comes through.

Where: simonwillison.net


The Pragmatic Engineer

Already covered above. Paid. Worth it specifically for engineering leadership – the intersection of AI capability and organisational reality is where Orosz is strongest.

Where: newsletter.pragmaticengineer.com


Hacker News

Yes, it’s obvious. It’s still the best real-time signal for how practitioners are actually responding to new releases and announcements. The front page is hit-and-miss but the comments on AI-related submissions are often where the most grounded analysis appears. Search for specific topics rather than relying on the front page.
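HN’s search is backed by Algolia’s public API, which is easier to script against than the site itself. A minimal sketch that only constructs the query URL – the endpoint is real, the query terms are arbitrary examples:

```python
# Build a Hacker News search URL against the public Algolia HN API
# (hn.algolia.com/api/v1/search). This only constructs the URL;
# the query string is an arbitrary example.
from urllib.parse import urlencode

def hn_search_url(query: str, tags: str = "story", hits: int = 10) -> str:
    """URL for HN stories matching `query`, most relevant first."""
    params = urlencode({"query": query, "tags": tags, "hitsPerPage": hits})
    return f"https://hn.algolia.com/api/v1/search?{params}"

print(hn_search_url("GLM-5 benchmark"))
# https://hn.algolia.com/api/v1/search?query=GLM-5+benchmark&tags=story&hitsPerPage=10
```

Swapping the endpoint to `search_by_date` sorts by recency instead of relevance, which is usually what you want for release-day reactions.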

Where: news.ycombinator.com


Ars Technica

Good technical journalism with real depth when they get it right. Worth flagging: in early 2026, Ars fired senior AI reporter Benj Edwards after AI-fabricated quotes were published under his byline. The details are still unfolding but it’s a credibility hit for their AI coverage specifically. Their broader tech reporting remains solid – treat their AI journalism with more scrutiny than you might have previously.

Where: arstechnica.com




This post reflects the state of the field as of 6 March 2026. The landscape moves fast. Check the changelog at the top for updates. If something here is wrong or out of date, the correction mechanism is in the changelog, not a correction buried in a later post.

