Commissioned, Curated and Published by Russ. Researched and written with AI. This is the living version of this post. View versioned snapshots in the changelog below.
Opinions are mine. No affiliate links, no sponsored entries, no one asked to be included.
This is a living document. Labs rise and fall. Researchers go quiet or pivot. I’ll update it when the landscape meaningfully shifts. Check the changelog below for what’s changed since you last read it.
What’s New This Week
4 April 2026.
Microsoft has launched three original in-house AI models: MAI-Transcribe-1 (a speech-to-text model claiming the lowest average word error rate across 25 languages, beating OpenAI Whisper-large-v3 across all 25 and Google Gemini 3.1 Flash on 22 of 25), MAI-Voice-1 (voice generation including voice cloning from seconds of audio at 60x real-time speed), and MAI-Image-2 (image generation). All three are available through Microsoft Foundry and a new MAI Playground.
Mustafa Suleyman, who formed Microsoft’s superintelligence team six months ago with an explicit goal of AI self-sufficiency, told VentureBeat ahead of the announcement: ‘We’re now a top three lab just under OpenAI and Gemini.’ The models were built in-house, not on top of OpenAI’s API, and are priced aggressively partly to reduce Microsoft’s own cost of goods sold.
This post has not previously covered Microsoft as a lab in its own right, treating it primarily as an OpenAI investor and distribution partner. That framing is now materially outdated. A lab with Suleyman (a co-founder of DeepMind) at the helm has shipped benchmark-leading original models and is explicitly claiming frontier status. The Microsoft AI (MAI) section has been added below and the OpenAI section updated to reflect the changed relationship.
Changelog
| Date | Summary |
|---|---|
| 4 Apr 2026 | Microsoft ships its first original in-house AI models (MAI-Transcribe-1, MAI-Voice-1, MAI-Image-2); Suleyman claims ‘top three lab’ status, challenging the post’s framing of Microsoft as distribution partner rather than frontier lab. |
| 1 Apr 2026 | OpenAI reportedly preparing ‘Spud’ model release alongside policy push to ‘rethink the social contract’; Altman tells staff ‘things are moving faster than many of us expected.’ |
| 1 Apr 2026 | OpenAI closes $122B round at $852B valuation (up from $730B); Claude Code source leak reveals Undercover Mode instructing the tool to hide AI authorship in public open-source contributions. |
| 29 Mar 2026 | xAI co-founder exodus now near-total (all but two of eleven departed); Gemini 3 Deep Think live for Ultra subscribers with early API access for engineers. |
| 28 Mar 2026 | NeurIPS announces then reverses restrictions on Chinese participants after boycott threat; Stanford and Princeton find Chinese models more likely to dodge political questions, sharpening the geopolitical picture for DeepSeek and Chinese open weights labs. |
| 27 Mar 2026 | ARC-AGI-3 day-1 results in: frontier LLMs score under 1% solo, but Symbolica’s agentic SDK hits 36% using the same models, sharpening the CoT vs agency gap that underpins LeCun’s AMI Labs thesis. |
| 26 Mar 2026 | ARC-AGI-3 launches, AI scores 12.58% vs human 100% on agentic tasks, directly supporting LeCun’s AMI Labs thesis; Cursor’s flagship model revealed as built on Chinese open-source Kimi K2.5 from Moonshot AI. |
| 25 Mar 2026 | OpenAI shuts down Sora app with no explanation, Disney drops $1B investment; Microsoft poaches Ai2 CEO and OLMo lead researcher for Suleyman’s Superintelligence team. |
| 24 Mar 2026 | Epoch confirms GPT-5.4 Pro solved a frontier math open problem; Dreamer AI joins Meta Superintelligence Labs with Hugo Barra returning to Meta. |
| 23 Mar 2026 | Quiet day, thesis holds. |
| 22 Mar 2026 | Quiet day, thesis holds. |
| 21 Mar 2026 | Quiet day, thesis holds. |
| 20 Mar 2026 | FSF threatens Anthropic over LLM copyright, referencing a settlement; adds a third legal front to Anthropic’s 2026 story alongside the RSP rollback and the Pentagon case. |
| 19 Mar 2026 | Quiet day, thesis holds. |
| 18 Mar 2026 | Mistral launches Forge, an enterprise custom model training platform with major launch partners including ASML and the European Space Agency. |
| 16 Mar 2026 | Quiet day, thesis holds. |
| 15 Mar 2026 | Quiet day, thesis holds. |
| 14 Mar 2026 | Anthropic 1M context now GA for Opus 4.6 and Sonnet 4.6; Meta planning 15,000+ layoffs with AI costs cited as a driver. |
| 13 Mar 2026 | Microsoft and rival AI researchers back Anthropic in Pentagon legal battle; xAI poaches two senior Cursor AI leaders. |
| 12 Mar 2026 | Quiet day, thesis holds. |
| 11 Mar 2026 | Quiet day, thesis holds. |
| 10 Mar 2026 | LeCun’s AMI Labs raises $1B+ at $3.5B valuation; Meta AI section updated, AMI Labs added as emerging lab to watch. |
| 9 Mar 2026 | Quiet day, thesis holds. |
| 8 Mar 2026 | Karpathy publishes autoresearch: AI agents doing autonomous nanochat training research on a single GPU. |
| 7 Mar 2026 | A quieter day – nothing today that shifts the thesis. |
| 6 Mar 2026 | GPT-5.4 released (computer-use, 1M tokens). Anthropic DoW formal letter – court challenge filed. |
| 5 Mar 2026 | Alibaba Qwen leadership exodus – Justin Lin resigns, task force formed. |
| 4 Mar 2026 | Meta/LeCun multimodal pretraining paper (arXiv:2603.03276). |
| 3 Mar 2026 | OpenAI scale numbers updated; Anthropic/OpenAI Pentagon divergence; Willison agentic engineering book |
| 2 Mar 2026 | Initial publication |
Labs
This section covers the organisations actually pushing capability or shaping how the industry thinks. Not every well-funded lab is worth your attention. Some have great PR and mediocre models. The ones listed below have either done something technically significant recently, made an interesting strategic move, or taken a position worth understanding.
Anthropic
The most interesting safety/capability tension in the industry, and now the lab with the clearest set of principles in practice – not just on paper.
Claude Opus 4.6 and Sonnet 4.6 remain strong models for reasoning-heavy work. As of 14 March 2026, Anthropic has made its 1M token context window generally available for both models – bringing them to parity with OpenAI’s context offering and making long-context retrieval and document processing viable for production workloads. Claude Code has become a serious developer tool – more than a million developers are using agentic coding tools and Claude is a significant chunk of that. Cowork extends this toward multi-agent collaboration.
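For a sense of what the 1M-token GA means in practice, here is a minimal sketch of a long-context call via Anthropic’s Python SDK. The call shape follows the existing Messages API; the model ID and the input file are placeholders I’ve assumed, not confirmed details.

```python
# Minimal long-context sketch using the Anthropic Python SDK.
# Assumptions: the model ID is a guess at the Opus 4.6 identifier; the file
# is a stand-in for whatever corpus you need the model to read whole.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("contract_archive.txt") as f:
    corpus = f.read()  # with 1M tokens GA, this can be book-length material

response = client.messages.create(
    model="claude-opus-4-6",  # placeholder ID
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": f"<documents>\n{corpus}\n</documents>\n\n"
                   "List every clause above that touches on liability.",
    }],
)
print(response.content[0].text)
```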
The honest version of Anthropic’s story has always been complicated. They quietly dropped their Responsible Scaling Policy commitments under competitive pressure – reported by TIME, not announced by Anthropic. That still matters. But a second data point has now landed: the US Department of War designated Anthropic a “supply chain risk” after the lab refused to enable mass domestic surveillance and autonomous weapons systems. They walked away from a contract rather than build that. Most labs would have taken the money.
OpenAI signed the Pentagon deal instead. That’s a meaningful fork in the road between two labs that are often compared as though they’re interchangeable. They aren’t, and this is the clearest evidence yet of why.
The legal situation has widened: Microsoft and rival AI researchers have now filed in support of Anthropic in its escalating case against the US Department of War. Their brief argues that a government compelling private actors to change their speech is among the gravest dangers to free expression. The fact that competitors are rallying behind Anthropic here – not because they like Anthropic, but because the principle matters – adds weight to the read that this is a genuine inflection point for the industry, not just one lab’s legal problem.
A third legal front opened in March 2026: the Free Software Foundation has threatened Anthropic over alleged copyright infringement in LLM training data, calling for the models to be released under open source terms. The FSF’s licensing blog references a settlement, suggesting some resolution may be underway. This sits alongside the RSP rollback and the Department of War case as the third distinct governance or legal challenge Anthropic has faced in quick succession. For teams evaluating Anthropic as a long-term platform partner, the IP dimension is now live and worth watching.
A fourth story landed on 31 March 2026: the Claude Code source code leaked publicly. The most discussed disclosure is Undercover Mode – a ~90-line file (undercover.ts) that strips all Anthropic branding when Claude Code operates in non-internal repositories, and explicitly instructs the model never to attribute its work to Claude Code or identify itself as AI in commit messages or PR descriptions. This surfaces a meaningful tension with the transparency framing that has made Anthropic stand out. The post has consistently held that Anthropic is the lab whose principles are tested rather than stated – this is a concrete piece of counter-evidence. The story was #2 on Hacker News with over 900 points within hours of publication.
Watch Anthropic because the models are genuinely good, the principles are tested rather than stated, and the safety/commercial tension is the realest version of that tension in the industry. They’re not perfect – the RSP rollback is a genuine mark against them, and the Undercover Mode disclosure adds a second – but they’re more interesting than the alternatives for anyone who cares about where this is going.
Where: anthropic.com, their research blog, and the Constitutional AI and RSP documents are worth reading in full if you haven’t.
Google DeepMind
The lab that actually has the broadest portfolio and is currently winning on the benchmark that matters most to most engineers: price-adjusted performance.
Gemini 3.1 Pro sits at 57 points on the Intelligence Index, above Claude Opus 4.6, and costs roughly $892 less to run the full Intelligence Index evaluation than its nearest equivalent. If you are choosing a frontier model for production workloads and you haven’t re-evaluated in the last month, you’re probably overpaying.
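The re-evaluation step is cheap to do properly. A minimal sketch for comparing blended cost on your own traffic – the prices below are placeholders, not anyone’s actual rate card:

```python
# Blended monthly cost comparison for a fixed workload.
# The prices are illustrative placeholders, NOT quoted from any price list.

def monthly_cost(price_in, price_out, tokens_in, tokens_out):
    """Prices are $ per million tokens; volumes are tokens per month."""
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

workload = dict(tokens_in=2_000_000_000, tokens_out=400_000_000)  # 2B in, 400M out

candidates = {
    "model-a": (2.00, 12.00),  # (input $/M, output $/M) — placeholders
    "model-b": (5.00, 25.00),
}
for name, (p_in, p_out) in candidates.items():
    print(f"{name}: ${monthly_cost(p_in, p_out, **workload):,.0f}/month")
```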
Beyond the flagship model, DeepMind’s portfolio is genuinely wide. AlphaFold continues to compound – the structural biology community has essentially rebuilt around it. Robotics and quantum research are real programs, not PR. Lyria 3 is their music generation model, which matters if your use case touches audio. WebMCP is a Chrome standard they’re pushing for model-context-protocol browser integration, which could reshape how AI tools interact with the web.
Gemini 3 Deep Think is now live in the Gemini app for Ultra subscribers, with early API access opening for engineers, researchers, and enterprises. Google is positioning it explicitly for harder technical use cases – scientific and engineering work – rather than general-purpose chat. This is a distinct product from Gemini 3.1 Pro (the benchmark flagship) and signals Google continuing its strategy of shipping specialised variants rather than a single monolithic model.
The honest caveat: Google’s product integration track record is patchy. Good research doesn’t always become good products at Google. But the research output right now is strong enough that DeepMind earns a close watch regardless.
Where: deepmind.google, Google AI blog, their arXiv preprints.
OpenAI
The highest-profile lab, the most widely deployed, and as of March 2026, unambiguously the largest: 900 million weekly active users, 50 million consumer subscribers, 9 million paying businesses, 1.6 million weekly Codex developers (up 3x since January). OpenAI has now closed a $122 billion funding round at a post-money valuation of $852 billion, confirmed by Bloomberg, CNBC, and OpenAI directly. Total capital raised across rounds now exceeds $230 billion. Amazon, SoftBank, and NVIDIA remain among the headline investors. No other lab is operating at this scale. That matters.
GPT-5.4 is their current flagship, released 6 March 2026. The headline capability additions over GPT-5.2: native computer-use (a first for a general-purpose OpenAI model), a 1M token context window, and significantly more token-efficient reasoning. The product footprint has contracted. DALL-E, the Realtime API, and Codex remain, but Sora – OpenAI’s generative video app, launched to substantial fanfare and previously cited here as evidence of a genuinely broad product surface – was shut down on 25 March 2026 with no public explanation. OpenAI’s only stated position is that video generation will continue to be used internally to train robots. Disney separately dropped plans for a reported $1 billion investment connected to the Sora partnership on the same day. OpenAI remains the default for teams who haven’t thought carefully about alternatives, and the scale suggests that default has enormous staying power – but the Sora shutdown is a signal worth noting about how the product strategy is evolving.
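If GPT-5.4’s native computer-use is exposed the way OpenAI’s existing computer-use preview is, a call would look roughly like this. Both the model ID and the tool shape are assumptions carried over from the current Responses API, not confirmed GPT-5.4 details:

```python
# Sketch of a computer-use call, assuming GPT-5.4 keeps the tool shape
# of OpenAI's current computer-use preview. The model ID is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5.4",  # placeholder ID
    tools=[{
        "type": "computer_use_preview",
        "display_width": 1280,
        "display_height": 800,
        "environment": "browser",
    }],
    input="Open the analytics dashboard and export last week's error report.",
    truncation="auto",  # the current preview requires auto truncation
)
print(response.output)  # action items to execute, screenshot, and feed back
```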
On 24 March 2026, Epoch AI independently confirmed that GPT-5.4 Pro solved a frontier mathematics open problem. This is a meaningful external validation: Epoch is an independent research organisation that tracks AI capability milestones, and a confirmed proof of a previously unsolved problem is a different class of result from benchmark improvements. It adds a concrete data point to the case that the reasoning gains in GPT-5.4 are real.
One further signal to watch: a Vanity Fair exclusive (31 March 2026) reports OpenAI is preparing to release a new model internally code-named Spud, with a planned policy push framed as ‘rethinking the social contract.’ At a company-wide meeting, Altman told staff ‘things are moving faster than many of us expected.’ The Information reported the Spud codename separately. No release date is confirmed, but combined with the $122B funding close, the signal is that OpenAI is staging a major public moment in early April. Watch for GPT-5.4 to be superseded.
Note: the Microsoft/OpenAI relationship is now more complicated than it was. Microsoft has launched original in-house models (MAI series) under Mustafa Suleyman’s superintelligence team, explicitly positioning itself as a top-three lab and reducing dependency on OpenAI’s API. The two organisations remain deeply intertwined – Microsoft is OpenAI’s largest outside investor and primary cloud partner – but Suleyman is now openly competing with OpenAI on model quality, not just reselling its output.
The things worth scrutinising: the $852B valuation includes circular participation from investors (Amazon, NVIDIA) who benefit from OpenAI’s growth regardless of whether the valuation reflects actual enterprise value. The surveillance architecture linking OpenAI infrastructure with Persona identity verification, found exposed via Shodan, is a security and governance story that hasn’t been fully told. And the Pentagon deal – which Anthropic declined specifically on the grounds of mass surveillance and autonomous weapons – OpenAI signed. What that entails in practice is not yet fully public.
None of this makes OpenAI irrelevant. At 900 million weekly users, it’s the infrastructure of the AI industry. But the scale and the contracts mean the decisions Altman makes in the next eighteen months have broader consequences than those of any other lab. Follow it with proportional attention and clear eyes.
Where: openai.com, platform.openai.com for developer news, Altman’s posts on X for strategic signals.
Meta AI
Meta’s position in the AI landscape is increasingly defined by what it was rather than what it is doing now – and that picture sharpened significantly in early 2026 when Yann LeCun officially departed as Meta’s chief AI scientist.
LeCun is one of the three Turing Award recipients whose research underlies modern deep learning. His departure is not a footnote. He has now founded Advanced Machine Intelligence Labs (AMI Labs) with other ex-Meta researchers, and as of 10 March 2026, AMI Labs has raised over $1 billion in seed funding at a $3.5 billion valuation with only 12 employees. LeCun’s stated thesis – the one he is now betting his post-Meta career on – is that LLMs are not a path to truly intelligent machines because they cannot plan ahead and lack real-world grounding. He has made this argument publicly for years; he is now funding the alternative.
The specific context for LeCun’s departure matters. Meta’s $14.3 billion acquisition of Scale AI brought Alexandr Wang – Scale AI’s founder and former CEO – into Meta as Chief AI Officer, leading the newly formed Meta Superintelligence Labs. Under the new organisational structure, LeCun reported to Wang: a 26-year-old whose company’s business was data labelling, placed above a Turing Award recipient. That is a concrete explanation, not previously public, for why LeCun left to found AMI Labs.
The Llama open weights releases were genuinely important and remain so. Llama democratised access to capable base models and changed how the open source AI ecosystem thought about what was possible outside of proprietary labs. PyTorch remains the dominant training framework. FAIR has produced real research.
But the open weights story has moved on. Chinese labs are now releasing models that match or beat Llama on most benchmarks at competitive or lower cost, with more permissive licences. GLM-5 and Qwen3.5-35B-A3B are better practical choices for most open weights use cases right now. Meta’s response has been slower than you’d expect from a team with their resources. The departure of one of the founding researchers of modern deep learning does not help that picture.
As of 14 March 2026, Reuters is reporting that Meta is planning its largest-ever round of layoffs – 15,000 or more positions – with rising AI infrastructure costs cited as a primary driver. This deepens the existing picture of strategic strain: the lab is spending heavily on compute while its research output lags Chinese open weights competitors and its most prominent scientist has departed to found a rival lab.
As of 24 March 2026, AI startup Dreamer is joining Meta Superintelligence Labs, bringing its entire team and co-founder Hugo Barra (a former Meta VP) back into Meta’s AI organisation. The contrast is worth noting: Meta is staffing up a superintelligence unit at the same moment LeCun, its former chief AI scientist, is separately funding an alternative thesis at AMI Labs on the grounds that current approaches cannot reach true intelligence.
As of 25 March 2026, the predicted layoffs have begun. The New York Times and TechCrunch confirmed 700 employees have been cut across Reality Labs, sales, and recruiting. This is materially smaller than the 15,000-plus figure from the earlier Reuters report, which Meta described as speculative. Whether further cuts follow is unconfirmed. The strategic picture – heavy AI infrastructure investment, lagging model output relative to Chinese open weights competitors, and LeCun’s departure – remains unchanged.
Worth following for: open weights history, PyTorch development, and the FAIR research pipeline. Less worth following for: frontier model capability.
Where: ai.meta.com, the Llama GitHub repos, FAIR research pages.
AMI Labs
One month old. Twelve employees. $3.5 billion valuation. $1 billion raised. No model, no paper, no product yet.
Founded by Yann LeCun and ex-Meta researchers on a single founding thesis: that LLMs will not produce truly intelligent machines. LeCun’s argument is that intelligence requires the ability to plan ahead and to build models of the world from embodied, real-world experience – neither of which digital text training delivers. Whether that argument is correct is one of the most genuinely contested questions in AI right now.
Worth watching for three reasons. First, LeCun is one of the most credentialed researchers in the field – this is not a VC-backed positioning exercise, it’s a serious researcher putting his post-Meta career on the line against the dominant paradigm. Second, the anti-LLM thesis is a real intellectual bet, not a rebrand. Third, the scale of investor conviction – $3.5 billion pre-product, backed by Jeff Bezos and Mark Cuban – is itself a data point about where frontier AI money is flowing and how the investor class is thinking about the limits of the current approach.
A concrete external data point has now landed in favour of LeCun’s thesis. On 26 March 2026, the ARC Prize Foundation (co-founded by François Chollet and Zapier co-founder Mike Knoop, who together also founded Ndea) released ARC-AGI-3, the third version of its AGI benchmark and the first explicitly designed to test agentic intelligence. Unlike prior ARC-AGI versions, which tested pattern recognition on static grids, ARC-AGI-3 requires agents to explore novel environments with no stated rules or goals, build world models from experience, and plan over long horizons with sparse feedback. Humans score 100%. The best AI in the preview phase scored 12.58%. The specific design principle – that intelligence requires learning from experience rather than pattern-matching on training data – is precisely the argument LeCun has made against LLMs for years and is now funding at AMI Labs. This does not prove LeCun right about the long-term path, but it provides the clearest measurable illustration yet of the gap his thesis is trying to address.
The ARC-AGI-3 day-1 results (27 March 2026) update and extend the data point introduced at launch. Frontier LLMs in pure chain-of-thought mode scored under 1%: Gemini 3.1 Pro 0.37%, GPT-5.4 0.26%, Claude Opus 4.6 0.25%, Grok 4.20 effectively 0%. Symbolica’s agentic SDK reached 36.08% on day 1 by wrapping the same models in an agentic loop. The 12.58% figure from the preview phase referred to the best agentic score at that stage – the live competition is now showing agents clustering around 36% while pure LLMs remain below 1% and humans score 100%. This split is precisely the gap LeCun’s founding thesis names: LLMs alone cannot plan ahead and adapt from experience, but adding an agentic architecture significantly closes the gap even if it does not eliminate it.
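The CoT-versus-agency split is easier to see as code. A schematic sketch – every name here is hypothetical, and real harnesses like Symbolica’s SDK are far more sophisticated:

```python
# Schematic contrast between pure chain-of-thought and an agentic loop.
# `env` and `ask_model` are hypothetical stand-ins, not a real SDK.

def solve_single_shot(env, ask_model):
    """Pure CoT: one call, no feedback — the sub-1% regime."""
    plan = ask_model(f"Initial state: {env.observe()}. Output all moves, space-separated.")
    for move in plan.split():
        env.act(move)
    return env.solved()

def solve_agentic(env, ask_model, max_steps=200):
    """Agentic loop: act, observe the consequence, adapt — the ~36% regime."""
    history = []
    for _ in range(max_steps):
        obs = env.observe()  # fresh feedback every step
        move = ask_model(f"State: {obs}\nMoves so far: {history}\nNext single move?")
        env.act(move)
        history.append(move)
        if env.solved():
            return True
    return False
```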
Too early to evaluate on output. Absolute first entry on the watchlist.
Microsoft AI (MAI)
Not previously covered here because Microsoft’s AI story has been primarily about distribution: reselling OpenAI models through Azure, embedding Copilot into Office, acting as the largest outside investor in OpenAI. That framing is now materially out of date.
On 3 April 2026, Microsoft launched three original in-house AI models: MAI-Transcribe-1 (speech-to-text, claiming lowest average word error rate on the FLEURS benchmark across the top 25 languages – beating OpenAI Whisper-large-v3 across all 25 and Google Gemini 3.1 Flash on 22 of 25), MAI-Voice-1 (voice generation including voice cloning from seconds of audio at 60x real-time), and MAI-Image-2 (image generation). All are available through Microsoft Foundry and a new MAI Playground.
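For readers who don’t live in speech-to-text: word error rate is word-level edit distance normalised by reference length, so lower is better. A minimal sketch with the jiwer library and made-up transcripts:

```python
# Word error rate as scored on benchmarks like FLEURS: (substitutions +
# insertions + deletions) / reference word count. Transcripts are invented.
import jiwer

reference = "the launch is scheduled for four april"
hypothesis = "the launch is scheduled for for april"  # one substitution

print(f"WER: {jiwer.wer(reference, hypothesis):.2%}")  # 1/7 ≈ 14.29%
```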
Mustafa Suleyman, who leads Microsoft’s AI efforts, told VentureBeat ahead of the launch: ‘We’re now a top three lab just under OpenAI and Gemini.’ Suleyman is not a new name here – he co-founded DeepMind (acquired by Google in 2014), went on to co-found Inflection AI, and joined Microsoft in 2024 to lead Microsoft AI. He formed the internal superintelligence team six months ago with an explicit goal of AI self-sufficiency, reducing Microsoft’s dependency on OpenAI. These are the first models that team has shipped publicly.
The business context matters: Microsoft’s stock just closed its worst quarter since 2008 as investors question whether its AI infrastructure spend will translate into revenue. These models are partly Suleyman’s answer to that pressure, priced aggressively and designed to reduce cost of goods sold.
This moves Microsoft from ‘investor and distribution channel for OpenAI’ to ‘competing lab with original benchmark-leading models and a founder-class researcher at the helm.’ That is a meaningful shift. Worth adding to the watchlist.
Where: azure.microsoft.com/en-us/products/ai-foundry, the MAI Playground.
Mistral
The European frontier lab. Smaller than the US giants, more focused, and genuinely important for certain use cases.
Mistral’s open weights releases have been consistently high quality. Le Chat is their consumer product. Their real value proposition, though, is for organisations that care about EU data sovereignty or want frontier-adjacent models they can actually run themselves. GDPR-compliant AI infrastructure is not a niche concern in Europe, and Mistral is currently the best option in that space.
They’re not winning benchmarks at the top of the table, and they’re not pretending to. The positioning is honest: a capable European lab with a commercial model that doesn’t require shipping your data to Virginia.
On 18 March 2026, Mistral launched Forge, an enterprise platform for training frontier-grade AI models on proprietary organisational data. Forge supports pre-training, post-training, and reinforcement learning, and is already deployed with ASML, the European Space Agency, Ericsson, and DSO National Laboratories Singapore among others. This meaningfully expands Mistral’s value proposition: alongside open weights and EU data residency, they now offer a path to deeply customised models that understand internal workflows, codebases, and institutional knowledge. For European enterprises evaluating AI strategy, this makes Mistral a more complete option than it was a week ago.
Where: mistral.ai, their GitHub for model releases.
Zhipu AI
Currently leading the open weights intelligence index with GLM-5, and deserving more attention than most Western engineers are giving it.
GLM-5 is a 744B parameter mixture-of-experts model with 40B active parameters per forward pass. MIT licensed. Priced at $1 per million input tokens and $3.20 per million output tokens. The benchmark numbers are real – this is not a paper model, and the MIT licence means you can actually use it without legal complications.
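The 744B-total/40B-active split is what makes that pricing plausible: per-token compute scales with active parameters, not total. A rough sketch using the standard ~2 FLOPs-per-parameter-per-token approximation:

```python
# Why mixture-of-experts pricing works: only routed experts fire per token.
# Parameter counts are from the post; the 2-FLOPs rule is an approximation.

total_params = 744e9
active_params = 40e9

dense_flops_per_token = 2 * total_params  # if every weight fired on every token
moe_flops_per_token = 2 * active_params   # only the routed experts fire

print(f"dense-equivalent: {dense_flops_per_token:.2e} FLOPs/token")
print(f"GLM-5 (MoE):      {moe_flops_per_token:.2e} FLOPs/token")
print(f"compute saving:   {dense_flops_per_token / moe_flops_per_token:.1f}x")
```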
The lab is Beijing-based, which matters for some use cases and not others. If you’re building for a deployment context where data residency or geopolitical supply chain risk is a concern, factor that in. If you’re evaluating capability per dollar for a standard workload, ignore the geography and look at the numbers.
The Cursor/Moonshot AI story this week adds a sharp practical data point to the Chinese open weights picture. Cursor, the AI coding tool used by over a million developers daily, launched a model called Composer 2 on 22 March 2026, promoted as delivering frontier-level coding intelligence. Within hours, developers identified the model as built on Kimi K2.5, an open-source model from Moonshot AI, a Beijing-based lab not currently covered in this post. Cursor subsequently confirmed this. The significance is not the transparency failure – it is the underlying commercial reality: a well-funded US coding startup with a $2.5 billion valuation chose a Chinese open-source model as the foundation for its flagship over any US proprietary or open alternative. Moonshot AI’s Kimi K2.5 is now on the watchlist.
Where: zhipuai.cn, their HuggingFace model pages.
Alibaba Cloud (Qwen)
For the specific use case of “I want the best open weights model I can run on hardware I already own,” Qwen3.5-35B-A3B is currently the answer.
35B parameters, mixture-of-experts architecture with 3B active parameters. Runs on a 32GB VRAM consumer GPU. Beats GPT-5-mini on standard benchmarks. 1M token context window. Apache 2.0 licence, which is genuinely permissive. This is not a compromise – it is a good model that happens to run locally.
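The 32GB figure is easy to sanity-check with back-of-envelope arithmetic. A sketch – the fixed overhead is a rough assumption, and real KV-cache cost grows with context length:

```python
# Back-of-envelope VRAM estimate for local inference on a 35B model.
# The overhead constant is a rough assumption covering KV cache and activations.

def vram_gb(params_billion, bits_per_weight, overhead_gb=4.0):
    weights_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits = 1 GB
    return weights_gb + overhead_gb

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{vram_gb(35, bits):.0f} GB")
# 16-bit ~74 GB (no), 8-bit ~39 GB (no), 4-bit ~22 GB (fits in 32 GB with headroom)
```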
For engineers who want to experiment with agentic workflows without API costs, or need to keep data on-premises, this is the most practically useful model available right now.
Note: as of 5 March 2026, Junyang Lin (Justin), the tech lead who built the Qwen model family, has stepped down along with two other key colleagues. Alibaba has formed a new task force to continue model development. The model is still good – but the team that built it is no longer intact.
Where: huggingface.co/Qwen, the Qwen GitHub repo.
DeepSeek
DeepSeek set the efficiency template that the Chinese open weights labs are now iterating on. Their mixture-of-experts approach and training efficiency innovations changed the conversation about what’s achievable without US-scale compute budgets.
The current story has a geopolitical edge. Reuters reported that DeepSeek is withholding v4 from US chipmakers – a deliberate strategic decision in response to export controls. This is the most explicitly geopolitically charged lab in the space, and the engineering decisions are increasingly inseparable from that context.
The geopolitical dimension sharpened further this week. NeurIPS, the world’s leading AI research conference, announced new restrictions barring participation from organisations on US sanctions and entity lists. Chinese researchers threatened a mass boycott; NeurIPS reversed the policy within days. Experts are calling it a potential watershed moment for US-China scientific decoupling in AI specifically. Separately, Stanford and Princeton researchers published findings that Chinese AI models are measurably more likely than Western counterparts to dodge political questions or produce inaccurate answers on politically sensitive topics. If you are evaluating Chinese open weights models for any deployment that touches politically sensitive content, this is a concrete and specific risk to factor in.
Still worth understanding because the efficiency innovations are real and influential. The constraint-driven engineering that came out of working around chip access produced genuinely novel approaches that are now being copied across the industry.
Where: deepseek.com, their research papers on arXiv.
xAI
Grok exists. The models are real but currently losing ground technically against both the US frontier labs and the Chinese open weights challengers.
The organisational story at xAI has now reached a different order of magnitude. As of late March 2026, reporting confirms all but two of the original eleven xAI co-founders have departed. The figure previously cited here – seven departures – has been materially overtaken. This is a near-complete collapse of the founding team and the clearest signal yet of the organisational instability that has been building since the lab launched.
The lab has merged with SpaceX, which means access to significant private compute infrastructure – potentially a genuine advantage as data centre costs become a constraint. Whether that compute advantage translates into model improvements depends on talent stability that isn’t currently evident.
xAI has poached two senior leaders from Cursor AI – Andrew Milich and Jason Ginsberg – in a targeted recruitment move. Cursor has been one of the standout developer tools of the past year, so pulling senior talent from there signals xAI is continuing to build aggressively on the applied/tooling side, not just the model side.
Worth watching because the SpaceX compute angle could change the picture quickly. Not worth prioritising attention on right now because the current output doesn’t justify it, and the founding team attrition adds a material question mark over the lab’s direction.
Where: x.ai, Grok at x.com.
Researchers and Practitioners
Labs employ thousands of people. Most of them don’t have a useful public voice. These are the individuals worth following because they produce clear thinking you can actually use, not because they’re prominent.
Andrej Karpathy
The best explainer of complex AI concepts working today. If you want to understand how a thing actually works – not the press release version but the mechanistic version – Karpathy is usually the clearest path to that understanding.
Recent contributions: microgpt, a complete GPT implementation in around 200 lines of code. This is pedagogically important in the way that good textbooks are important – it strips away everything that isn’t essential and leaves you with the thing itself. He coined “vibe coding” as a term for AI-assisted development without full comprehension of the output, and “Claws” as a category name for Claude-based agentic tools. His December inflection observation – that we may have crossed a threshold in late 2025 where AI output routinely passes for competent human work – is the kind of observation that ages well.
His latest project is autoresearch (github.com/karpathy/autoresearch) – an agentic system that runs nanochat model training experiments autonomously on a single consumer GPU, closing the loop between hypothesis, training run, and result without human intervention at each step. It’s early, but it’s a concrete example of the kind of recursive self-improvement research that most labs only attempt at datacentre scale. Karpathy is doing it on one GPU.
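The shape of that loop is worth internalising even if you never run the repo. A schematic sketch – every name below is hypothetical, not autoresearch’s actual API:

```python
# The hypothesis -> training run -> result loop that autoresearch closes.
# All names are hypothetical stand-ins; see the repo for the real system.

def research_loop(propose, train, evaluate, budget=10):
    best = None
    for _ in range(budget):
        config = propose(best)   # agent drafts the next experiment from results so far
        model = train(config)    # e.g. a short nanochat training run on one GPU
        score = evaluate(model)  # held-out metric the agent is optimising
        if best is None or score > best["score"]:
            best = {"config": config, "score": score}
    return best
```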
Not prolific, but consistent quality. Every video is worth the time.
Where: @karpathy on X, his YouTube channel.
Simon Willison
The most honest practitioner voice in AI. Willison has the rare quality of being genuinely excited about the technology while maintaining rigorous skepticism about claims that aren’t supported by evidence.
His major project right now: a “not quite a book” on Agentic Engineering, publishing chapter by chapter on simonwillison.net. It’s the most systematic treatment of agentic patterns from someone who actually builds things rather than theorises about them. If you care about multi-step AI workflows in practice, this is the most important ongoing piece of writing in the space. Follow it as it publishes.
Earlier contributions that remain worth reading: the Agentic Engineering Patterns guide, his “cognitive debt” concept (the idea that AI assistance can leave you with code or decisions you can’t fully reason about), the “hoarding things you know how to do” piece, and the pelican-on-bicycle test for model personality. His Deep Blue framing, comparing current AI to the chess computer that beat Kasparov, is a useful conceptual anchor for discussions that tend to go either too dystopian or too dismissive.
Where: simonwillison.net – irregular posting schedule, everything is worth reading.
Gergely Orosz (The Pragmatic Engineer)
The best source for what AI is actually doing to engineering organisations. Not the capability story – the management, hiring, productivity, and organisational reality story.
Recent contributions: six predictions for software engineering that have aged well, the Pragmatic Summit, and an interview with Mitchell Hashimoto that is worth reading in full for the “preserving craft” conversation. Orosz writes from the perspective of someone who has worked in large engineering organisations and understands the gap between what technology can do and what organisations actually do with it.
The newsletter is paid and it’s worth the money. The free tier gives you a sense of the quality but the full analysis is behind the subscription.
Where: newsletter.pragmaticengineer.com
swyx (Shawn Wang)
The person best positioned to synthesise what’s happening in AI engineering for practitioners. Swyx has the unusual combination of technical credibility, community connections, and genuine enthusiasm that makes him a reliable first-pass filter for what matters.
Recent contributions: the Latent Space podcast remains one of the best sources for deep technical conversations with practitioners and researchers. The Claude Code Anniversary episode is a good example of the quality – specific, concrete, and informed by actual usage. He’s been actively promoting the “X Engineer” concept (the idea of a new kind of engineer who works primarily through AI systems), which is a framing worth thinking about whether you agree with it or not.
Where: latent.space for the podcast, @swyx on X.
Nolan Lawson
Not prolific. When he writes, it’s worth your time.
His piece “We Mourn Our Craft” got over 710 points on Hacker News, which is a reasonable proxy for resonance. It articulates the loss that skilled engineers feel as AI compresses the gap between novice and expert output – not as a complaint but as a genuine examination of what craft means when the craft becomes partially automated. This is the kind of writing that helps technical leaders have the conversations they need to have with their teams.
Lawson writes about the human side of technical transitions. That’s an underserved perspective in a discourse that tends toward either capability maximalism or pure economic analysis.
Where: nolanlawson.com
Mitchell Hashimoto
HashiCorp founder, now doing hands-on AI agent engineering as an individual practitioner. The value here is the combination: someone who has built serious production infrastructure and is now applying that judgment to AI tooling.
The “preserving craft” conversation on The Pragmatic Engineer podcast is one of the better examinations of what it means to maintain deep technical ability in an environment that increasingly rewards output over understanding. Hashimoto is building things and reporting honestly about what works.
Where: The Pragmatic Engineer podcast, @mitchellh on X.
Max Woolf
A senior data scientist who writes honest empirical accounts of what AI systems actually do in practice, as distinct from what the papers and press releases say they do.
His post converting from “AI Agent Coding Skeptic” to a more positive position is a good example of the quality. He doesn’t change his mind because of marketing – he changes his mind because he ran experiments and the results changed. That’s the right epistemic standard, and it’s rarer than it should be.
Where: minimaxir.com
Ben Thompson
Not an AI researcher. The best business and strategic analyst in tech.
His “SaaSmageddon” framing – the idea that AI agents will displace significant portions of SaaS revenue by performing tasks rather than requiring humans to use software – is one of the more important strategic predictions currently making its way through the industry. Whether you agree with the timeline or magnitude, it’s a framework that senior engineering leaders need to have a view on.
Stratechery is paid. If your role involves product, platform, or organisational strategy and you aren’t reading it, you’re missing a useful analytical lens.
Where: stratechery.com
Nicholas Carlini
Google DeepMind security researcher. Worth following if you care about the adversarial robustness and security side of AI systems, which you should.
His recent experiment using parallel Claude agents to write a C compiler – roughly $20,000 in API costs and 100,000 lines of Rust output – is one of the more interesting documented examples of what frontier models can produce on a long-horizon coding task. It’s also a useful data point for thinking about what “AI agent coding” means at scale: impressive, expensive, and raising questions about how you verify what was generated.
Where: nicholas.carlini.com
Newsletters and Podcasts
These are the publications worth having in your reading list. The list is short on purpose.
The Batch (deeplearning.ai)
Andrew Ng’s weekly newsletter. Broad coverage, reliable quality, good for staying current without going deep. Ng has been promoting the “X Engineer” concept – the idea that engineers who can effectively direct AI systems will become the dominant role. Worth reading for his framing even if you don’t fully agree with the conclusion.
Where: deeplearning.ai/the-batch
TLDR AI
Daily digest. Good signal-to-noise ratio for a daily publication, which is harder to achieve than it sounds. Good for catching announcements you might have missed. Not a substitute for deeper reading, but useful for not falling behind on headlines.
Where: tldr.tech/ai
Latent Space (swyx + Alessio)
The best podcast for deep technical conversations about AI engineering. The guests are well-chosen, the questions are informed, and the episodes don’t pad for time. Listen at 1.5x if you’re in a hurry.
Where: latent.space
Changelog
Software development broadly, with increasing AI coverage. Good for maintaining perspective on AI as one part of a larger engineering practice rather than the whole practice. The Changelog hosts bring genuine engineering background to their interviews, which shows.
Where: changelog.com
Alpha Signal
An aggregator. Useful for catching research papers and model releases that don’t get mainstream coverage. The name sounds like a trading firm but it’s a legitimate AI news digest.
Where: alphasignal.ai
Simon Willison’s Weblog
Already listed in researchers above, but worth listing again here. Irregular posting, consistently high quality, and currently the home of the Agentic Engineering book-in-progress. Set up an RSS feed and read everything that comes through.
Where: simonwillison.net
The Pragmatic Engineer
Already covered above. Paid. Worth it specifically for engineering leadership – the intersection of AI capability and organisational reality is where Orosz is strongest.
Where: newsletter.pragmaticengineer.com
Hacker News
Yes, it’s obvious. It’s still the best real-time signal for how practitioners are actually responding to new releases and announcements. The front page is hit-and-miss but the comments on AI-related submissions are often where the most grounded analysis appears. Search for specific topics rather than relying on the front page.
Where: news.ycombinator.com
Ars Technica
Good technical journalism with real depth when they get it right. Worth flagging: in early 2026, Ars fired senior AI reporter Benj Edwards after AI-fabricated quotes were published under his byline. The details are still unfolding but it’s a credibility hit for their AI coverage specifically. Their broader tech reporting remains solid – treat their AI journalism with more scrutiny than you might have previously.
Where: arstechnica.com
Sources and Links
Labs:
- Anthropic: https://anthropic.com
- Google DeepMind: https://deepmind.google
- OpenAI: https://openai.com
- Meta AI: https://ai.meta.com
- AMI Labs: (no public site yet)
- Microsoft AI (MAI): https://azure.microsoft.com/en-us/products/ai-foundry
- Mistral: https://mistral.ai
- Zhipu AI: https://zhipuai.cn
- Qwen (Alibaba): https://huggingface.co/Qwen
- DeepSeek: https://deepseek.com
- xAI: https://x.ai
People:
- Andrej Karpathy: https://karpathy.ai / @karpathy on X
- Simon Willison: https://simonwillison.net
- Gergely Orosz: https://newsletter.pragmaticengineer.com
- swyx: https://latent.space / @swyx on X
- Nolan Lawson: https://nolanlawson.com
- Mitchell Hashimoto: @mitchellh on X
- Max Woolf: https://minimaxir.com
- Ben Thompson: https://stratechery.com
- Nicholas Carlini: https://nicholas.carlini.com
Publications:
- The Batch: https://deeplearning.ai/the-batch
- TLDR AI: https://tldr.tech/ai
- Latent Space: https://latent.space
- Changelog: https://changelog.com
- Alpha Signal: https://alphasignal.ai
- The Pragmatic Engineer: https://newsletter.pragmaticengineer.com
- Hacker News: https://news.ycombinator.com
- Ars Technica: https://arstechnica.com
Benchmarks and Research:
- Symbolica AI, ARC-AGI-3 day-1 results: https://www.symbolica.ai/blog/arc-agi-3
- The Decoder, ARC-AGI-3 frontier model scores: https://the-decoder.com/arc-agi-3-offers-2m-to-any-ai-that-matches-untrained-humans-yet-every-frontier-model-scores-below-1/
- Will Knight and Zeyi Yang, Wired, 27 March 2026: https://www.wired.com/story/made-in-china-ai-research-is-starting-to-split-along-geopolitical-lines/
- OpenAI, ‘Accelerating the next phase of AI’, 31 March 2026: https://openai.com/index/accelerating-the-next-phase-ai/
- Alex Kim, ‘The Claude Code Source Leak: fake tools, frustration regexes, undercover mode’, 31 March 2026: https://alex000kim.com/posts/2026-03-31-claude-code-source-leak/
- VentureBeat, ‘Claude Code source code appears to have leaked: here’s what we know’, 31 March 2026: https://venturebeat.com/technology/claude-codes-source-code-appears-to-have-leaked-heres-what-we-know
- Julia Black, Vanity Fair, ‘Exclusive: OpenAI Preps Policy Push to Rethink the Social Contract’, 31 March 2026: https://www.vanityfair.com/news/story/openai-new-model-superintelligence-policy-push
- Kyle Wiggers, VentureBeat, ‘Microsoft launches 3 new AI models in direct shot at OpenAI and Google’, 3 April 2026: https://venturebeat.com/technology/microsoft-launches-3-new-ai-models-in-direct-shot-at-openai-and-google
This post reflects the state of the field as of 4 April 2026. The landscape moves fast. Check the changelog at the top for updates. If something here is wrong or out of date, the correction mechanism is in the changelog, not a correction buried in a later post.
Commissioned, Curated and Published by Russ. Researched and written with AI. You are reading the latest version of this post. View all snapshots.