Version History: State of AI

← Back to State of AI

Changelog

DateSummary
27 Mar 202627 Mar 2026 – First ARC-AGI-3 scores published: Symbolica’s Agentica scores 36.08% for $1,005; frontier CoT baselines (Opus 4.6 Max, GPT-5.4 High) score 0.2-0.3% at up to $8,900, a concrete illustration of the agent benchmark gap the post identifies.
26 Mar 202626 Mar 2026 – ARC Prize launches ARC-AGI-3, an interactive benchmark measuring agent learning efficiency and long-horizon adaptation rather than static answers; a concrete response to the agent evaluation gap the post identifies.
25 Mar 202625 Mar 2026 – Arm launches first-ever Arm-designed data center CPU for agentic AI workloads; LiteLLM PyPI supply chain attack introduces credential theft via compromised agent infrastructure dependency.
24 Mar 202624 Mar 2026 – GPT-5.4 Pro solves a frontier math open problem; iPhone 17 Pro demonstrated running a 400B LLM locally, extending the local inference ceiling.
23 Mar 202623 Mar 2026 – Quiet day, thesis holds.
22 Mar 2026Tinybox ships commercially: purpose-built offline AI device runs 120B models on 384GB GPU RAM, adding a hardware product layer to the local inference maturation story.
21 Mar 2026White House releases national AI framework to pre-empt state rules; OpenCode reaches 5M monthly developers as open-source CLI coding agent goes mainstream.
20 Mar 2026OpenAI acquires Astral (Ruff/uv) for Codex team; autoresearch scaled to 16 GPUs runs 910 experiments in 8 hours with emergent heterogeneous hardware strategy.
15 Mar 2026Meta planning 20%+ workforce layoffs explicitly to offset AI infrastructure costs, the largest single AI-driven displacement event yet reported at a major tech company.
14 Mar 2026Anthropic makes 1M context GA for Opus 4.6 and Sonnet 4.6; Morgan Stanley projects 9-18 GW US power shortfall through 2028 and flags accelerating AI-driven workforce reductions.
13 Mar 2026Washington and Utah pass significant AI disclosure and safety legislation; Nvidia GTC (March 16-19) previewed with NemoClaw agent platform and Groq inference chip expected.
12 Mar 2026METR study finds half of SWE-bench-passing PRs would be rejected by real maintainers; Amazon insiders report AI tools slowing work even as company mandates adoption and cuts 30,000 staff.
11 Mar 2026Yann LeCun raises $1B for AMI to pursue physical world models as a direct alternative to LLMs; overnight agent trust problem identified as structural challenge as coding agents scale.
10 Mar 2026chardet copyleft controversy surfaces a new legal fault line for AI reimplementation; Claude Code cost rebuttal confirms ~10x gap between retail API pricing and actual inference costs.
9 Mar 2026Agent Safehouse ships: macOS-native deny-first kernel sandboxing for local agents, 518 HN points, a direct community response to the governance gap the post identifies.
8 Mar 2026Karpathy ships autoresearch: autonomous agent that runs ML experiments overnight unsupervised, directly illustrating his own December inflection observation.
7 Mar 2026Tech employment worse than 2008 or 2020 recessions; China’s five-year plan makes AI dominance official national policy.
6 Mar 2026GPT-5.4 launches with native computer-use; Anthropic sues DoD over supply chain risk designation; Anthropic labor research finds real but limited displacement.
5 Mar 2026Alibaba Qwen leadership exodus – Justin Lin resigns, two others.
2 Mar 20261.0 Inaugural edition
3 Mar 2026OpenAI $110B raise.
4 Mar 2026Knuth’s “Claude’s Cycles” paper on HN.

Snapshots

No dated snapshots yet.