Version History: Building Agents That Can't Go Rogue: A Practical Safety Guide

Changelog

27 Mar 2026 – Anthropic ships an auto mode for Claude Code that sets its own permission gates; the safety criteria are undisclosed.
26 Mar 2026 – Quiet day, thesis holds.
25 Mar 2026 – Quiet day, thesis holds.
24 Mar 2026 – Meta confirms an AI agent exposed sensitive data for two hours due to context blindness; Forbes: 1.3 billion agents incoming, most with no kill switch.
22 Mar 2026 – Quiet day, thesis holds.
21 Mar 2026 – Quiet day, thesis holds.
18 Mar 2026 – Quiet day, thesis holds.
17 Mar 2026 – Gravitee: 42-50% of agents unmonitored across sectors; Chinese state actors used jailbroken Claude Code to automate 80-90% of a nation-state cyber campaign.
14 Mar 2026 – Quiet day, thesis holds.
13 Mar 2026 – Quiet day, thesis holds.
12 Mar 2026 – Quiet day, thesis holds.
11 Mar 2026 – CodeWall autonomous agent hacks McKinsey's Lilli platform in 2 hours; system-prompt write access exposes a new category of AI infrastructure risk.
10 Mar 2026 – Amazon holds a mandatory meeting on AI breaking systems; Krebs documents prompt injection via overlooked data fields as an active agent attack vector.
8 Mar 2026 – Quiet day, thesis holds.
6 Mar 2026 – MIT/Cambridge survey of 30 agentic systems finds a systemic lack of risk disclosure; McKinsey: 80% of orgs have encountered risky agent behaviour.
5 Mar 2026 – Anthropic/DoD: the 'bulk acquired data' phrase as a precision-of-constraint case study.
4 Mar 2026 – SteerEval: LLM controllability degrades at fine-grained specification levels.
3 Mar 2026 – Ars Technica reporter fired over AI-fabricated quotes.
2 Mar 2026 – Initial publication.

Snapshots

No dated snapshots yet.