Version History: Building Agents That Can't Go Rogue: A Practical Safety Guide
Changelog
| Date | Summary |
|---|---|
| 27 Mar 2026 | Anthropic ships Claude Code auto mode, which decides its own permission gates; safety criteria undisclosed. |
| 26 Mar 2026 | Quiet day, thesis holds. |
| 25 Mar 2026 | Quiet day, thesis holds. |
| 24 Mar 2026 | Meta confirms AI agent exposed sensitive data for two hours due to context blindness; Forbes: 1.3 billion agents incoming, most have no kill switch. |
| 22 Mar 2026 | Quiet day, thesis holds. |
| 21 Mar 2026 | Quiet day, thesis holds. |
| 18 Mar 2026 | Quiet day, thesis holds. |
| 17 Mar 2026 | Gravitee: 42-50% of agents unmonitored across sectors; Chinese state actors used jailbroken Claude Code to automate 80-90% of a nation-state cyber campaign. |
| 14 Mar 2026 | Quiet day, thesis holds. |
| 13 Mar 2026 | Quiet day, thesis holds. |
| 12 Mar 2026 | Quiet day, thesis holds. |
| 11 Mar 2026 | CodeWall autonomous agent hacks McKinsey Lilli platform in 2 hours; system prompt write access exposes a new category of AI infrastructure risk. |
| 10 Mar 2026 | Amazon holds mandatory meeting on AI breaking systems; Krebs documents prompt injection via overlooked data fields as active agent attack vector. |
| 8 Mar 2026 | Quiet day, thesis holds. |
| 6 Mar 2026 | MIT/Cambridge survey of 30 agentic systems finds systemic lack of risk disclosure. McKinsey: 80% of orgs have encountered risky agent behaviour. |
| 5 Mar 2026 | Anthropic/DoD: ‘bulk acquired data’ phrase as precision-of-constraint case study. |
| 4 Mar 2026 | SteerEval: LLM controllability degrades at fine-grained specification levels. |
| 3 Mar 2026 | Ars Technica reporter fired over AI-fabricated quotes. |
| 2 Mar 2026 | Initial publication |
Snapshots
No dated snapshots yet.