Safety
- Building Agents That Can't Go Rogue: A Practical Safety Guide
Practical safety engineering for AI agents -- not theory.
Updated 1 April 2026: Anthropic accidentally leaked the Claude Code source code, revealing Undercover Mode -- a built-in feature designed to conceal AI identity in public repo commits, extending the accountability gap to the vendor infrastructure layer.
Updated 27 March 2026: Anthropic ships auto mode for Claude Code -- the AI now decides which actions are safe enough to proceed without asking the developer; the safety criteria are undisclosed.