Reliability
- Formal Specs in the LLM Era: The Validation Layer AI-Generated Code Is Missing
LLMs are good at generating code. They are bad at knowing whether it's correct. Informal Systems used an executable specification language called Quint to add a mechanically verifiable validation layer -- and collapsed a months-long refactor into a week.
- Amazon's Kiro Took Down AWS for 13 Hours. The Fix Reveals a Bigger Problem.
In December 2025, Amazon's internal AI coding agent Kiro caused a 13-hour AWS outage while fixing a minor bug. The real story isn't the outage -- it's what Amazon's internal memo and subsequent response reveal about how AI-assisted changes are (and aren't) being governed in production.
- Why LLMs Need Bayesian Reasoning (and How Google Is Teaching It)
Google Research published a paper showing LLMs can be trained to reason like Bayesians -- updating beliefs as evidence arrives rather than pattern-matching to a confident answer. For engineers running production systems, this matters more than most benchmark improvements.