What an Autonomous Agent Found in McKinsey's AI Platform in Two Hours

22 March 2026 - 6 mins read

Commissioned, Curated and Published by Russ. Researched and written with AI.

On February 28, 2026, a security firm called CodeWall ran an authorised red-team exercise against McKinsey’s internal AI platform, Lilli. No credentials. No insider access. No human guiding the attack in real time. Just an autonomous agent, two hours, and roughly $20 in API tokens.

According to CodeWall’s disclosure, the agent exfiltrated:

46.5 million chat messages – strategy discussions, M&A conversations, client engagements – all stored in plaintext
728,000 files
57,000 user accounts
384,000 AI assistants
94,000 additional records

It also found that Lilli’s system prompts were writable.

McKinsey was notified the following day. The Register covered it on March 9. McKinsey has not publicly confirmed the specifics. Everything here is based on CodeWall’s disclosure and reporting from The Register, BankInfoSecurity, and The Stack.

This is not a story about McKinsey. It’s a story about how every major enterprise is building AI on top of a security posture that was already borderline and then adding new attack surface without treating it as new risk.

Five failures, none of them new

The entry point was an unauthenticated API endpoint with a SQL injection vulnerability.

That’s it. That’s the door. In 2026, a platform handling the internal communications of one of the world’s largest consulting firms had an externally accessible API endpoint that (a) required no authentication and (b) was injectable.

From there, the agent enumerated the database schema, identified the tables worth exfiltrating, and worked through them. Plaintext storage meant no decryption step. No rate limiting or anomaly detection meant nothing tripped during two hours of automated extraction.

The five failures:

Unauthenticated API endpoint – a surface that should not have existed without auth
SQL injection – unparameterised queries, a vulnerability class that has been understood and preventable for over two decades
Plaintext data storage – 46.5 million chat messages with no encryption at rest
No rate limiting or anomaly detection – two hours of sustained database enumeration without a single alert
Writable system prompts – the AI’s behaviour instructions were modifiable by an attacker

Every single one of these is a basic, pre-AI security failure. None of them required an AI platform to exist. They would be findings in a 2005 pen test.

The writable system prompts are the part worth dwelling on

The data exfiltration is serious. 46.5 million plaintext messages from a firm that sits in strategy conversations at the world’s largest organisations is genuinely damaging if it had been a criminal operation rather than an authorised exercise.

But the writable system prompt finding is a different category of risk.

A system prompt governs how an AI assistant behaves – what it will and won’t do, how it frames responses, what context it operates with. If that’s writable by an external actor, you don’t just have a data breach. You have the ability to silently redirect how McKinsey employees receive advice, what the AI refuses to surface, and what it will do when asked sensitive questions.

That’s not exfiltration. That’s subversion. The AI becomes an insider threat installed by an outsider.

Most enterprise AI deployments haven’t thought through this threat model because most enterprise AI deployments were built by product and engineering teams focused on capability, not adversarial access patterns. The assumption is that the AI is a tool, and tools get secured the same way other software does. But an AI with writable instructions and access to sensitive context is a different kind of surface.

What the autonomous agent actually changes

A skilled human attacker could have found these vulnerabilities. Given enough time, they probably would have. What the autonomous agent changes is the economics.

$20. Two hours. No operational security mistakes, no fatigue, no need to be patient while manually probing endpoints. The agent enumerated the schema and worked through it systematically with a consistency no human maintains across two hours of tedious database reconnaissance.

This is the actual shift. The vulnerabilities aren’t new. The detection window isn’t new. What’s new is that the threshold for executing a full enumeration attack is now $20 and an afternoon. An attacker with a meaningful budget can now run thousands of these in parallel across thousands of targets, automatically triaging which ones have something worth going deeper on.

Security teams have always had to defend against sophisticated attackers with time and resources. Now they’re defending against sophisticated automation with nearly infinite patience, no sleep requirement, and trivial marginal cost per additional target.

What this means for enterprise AI builds

Most enterprises are in the middle of building internal AI platforms. A lot of them look, architecturally, like Lilli probably looked: a web interface, an API layer, a backing data store, an LLM orchestration layer, and a set of system prompts that define the assistant’s behaviour.

The security review for most of these projects has focused on the AI layer – prompt injection, data leakage through model outputs, hallucination risks. Those are real concerns. But CodeWall found their way in through the API layer, with SQL injection, before the AI was even involved.

The questions every team building internal AI should be asking right now:

On the API surface: Which endpoints are accessible without authentication? Do any of them touch the database directly or indirectly? Have they been tested for injection vulnerabilities?

On data storage: What gets stored, in what form, and who can reach it? Chat logs from an internal AI assistant are high-value targets. If they’re in plaintext in a database with an injectable API in front of them, you have a problem that doesn’t require any AI-specific exploit to trigger.

On system prompts: Are they writable? By whom? With what access controls? Treat them as configuration with security implications, not as product copy.

On anomaly detection: Would you notice two hours of sustained database enumeration? What does your baseline look like, and what would cause an alert?

None of this is exotic. It’s standard application security applied to a new deployment pattern. The gap is that many AI platform builds are moving fast with teams that have strong ML and product skills but haven’t been through an adversarial security review.

A disclosure, not a criminal breach – but that’s not reassurance

CodeWall was authorised. They disclosed responsibly. McKinsey was notified before any of this became public. That’s how red-teaming is supposed to work.

But the correct response to a red-team finding of this severity is not “good thing it was just a test.” It’s “how long has this been like this, and has anyone else been here?”

The authorised nature of the exercise doesn’t change what was findable. It doesn’t change how long it would have taken. And it doesn’t change that the same vulnerabilities exist in other enterprise AI platforms right now, almost certainly, because the failures here are not unusual.

The $20 and two hours figure will get repeated in a lot of presentations this year. The point isn’t the specific numbers. The point is that the barrier to full database enumeration on a poorly secured internal AI platform is now low enough to be noise in an attacker’s operational costs.

Build accordingly.

Sources: CodeWall disclosure – The Register – BankInfoSecurity – The Stack