Commissioned, Curated and Published by Russ. Researched and written with AI.

This is the living version of this post. View versioned snapshots in the changelog below.


What’s New This Week

5 March 2026: Iranian state media stated that the IRGC deliberately targeted Amazon’s Bahrain facility, citing AWS’s support of US military operations. The AWS Health Dashboard still shows the facilities offline as of publication. AWS is advising customers to migrate workloads out of both ME-SOUTH-1 and ME-CENTRAL-1.


Changelog

5 Mar 2026: Inaugural publication

Nobody’s disaster recovery runbook had a section called “drone strike.” Maybe it should have.

On Sunday 1 March 2026, as joint US-Israeli strikes on Iran triggered waves of retaliatory drone and missile attacks across the Gulf, three AWS data center facilities were hit. Two in the UAE were directly struck. A third in Bahrain was damaged by a strike in close proximity. AWS confirmed on Monday that the facilities had sustained structural damage, disrupted power, and in some cases required fire suppression that caused additional water damage. Recovery was described as “prolonged.” The AWS health dashboard, as of this writing, still shows the facilities offline.

Banking services went down. Payments platforms stopped working. SaaS providers serving the UAE and Bahrain reported disruptions. AWS told customers to migrate their workloads out of the region – as if that were something you could do in the middle of a live outage caused by an actual war.

This is not a story about an unexpected failure mode in some exotic edge case. This is a story about a risk that was always on the theoretical threat model and never seriously planned for in practice. Kinetic attacks on cloud infrastructure. Physical destruction of data centers by military force. The gap between “we have availability zones for redundancy” and “we have actual resilience when a conflict zone starts shooting at compute.”

That gap just became very concrete.


What Actually Happened

The timeline matters, because the sequence of failures tells you a lot about where the architecture held and where it didn’t.

The first alert appeared on the AWS Health Dashboard at around 12:51 UTC on Sunday, flagging disruptions to the mec1-az2 availability zone in ME-CENTRAL-1, the UAE region. AWS initially described it as “objects” striking a data center, creating “sparks and fire.” Local authorities cut power to the facility to contain the blaze. Standard operational procedure when your data center is on fire – but it meant the zone was gone.

About five hours later, the damage spread. Power disruptions cascading from the initial strike affected mec1-az3, a second availability zone in the same region. Two of three AZs in ME-CENTRAL-1 were now impaired. AWS noted in its health dashboard that S3 – “designed to withstand the loss of a single zone” – was now seeing “high failure rates for data ingest and egress.” The architecture had worked exactly as designed for single-zone failure. It was not designed for this.

Meanwhile, to the north, AWS was also investigating a “localized power issue” at mes1-az2 in Bahrain (ME-SOUTH-1). The Bahrain facility wasn’t directly struck – a drone strike in close proximity caused the damage. But the result was the same: offline, recovery taking at least a day, customers advised to migrate.

By Monday evening, AWS issued its formal statement: “In the UAE, two of our facilities were directly struck, while in Bahrain, a drone strike in close proximity to one of our facilities caused physical impacts to our infrastructure. These strikes have caused structural damage, disrupted power delivery to our infrastructure, and in some cases required fire suppression activities that resulted in additional water damage.”

The downstream blast radius was visible almost immediately. Careem – the ride-hailing and delivery platform that operates across the Gulf – reported outages. Alaan and Hubpay, both payments companies operating in the UAE, went down. ADCB and Emirates NBD, two of the UAE’s largest banks, reported service disruptions. Snowflake reported impact to customers running workloads in the region. This is not a niche failure. Financial infrastructure, consumer payments, and SaaS platforms all went dark simultaneously across two countries.

AWS’s advice to customers during the outage: “We continue to strongly recommend that customers with workloads running in the Middle East take action now to migrate those workloads to alternate AWS Regions.”

If your runbook for “migrate workloads to an alternate region” requires more than a few minutes to execute, that recommendation is not operational advice. It is an acknowledgement that the situation is beyond recovery in any reasonable timeframe.


Why AZ Isolation Didn’t Save Them

Availability zones are one of the foundational abstractions of cloud infrastructure design. The pitch is straightforward: by distributing workloads across multiple physically isolated facilities within a region, you protect against the failure of any single data center. Power failures, cooling failures, network failures, hardware failures – AZs are designed to be independent enough that one going down doesn’t take the others with it.

The theory is sound. For the failure modes AZs were designed against, it works.

The failure mode on Sunday 1 March was not a power fault or a cooling system failure. It was kinetic physical damage to two facilities in the same region, caused by military drone strikes in the same geopolitical event, within hours of each other. ME-CENTRAL-1 has three availability zones. The strikes damaged two of them. That’s not a scenario AZ isolation is designed to survive.

The architectural assumption baked into availability zone design is that independent failures are uncorrelated. A power fault in AZ-A shouldn’t affect AZ-B because they’re on separate power infrastructure. A network issue in AZ-B shouldn’t cascade to AZ-C because they’re independently connected. The independence assumptions hold for most failure scenarios because most failures are local, technical, and non-coordinated.

A military campaign targeting infrastructure in a geographic region is correlated by definition. All three AWS AZs in ME-CENTRAL-1 are in the United Arab Emirates. If the conflict zone includes the UAE, all three AZs share the same geopolitical exposure. The physical isolation that protects against an electrical fault does not protect against a drone that is targeting the region.

The deeper issue is that ME-SOUTH-1 (Bahrain) and ME-CENTRAL-1 (UAE) are separate AWS regions – not just separate AZs. In AWS’s architecture, separate regions are supposed to represent genuinely independent failure domains. Running active-active across two regions should, in theory, give you much stronger guarantees than running across AZs within a single region. Yet both regions were affected in the same event. Because both regions are in the same geopolitical conflict zone. Region boundaries don’t map to military campaign boundaries.

For most SREs, multi-region has been the gold standard of resilience planning. This week demonstrated that multi-region is necessary but not sufficient if both regions share geopolitical exposure. The relevant failure domain for a conflict event is not “which AWS region” but “which countries are party to this conflict.”
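To make that concrete, here is a minimal sketch of treating countries, rather than regions, as the failure domain. The region codes are real AWS identifiers; the conflict-zone grouping is an illustrative assumption you would have to maintain against your own geopolitical risk assessment.

```python
# Sketch: treat countries / conflict zones, not AWS regions, as the failure domain.
# Region codes are real AWS identifiers; the zone grouping is an illustrative
# assumption, not something any provider publishes for you.

REGION_COUNTRY = {
    "me-central-1": "AE",   # UAE
    "me-south-1":   "BH",   # Bahrain
    "eu-west-1":    "IE",   # Ireland
    "eu-central-1": "DE",   # Germany
    "us-east-1":    "US",
}

# Assumed grouping: countries you judge to share exposure to the same conflict.
CONFLICT_ZONES = {
    "gulf": {"AE", "BH"},
}

def shared_failure_domain(primary: str, failover: str) -> bool:
    """True if two regions share a country or an assumed conflict zone."""
    a, b = REGION_COUNTRY[primary], REGION_COUNTRY[failover]
    if a == b:
        return True
    return any({a, b} <= zone for zone in CONFLICT_ZONES.values())

# me-south-1 as a failover for me-central-1 fails this check; eu-west-1 passes.
assert shared_failure_domain("me-central-1", "me-south-1")
assert not shared_failure_domain("me-central-1", "eu-west-1")
```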


Cloud Providers as Geopolitical Actors

Here is the part that changes the vendor risk conversation permanently.

On Thursday 5 March, Iranian state media – specifically the Islamic Revolutionary Guard Corps via Fars News Agency – stated explicitly that the Bahrain facility was targeted to “identify the role of these centers in supporting the enemy’s military and intelligence activities.” The attack was framed as deliberate, not incidental. AWS wasn’t caught in the crossfire. It was, according to the IRGC, a target.

This is notable for several reasons. The stated rationale was AWS’s support of US military operations. Separately, reporting has emerged that Claude – Anthropic’s AI, which runs on AWS infrastructure – was being used in US military operations in Iran at the time. The CNBC sidebar linking to “5 unresolved questions hanging over the Anthropic-Pentagon fracas” and “Defense tech companies are dropping Claude after Pentagon’s Anthropic blacklist” tells you this is a live and evolving story about what it means for AI models to run on cloud infrastructure when that infrastructure becomes party to a conflict.

The strategic implication: if a cloud provider is supplying compute to military operations, adversaries in that conflict may treat the provider’s commercial infrastructure as a legitimate military target. This is not a hypothetical. It happened.

For years, the implicit assumption in commercial cloud infrastructure was that providers were neutral. AWS, Azure, GCP – they’re utility providers. They run payroll software and e-commerce platforms. They’re not military assets. That neutrality assumption has never been entirely accurate – US cloud providers have government and defence contracts, and they operate under US jurisdiction – but it has been the working assumption that most commercial customers relied on when evaluating risk.

That assumption is now explicitly in question. Iranian state media named a specific reason for targeting AWS. Whether the targeting was as deliberate and precise as claimed, or whether it was a post-hoc justification for a broader infrastructure strike, the signal is the same: cloud infrastructure in a conflict region is not neutral and will not be treated as neutral.

This changes the vendor risk assessment for any organisation operating in a region where their cloud provider has military contracts or geopolitical exposure. It’s not just about uptime SLAs anymore. It’s about whether your provider is a target.

The follow-on question – which most commercial customers are not positioned to answer – is what military or intelligence contracts your cloud provider holds in a given region, and whether those contracts make the provider’s infrastructure a strategic target in a conflict involving that region. AWS is not going to publish that information. You’re going to have to reason about it from context.


The Practical SRE Response

None of this is comfortable to think about, but it is now clearly in scope for infrastructure planning. Here’s what actually changes.

Region selection is now a geopolitical risk decision, not just a latency or compliance decision.

The traditional inputs to region selection were: latency to users, data residency requirements, feature availability, and cost. Geopolitical exposure was occasionally mentioned in enterprise risk frameworks but rarely made it into actual architecture decisions.

It needs to. Before selecting a cloud region for production workloads, the relevant questions now include: Is this region in a country with active military conflicts? Does my cloud provider have known military contracts in this region? Is there a plausible scenario where my provider becomes a target of a state adversary operating in this geography? These are not questions with clean answers, but they need to be asked.

Multi-region active-active is necessary but not sufficient if both regions share geopolitical exposure.

If you’re running active-active across ME-SOUTH-1 and ME-CENTRAL-1, you’re protected against single-region failure but not against a conflict that encompasses both Bahrain and the UAE. The relevant redundancy for geopolitical risk is cross-geography – ideally regions in different countries with different geopolitical alignments.

For organisations with genuine operational requirements in the Middle East, this creates a difficult situation. Regional presence is often mandated by compliance or latency requirements. The answer is not “don’t be in the Middle East” – it’s “don’t have your only failover be another region in the same conflict zone.” Cross-region failover to EU or US regions needs to be operationally tested and ready to activate, not a theoretical migration path documented in a runbook that nobody has executed.
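One concrete form of “operationally tested and ready to activate” is failover that is already wired into DNS, rather than a record change someone has to author mid-incident. A minimal sketch using Route 53 failover routing, assuming placeholder record names, hosted zone and health check IDs:

```python
# Sketch: pre-provisioned DNS failover from a Middle East primary to an EU standby.
# Hosted zone ID, record names, endpoints and health check ID are placeholders.
import boto3

route53 = boto3.client("route53")

def upsert_failover_pair(zone_id: str, name: str,
                         primary_endpoint: str, standby_endpoint: str,
                         primary_health_check_id: str) -> None:
    """Create or refresh a PRIMARY/SECONDARY failover pair for one record name."""
    changes = [
        {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "CNAME",
                "SetIdentifier": "primary-me-central-1",
                "Failover": "PRIMARY",
                "TTL": 60,
                "ResourceRecords": [{"Value": primary_endpoint}],
                # Route 53 only fails over automatically if the primary record
                # has an associated health check that goes unhealthy.
                "HealthCheckId": primary_health_check_id,
            },
        },
        {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "CNAME",
                "SetIdentifier": "standby-eu-west-1",
                "Failover": "SECONDARY",
                "TTL": 60,
                "ResourceRecords": [{"Value": standby_endpoint}],
            },
        },
    ]
    route53.change_resource_record_sets(
        HostedZoneId=zone_id,
        ChangeBatch={"Comment": "cross-geography failover pair", "Changes": changes},
    )
```

Note that this still leans on AWS’s global Route 53 control plane, which is part of the reason the next point exists.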

Cross-cloud is worth taking seriously for true isolation.

Running your failover across two AWS regions means you’re still dependent on AWS’s physical infrastructure, AWS’s control plane, and AWS’s network connectivity. In an event like Sunday’s, where three AWS facilities across two regions were affected, cross-region replication between the two Middle East regions didn’t help the customers who lost service.

True blast radius isolation for the most critical workloads means cross-cloud: your primary on AWS, your failover on Azure or GCP, with the runbooks and testing to actually make it work. This is significantly more complex and expensive than multi-region within a single provider. For most workloads it’s probably overkill. For financial infrastructure serving a region that sits in a geopolitical risk zone, the conversation has changed.
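The operational core is less exotic than the phrase “cross-cloud” suggests: a health probe against the primary, a decision threshold, and a cutover you have rehearsed. A provider-neutral sketch, where the health URL and the promote_secondary hook are hypothetical placeholders for whatever your DNS or traffic-management layer actually exposes:

```python
# Sketch: provider-neutral failover decision loop. The health URL and the
# promote_secondary() hook are hypothetical placeholders; the point is that
# the decision and the cutover are automated and rehearsed, not drafted
# in the middle of an incident.
import time
import urllib.request

PRIMARY_HEALTH_URL = "https://api.example.com/healthz"  # served from the primary cloud
FAILURE_THRESHOLD = 3                                    # consecutive failed probes
PROBE_INTERVAL_SECONDS = 20

def primary_healthy(timeout: float = 5.0) -> bool:
    try:
        with urllib.request.urlopen(PRIMARY_HEALTH_URL, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False

def promote_secondary() -> None:
    # Placeholder: point your provider-neutral DNS or global load balancer at
    # the standby stack running on a different cloud provider.
    raise NotImplementedError("wire this to your DNS or traffic layer")

def watch_and_failover() -> None:
    failures = 0
    while True:
        failures = 0 if primary_healthy() else failures + 1
        if failures >= FAILURE_THRESHOLD:
            promote_secondary()
            return
        time.sleep(PROBE_INTERVAL_SECONDS)
```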

Your vendor risk assessment needs a geopolitical dimension.

ADCB and Emirates NBD are banks. They have regulators, risk frameworks, and extensive vendor due diligence processes. They still went down when AWS went down. The implication is that standard vendor risk assessment – uptime SLAs, SOC 2 certifications, DR capabilities – does not capture geopolitical targeting risk.

Adding a geopolitical risk dimension to cloud vendor assessment means asking: what jurisdictions does this provider operate under? What government contracts do they hold? What adversaries of those governments might treat this provider’s infrastructure as a target? Does the provider have a documented position on military and intelligence contracts, and is that position something you can evaluate? These are uncomfortable questions to put in a vendor questionnaire but they are now clearly relevant.
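One way to keep those questions from living only in a risk memo is to encode them alongside the rest of the vendor assessment. An illustrative structure, where the field names and example values are assumptions rather than any standard framework:

```python
# Sketch: adding a geopolitical dimension to a cloud vendor risk record.
# Field names and example values are illustrative, not a standard framework.
from dataclasses import dataclass

@dataclass
class GeopoliticalRisk:
    operating_jurisdictions: list[str]
    known_government_contracts: list[str]      # from public reporting only
    plausible_state_adversaries: list[str]
    regions_in_active_conflict_zones: list[str]
    provider_position_documented: bool

@dataclass
class CloudVendorAssessment:
    vendor: str
    uptime_sla: str
    certifications: list[str]
    geopolitical: GeopoliticalRisk

example = CloudVendorAssessment(
    vendor="ExampleCloud",
    uptime_sla="99.99% per region",
    certifications=["SOC 2 Type II", "ISO 27001"],
    geopolitical=GeopoliticalRisk(
        operating_jurisdictions=["US"],
        known_government_contracts=["defence cloud contract (public reporting)"],
        plausible_state_adversaries=["state actors in regions of operation"],
        regions_in_active_conflict_zones=["me-example-1"],
        provider_position_documented=False,
    ),
)
```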

The migration advice AWS gave is a process you need to have already done.

“Migrate your workloads to alternate regions.” If executing that recommendation takes more than the time your SLA allows for recovery, you don’t have a migration capability – you have a plan to migrate after you’ve already failed. Active-active means both ends are hot. Warm standby means your failover target is partially provisioned. Anything that requires provisioning new capacity, restoring from backup, or updating DNS during the outage is not a disaster recovery capability for this kind of event.

The question to ask about your current architecture: if AWS ME-CENTRAL-1 went offline right now and stayed offline for a week, what would happen? If the answer is “we’d have an outage while we scrambled to provision elsewhere,” then Sunday’s events are a model of your risk.
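A quick way to answer that honestly is to inventory what actually runs in the region before the week you need the answer. A rough sketch with boto3, counting EC2 and RDS instances as a proxy for exposure; extend it to the services your stack actually depends on:

```python
# Sketch: rough inventory of what would go dark if a single region went offline.
# Counts EC2 and RDS instances as a proxy; extend to queues, single-region
# buckets, and anything else your stack depends on.
import boto3

def region_exposure(region: str) -> dict:
    ec2 = boto3.client("ec2", region_name=region)
    rds = boto3.client("rds", region_name=region)

    instance_count = sum(
        len(reservation["Instances"])
        for page in ec2.get_paginator("describe_instances").paginate()
        for reservation in page["Reservations"]
    )
    db_count = sum(
        len(page["DBInstances"])
        for page in rds.get_paginator("describe_db_instances").paginate()
    )
    return {"region": region, "ec2_instances": instance_count, "rds_instances": db_count}

if __name__ == "__main__":
    print(region_exposure("me-central-1"))
```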


The Threat Model Just Expanded. Update Yours.

Kinetic attacks on cloud infrastructure have been in the theoretical threat model for years. “What if someone physically destroyed a data center?” has been asked in security discussions, in academic papers on critical infrastructure protection, and in the appendices of risk frameworks that nobody reads carefully.

This week, it stopped being theoretical.

Two AWS data centers in the UAE were directly struck by drones. A third, in Bahrain, was damaged by a nearby strike. Three facilities across two regions, offline simultaneously, during a regional military conflict in which the cloud provider was explicitly named as a target. Banking services went down. Payments stopped. AWS told customers to migrate their workloads. The “prolonged” recovery is still ongoing.

The threat model for cloud infrastructure now explicitly includes: kinetic physical attacks on data center facilities during military conflicts, geopolitical targeting of cloud providers perceived as supporting state military operations, and simultaneous failure of multiple availability zones and regions within a single geopolitical conflict zone.

AZ isolation doesn’t protect against correlated geopolitical events. Multi-region within a conflict zone doesn’t protect against a campaign that encompasses the whole zone. Uptime SLAs and DR certifications don’t cover “the facility was hit by a drone.”

The response to this is not panic or a wholesale retreat from cloud infrastructure in sensitive regions. Cloud remains the right choice for most workloads in most geographies. But the risk model for the Middle East just changed in a concrete and documented way. The appropriate response is to update the threat model to reflect reality, audit whether your current failover strategy would have survived Sunday’s event, and make a deliberate decision about what level of geopolitical risk you’re accepting – not accidentally inheriting a dependency you’ve never stress-tested.

Nobody’s DR runbook said “drone strike.” After this week, that’s not an excuse anymore.


Sources: CNBC (drone strikes), CNBC (banking disruption), CNBC (Iran targeting), BBC, The Register, AP News