The findings in this post are drawn from the Jepsen analysis of MariaDB Galera Cluster 12.1.2, published in 2026.


What’s New This Week

Kyle Kingsbury published the Galera Cluster 12.1.2 analysis on 17 March 2026. All four issues (MDEV-38974, MDEV-38976, MDEV-38977, MDEV-38999) remain unresolved as of publication.


Changelog

Date          Summary
17 Mar 2026   Initial publication.

MariaDB Galera Cluster documents itself as a system that “ensures no lost transactions” and provides isolation “between Serializable and Repeatable Read.” Kyle Kingsbury’s latest Jepsen analysis found something simpler and more alarming: in a healthy, fully-connected cluster with no faults injected, Galera allows silent data loss under concurrent load. No partition. No crash. Just two transactions reading the same row and both writing an update.

That’s not a theoretical edge case. That’s a Tuesday.

What Jepsen Is and Why These Analyses Matter

Kyle Kingsbury has been systematically breaking distributed databases since 2013. The methodology is consistent: take the vendor’s documentation at face value, design tests that exercise the claimed guarantees, and publish what you find. When a vendor funds an analysis, the funding is disclosed, and no pre-publication review softens the conclusions.

The track record is extensive. Systems that fail Jepsen scrutiny tend to fail because their documentation oversells the actual guarantees – not because of obscure bugs, but because of architectural decisions that were made and then obscured in the marketing layer. The systems that hold up – PostgreSQL, FoundationDB – have documentation that accurately describes what they do.

Jepsen analyses matter precisely because the gap between “what the docs say” and “what happens in production under concurrent load” is where data loss lives.

What Galera Claims and What It Actually Provides

MariaDB’s documentation makes specific, strong claims. The Galera Cluster Replication Guide says: “transactions are committed on all nodes (or fail on all) before the client receives a success confirmation.” The architecture documentation says “when a transaction COMMITs, all nodes in the cluster have the same value.” The use-cases page says “only after Node A gets an ‘OK’ from all other nodes does it tell the client, ‘Your transaction is committed.’”

Jepsen’s response is blunt: “This is obviously wrong.” If Galera actually required all nodes to confirm before acknowledging a commit, a single node failure would bring the whole cluster down. That’s not how Galera works, and it’s not how Galera is marketed – the high-availability story depends on tolerating node failures. The documentation is internally contradictory.

The claimed isolation level is also buried. The one reference Jepsen found to isolation levels in the Galera documentation is in the Management section, under Installation and Deployment, on the “Tips on Converting to Galera” page, under the “Transaction size” heading. It says: “Galera’s tx_isolation is between Serializable and Repeatable Read.”

Repeatable Read is a strong guarantee. In most formalisms, for primary-key access, it’s roughly equivalent to Serializable. Engineers reading that claim would reasonably conclude their concurrent workloads are protected.

They are not.

The P4 Finding: Lost Update in a Healthy Cluster

The most important result in the Jepsen report has nothing to do with network partitions or node crashes. It is reproducible in a healthy, fully-connected, fully-operational Galera cluster under concurrent load.

P4 is the Lost Update anomaly. The canonical definition: transaction T1 reads a value, then transaction T2 updates that value, then T1 updates the same value based on its earlier read. T2’s write is silently overwritten. In concrete terms:

  • T1 reads X = 10
  • T2 reads X = 10
  • T2 writes X = 15, commits
  • T1 writes X = 11, commits
  • Final value is 11, not 15 or 16 – T2’s update is gone
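The interleaving above can be sketched as a few lines of plain Python. No database is involved; a dict stands in for the row, and the two "transactions" are run sequentially to make the schedule explicit:

```python
# Minimal simulation of the Lost Update (P4) interleaving above.
# A dict stands in for a single row; each "transaction" reads a
# value, computes a new one, and writes it back with no conflict
# detection of any kind.

store = {"x": 10}

# T1 and T2 both read X = 10
t1_read = store["x"]
t2_read = store["x"]

# T2 writes X = 15 and commits first
store["x"] = t2_read + 5

# T1 writes X = 11 from its stale read and also commits
store["x"] = t1_read + 1

print(store["x"])  # 11: T2's update is silently gone
```

Nothing in the write path ever checks whether the value read at the start is still current, which is the whole anomaly in miniature.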

This is not an exotic failure mode. Read-modify-write is one of the most common patterns in application code. Counters, inventory levels, account balances, booking availability, ticket allocations – all of these involve reading a current value and writing an updated one. On Galera, two concurrent transactions doing this on the same row will silently lose one of the writes.

The Jepsen report includes a specific test run showing this happening in practice. Key 468 was read as empty by one transaction, which then appended value 3. A concurrent transaction appended value 6. The resulting state showed [6, 3, …] – meaning T2 modified the row between T1’s read and T1’s write. T1’s update proceeded based on a stale read, and the conflict was not detected.

Galera’s certification-based replication detects write-write conflicts on the same primary key. It does not detect read-write conflicts – the case where one transaction reads a row and another writes it before the first transaction’s write. This is a known limitation of optimistic concurrency that Galera’s documentation does not clearly disclose.
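A toy model makes the gap concrete. This is an illustrative sketch of certification as the report describes it, not Galera's actual implementation: a transaction certifies only if its write set avoids the write sets of concurrently committed transactions, and read sets are never consulted.

```python
# Toy model of certification-based conflict detection: reject a
# transaction only on write-write overlap with a concurrently
# committed transaction. Read sets are ignored, which is exactly
# why read-write conflicts slip through.

def certify(txn, committed_write_sets):
    """Return True if txn may commit (no write-write overlap)."""
    return all(not (txn["writes"] & ws) for ws in committed_write_sets)

# While T1 was in flight, T2 committed a write to key "x".
t2_writes = {"x"}

# Write-write conflict: T1 also writes "x" -- detected, T1 aborts.
t1 = {"reads": {"x"}, "writes": {"x"}}
print(certify(t1, [t2_writes]))  # False

# Read-write conflict only: T1 read "x" but writes "y". Certification
# never looks at the read set, so T1 commits on a stale read.
t1 = {"reads": {"x"}, "writes": {"y"}}
print(certify(t1, [t2_writes]))  # True
```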

P4 violates Snapshot Isolation. Since the Jepsen tests used primary key access, it also violates Repeatable Read – the weaker of the two levels Galera claimed to provide.

Jepsen also observed G-single anomalies: dependency cycles with exactly one read-write (anti-dependency) edge, potentially spanning multiple keys or more than two transactions. These are broader violations of Snapshot Isolation, and they too occurred in healthy clusters, without faults.

There are two additional findings that do require fault injection, but are still worth understanding.

Write loss on coordinated crash (MDEV-38974). Under MariaDB’s recommended configuration for Galera, innodb_flush_log_at_trx_commit=0, committed transactions are not flushed to disk before acknowledgement. When all nodes crash at approximately the same time – and coordinated failures caused by flooding, cooling failures, network bugs, or lightning do happen – transactions that were acknowledged as committed can simply disappear. In a single one-minute test run, nine values across three rows were lost this way.

The documentation describes this setting as “a safer, recommended option” for Galera, on the grounds that any node failure can be recovered by syncing from another node. That reasoning only holds when node failures are independent. It does not hold when they are simultaneous.

Write loss under partitions (MDEV-38976). Even with innodb_flush_log_at_trx_commit=1, Galera occasionally loses committed writes when tests involve process crashes and network partitions. In one test run, approximately 19 seconds of writes across four objects were lost. One key lost all 25 elements it had accumulated and was reset to empty. All of the affected transactions had been acknowledged as successfully committed.

This is infrequent – once every few hours of testing – but the loss of acknowledged commits under partition scenarios is not a minor edge case for systems that care about data integrity.

What Workloads Are Actually Safe on Galera

Given the actual guarantees rather than the documented ones, there are workloads where Galera is fine.

Safe: Append-only workloads where rows are never read before being written. Workloads where each row has a single writer at any given time. Read-heavy workloads where a small amount of staleness is acceptable. Bulk inserts of new data without concurrent modification of existing rows.

Not safe: Financial ledgers. Inventory counters. Booking and reservation systems. Ticket allocation. Any system where two concurrent transactions might read the same row and write an updated value based on that read.

The rule is simple: if your application ever does SELECT ... WHERE id = X followed by UPDATE ... WHERE id = X in a transaction, and two instances of that pattern might run concurrently on the same row, you have a P4 risk on Galera.
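The risky shape, and the safer single-statement alternative, look like this in miniature. SQLite is used here purely as a stand-in to make the shapes concrete (it does not reproduce Galera's concurrency behaviour), and the counters table is hypothetical:

```python
import sqlite3

# Contrast the risky SELECT-then-UPDATE shape with a single atomic
# UPDATE. The read-modify-write version computes the new value in the
# application from an earlier read; interleaved on Galera, one of two
# such increments can be silently lost.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE counters (id INTEGER PRIMARY KEY, value INTEGER)")
db.execute("INSERT INTO counters VALUES (1, 0)")

# Risky shape: read, compute in the application, write back.
(v,) = db.execute("SELECT value FROM counters WHERE id = 1").fetchone()
db.execute("UPDATE counters SET value = ? WHERE id = 1", (v + 1,))

# Safer shape: push the computation into the statement itself, so
# there is no application-level read for another writer to invalidate.
db.execute("UPDATE counters SET value = value + 1 WHERE id = 1")

(final,) = db.execute("SELECT value FROM counters WHERE id = 1").fetchone()
print(final)  # 2
```

Two concurrent copies of the single-statement form conflict write-write, which is the case certification does detect; whether that is sufficient for your workload on Galera is still something to test rather than assume.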

What to Do If You’re Running Galera Today

First: set innodb_flush_log_at_trx_commit=1 if you haven’t already. MariaDB’s recommendation to use 0 for Galera clusters is wrong. Yes, it improves performance. Yes, recovery from another node works for independent failures. It does not work when all your nodes go down simultaneously.
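In config terms, that is one line in the server configuration. A minimal fragment, assuming the standard [mysqld] section; verify the file layout against your distribution:

```ini
[mysqld]
# Flush and sync the InnoDB redo log on every commit, so an
# acknowledged transaction survives a whole-cluster crash.
# Galera guidance suggesting 0 for speed assumes node failures
# are independent; per MDEV-38974, coordinated failures can lose
# acknowledged commits under that setting.
innodb_flush_log_at_trx_commit = 1
```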

Second: audit your application code for read-modify-write patterns. Any ORM that generates SELECT-then-UPDATE sequences, any counter increment, any balance adjustment – these are potentially affected.

Third: use SELECT FOR UPDATE where read-modify-write is unavoidable. It serialises access to a row and prevents the read-write conflict that causes P4. It works, but it requires every code path that touches a row to use it consistently. One missed path breaks the guarantee.
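Where guaranteeing FOR UPDATE coverage on every path is impractical, an application-level alternative is optimistic locking: the UPDATE carries the value (or a version column) the transaction originally read, so a stale write matches zero rows instead of clobbering. This sketch uses SQLite and a hypothetical counters table purely to show the shape; it is a pattern your application enforces, not a Galera feature:

```python
import sqlite3

# Optimistic-locking sketch: each UPDATE restates the value the
# transaction read. If a concurrent writer changed the row, the
# WHERE clause no longer matches, rowcount is 0, and the caller
# retries instead of silently overwriting.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE counters (id INTEGER PRIMARY KEY, value INTEGER)")
db.execute("INSERT INTO counters VALUES (1, 10)")

# T1 reads 10; before it writes, a concurrent T2 updates the row to 15.
(t1_seen,) = db.execute("SELECT value FROM counters WHERE id = 1").fetchone()
db.execute("UPDATE counters SET value = 15 WHERE id = 1")

# T1's conditional write fails instead of clobbering T2's update.
cur = db.execute(
    "UPDATE counters SET value = ? WHERE id = 1 AND value = ?",
    (t1_seen + 1, t1_seen))
print(cur.rowcount)  # 0: the stale write was rejected

(final,) = db.execute("SELECT value FROM counters WHERE id = 1").fetchone()
print(final)  # 15: T2's update survives
```

The retry loop is left out for brevity; a real implementation would re-read and re-attempt on rowcount 0, and should still be tested against your actual cluster rather than assumed safe.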

Fourth: consider whether multi-master writes are actually necessary for your workload. Single-primary replication mode avoids the multi-master write conflict problem entirely, at the cost of losing the ability to write to all nodes simultaneously. For many workloads – where writes go to one node and reads are distributed – this is an acceptable trade.

Fifth: if your workload genuinely requires strong consistency for concurrent writes to the same rows, evaluate alternatives. PostgreSQL provides Serializable Snapshot Isolation and it holds up under Jepsen testing. CockroachDB and FoundationDB are designed around strong consistency as a first-class requirement. Migration is not trivial, but the cost of silent data corruption in production is higher.

Do not rely on Galera’s documented isolation claims to make this assessment. Read the Jepsen report. Test your specific workload. The claims in the documentation are not what you will get.

The 10-Year Pattern

Jepsen first analysed Galera in 2015 and found that Codership Oy – the original developers – had intentionally designed Galera without the first-committer-wins property of Snapshot Isolation. The result was that a simulated bank transfer workload could create or destroy money. The 2015 analysis was published. The issue was acknowledged. A decade passed.

In 2025, MariaDB acquired Codership Oy, bringing Galera under MariaDB’s umbrella. The 2026 analysis found the same category of problem: consistency claims in the documentation that don’t match the actual behaviour, and isolation anomalies that matter for real workloads. MDEV-38977 (Lost Update) remains unresolved.

This is not a pattern unique to Galera. It is the normal pattern for distributed systems that were designed with a particular performance profile in mind and then documented to claim stronger guarantees. Data integrity failures are not usually dramatic events – they accumulate quietly in the gap between what the system claims to do and what it actually does.

The databases that have survived repeated Jepsen scrutiny share a property: their documentation accurately describes what they do, including the limitations. That discipline is harder than it sounds. It requires saying “our system does not provide X” in the documentation, even when X is what potential users want to hear.

Strong Consistency Is a Design Choice, Not a Configuration Option

Galera’s certification-based replication detects write-write conflicts. It does not detect read-write conflicts. That is an architectural constraint, not a bug that can be patched. The P4 finding is not going to be fixed with a configuration change or a minor release; it would require Galera to take write locks on reads that will be followed by writes, which is a fundamentally different concurrency model with different performance characteristics.

This matters for data integrity in any system – including the AI agent pipelines and distributed systems being built on top of databases right now. The trust assumptions you make about your infrastructure propagate upward through every layer of the stack.

Understanding what guarantee your database actually provides – not what its documentation claims to provide – is not optional. It is the foundation everything else is built on. If that foundation has silent failure modes, the consequences are not always visible until they matter most.

Galera is a capable system for the workloads it actually supports. The problem is not the system. The problem is the mismatch between the documented guarantees and the real ones – and the decade it has taken to clearly document that gap.