What's Been Happening Recently
Is consistency even important? We know it's contentious. Some people find it extremely important, and we all at least pay lip service to it. But really... how much effort should we put toward making sure we've got it?
As you might have noticed, recently we saw Aphyr doing a Jepsen run against Galera, shortly followed by Percona giving a lackluster response (after recent edits to the Percona response, it could be summed up as "Aphyr is right, but there is one word we don't like him using"), the Galera engineers writing a blog post that's internally-inconsistent (SI is provided but applications cannot depend on SI semantics), and myself doing a few internal writeups on this for Bigcommerce (BC).
We use Galera for a small cluster currently, and there is some interest in using Galera more extensively within our database infrastructure.
Recently several BC engineers and I sat down together and spent some time trying to understand what transaction isolation levels are, what they mean, and practical ways of understanding them. Jeff Hubbard got Jepsen, built a cluster, and began playing around with it.
So what have we learned? The results are somewhat surprising.
Consistency Implementation: The Theory
First, let's talk about what Consistency and Transaction Isolation Levels even are.
The four defined ANSI Transaction Isolation Levels enumerate a series of possible data inconsistencies and how they're prevented by each level. Essentially, these four levels were defined based on actual implementations of consistent databases using locking mechanisms in the mid 1900's. To quote Aphyr, who did a relatively good job of summing it up:
If you’re having trouble figuring out what these isolation levels actually allow [as opposed to prevent], you’re not alone. The anomalies prevented (and allowed!) by Read Uncommitted, Read Committed, etc are derived from specific implementation strategies. ... The standard doesn’t guarantee understandable behavior – it just codifies the behavior given by existing, lock-oriented databases. Locks aren’t the only ways we can protect transactional integrity. Some MVCC databases, like Oracle and Postgres [and InnoDB-backed MySQL], offer a different kind of model: Snapshot Isolation.
This hints at where the problem began for our industry. There was a bit of a shift in the industry back in the 1990's where we decided it was better to implement consistency using MVCC instead of locking, which changed the consistency model to Snapshot Isolation (which isn't any of the four ANSI transaction isolation levels), and then try to emulate the behaviours of locking databases using Snapshot Isolation minus (or plus) some extra semantics.
But attempts at formalizing this shift didn't make it to the mainstream. The peculiarities of lock-based ANSI transaction isolation level implementations are different than for MVCC-based ANSI transaction isolation level implementations, and it seems none of our industry luminaries know (or have bothered to share) how.
To complicate matters a bit more, there's no standard about which peculiarities are okay (only which ones aren't).
The precise-sounding statement "My transaction runs within a Repeatable Read Transaction Isolation Level," is actually not precise at all. There's a large class of problems you know won't occur, but there's a significant class of problems that aren't even addressed.
The theory around consistency and what implementations should allow/prevent is murky, at best.
Consistency Implementation: The Reality
The academic literature is somewhat precise (though only a few seem to have a handle on it), but our industry standards are most certainly imprecise.
But that's not the worst of it. We also generally assume our favorite DBMS properly implements the transaction isolation levels claimed. But if you've been following Aphyr's posts over the past few years, you know that, in fact, that's an uncommon occurrence. Most of the time the guarantees claimed by the DBMS aren't delivered.
It would appear that no-one implementing DBMSs in the past several decades has any precise technical documentation to define what they should be implementing. So they implement whatever they want, label it however they want, and (we) the consumers don't really have anything to say about it. All this is currently changing with Aphyr spearheading the effort to make the reality match the academic literature, but the process is slow-going.
What About SELECT...FOR UPDATE?
InnoDB implements MVCC, so that a given transaction sees the universe as a static image from the moment the START TRANSACTION statement is issued. Any other transaction that makes an attempt to read data modified by this transaction will succeed in seeing a version of the data from before the transaction began.
Until the COMMIT is sent, other transactions can read the data changed-but-not-committed by the transaction in question. Depending on how poorly the DBMS implements consistency, this can (and does) lead to inconsistent data in a highly-concurrent environment.
What I find interesting is that Percona suggest adding a
FOR UPDATE modifier to the
SELECT statements issued as part of its transaction. What does that mean?
FOR UPDATE modifier will lock the rows that are touched by the
SELECT. That is to say, even though InnoDB is an MVCC-based transaction system, exclusive locks are placed on rows. This makes the implementation of the consistency model a hybrid approach, including both MVCC and locks. And if you lock rows in your MVCC-based database, you are modifying the transaction isolation level! Not just for your transaction, but globally for every other transaction that might touch the rows you've locked.
I find it extremely difficult to reason about the meaning of the
FOR UPDATE modifier. It's clear that placing locks on those rows will make inconsistencies much harder to work their way into the system, but it is unclear what the new (elevated/more strict) transaction isolation level is, nor how this might affect performance on a system with many concurrent transactions.
If we were okay with a locking mechanism to provide consistency, why did we bother with MVCC to start with?
The Coli Conjecture and Inconsistency as the Norm
One of the early practical operational pioneers in the Distributed Database Realm (Cassandra in particular), Rob Coli, posited the following:
If you are using a distributed database, you probably don't care about consistency, even if you think you do.
-- Robert Coli
If it turns out consistency really isn't as important as performance, maybe that's why we implemented MVCC rather than locking.
The Coli Conjecture is based on the observation that distributed databases have gained huge inroads in places where consistency is claimed to be important, yet these databases rarely (if ever) deliver the consistency guarantees they promise. Even today, Cassandra presents new bugs that show the (eventual) consistency guarantees fail in several common administrative situations, like when compactions are run at the wrong intervals, or when nodes fail during critical moments, or when nodes fail and recover unexpectedly.
The Coli Conjecture is, in fact, probably an obvious conjecture in the face of historical evidence. Actual complete consistency in any human endeavour appears to be a vanishingly-rare occurance, even where you'd expect it to be the norm.
In banking, accounts payable/receivable, and inventory, with nearly every company you'll ever deal with, there are errors in these systems that are corrected with a resource-intensive out-of-band process. Computers reduced the degree of the errors, but they certainly still occur regularly.
And as we're starting to realise within our industry, computers, with their databases that should be predictable, in fact have a significant class of data inconsistency possibilities that aren't even defined in standards, and aren't taken care of by our databases during failure (and recall: failure is the norm). Frankly, it's surprising that our bank account balances and automated bill systems don't have more problems than they do.
It seems that, when a business does the value analysis of total consistency vs performance and cost, total consistency never wins.
So What's the Lesson?
Maybe it's wise to ask the question: is guaranteed total consistency really that important? Is it worth the cost to us?
If 99.9% of the time the data is consistent and right, is that good enough? Perhaps the other 0.1% can be dealt with by human processes, like phone calls, letters, meetings, hand-shakes. There's a point of diminishing returns on consistency.
Baron Schwartz pointed out that Galera cluster is actually running in production in a surprisingly large number of locations. There doesn't seem to be a huge uproar in the revelation that Galera doesn't support Snapshot Isolation (despite its claims to the contrary). Why?
The Coli Conjecture is based on observations of real-world companies like Netflix, Apple, and Google being totally fine with usually-consistent databases.
Maybe it just isn't that important.
If you want to work on these sorts of problems, we're hiring engineers to build microservices in Go, Ruby, Node.js, and Scala on AWS.