Think Big

Advanced MySQL Basics

About once a year, there's a blog post from one of the current Tech Unicorns about how they had a big data problem and solved it with a slightly non-standard usage of MySQL. This is a really good thing, because MySQL is an extremely versatile tool. Assembled in a modular fashion and with open source, it's perfect as a robust, well-understood component in a larger architecture.

That said, I am often surprised at the basic mistakes and misunderstandings about MySQL of very smart and advanced tech professionals. It's become fashionable not only to discard the relational aspect of the database, but to be completely ignorant of it. It's a trend I cannot get behind. It's better to reject something based on an educated opinion than simply because you don't understand it. Therefore, I've put together this basic high-level overview of MySQL for the software professional of 2016, geared toward those that want to use it as more or less than a relational database, and potentially as a non-relational datastore.

In it, I present several rules-of-thumb for how to use MySQL in the ways I've seen these Tech Unicorns want to use it.

In each section, rules of thumb are quoted sections of text, like this. Rules of thumb are not laws of nature. They have exceptions. A good rule of thumb will, however, lead down the path of least resistance most of the time.

Let us begin!

MySQL as a K/V Store

This is something that crops up a lot. It's actually a great idea. MySQL is an extremely fast and stable storage engine and, if you use it properly, will beat out a lot of the current best-of-breed NoSQL K/V options.

Here is an example of an organization using MySQL as a K/V store: Wix Engineering Posting. The crew seems intelligent and well-spoken. I'm glad they wrote up their thoughts. Although this post will read a bit like a take-down of their efforts, please resist the urge to think of it that way. Theirs is just a fairly recent and extensive posting that will serve to illustrate some of our points.

The Wix team must have read some of the early distributed database literature, since they formed a rule about never using a JOIN. However, they are so afraid of joins that they build bad queries. One of their queries looks like this:

SELECT id1 FROM t1 where id1 = 
( SELECT id2 from t2 where criteria=x )

There is no possible way this could be more efficient than just joining the two tables together (though it is possible that one day MySQL will figure out how to optimize this by automatically turning it into a JOIN, due to the = before the sub-query).

They understood JOINs in your application code prevent putting the two tables on different servers. But due to blindness of their hard rule "NEVER JOIN," they failed to understand that if the two tables are on the same server, and you're going to optimize the query, doing so with a JOIN is better than with a sub-select. It's a good rule, too-strictly followed, and improperly broken.

Here's the proper way to do it, with a JOIN:

SELECT t1.id1 from t1,t2
 WHERE t1.id1 = t2.id2
   AND criteria=x

When you want to use MySQL as a distributed K/V store, and you want to be able to split tables across servers without touching application code, that's when the "no JOINs" rule becomes unbreakable, but by the same token, so then would their WHERE id = (subselect) formulation be illegal.

Have one process SELECT the IDs out of the first table into your application memory, then use the ID list to generate a SELECT ... WHERE id IN (id1,id2,...,idN) for a series of records from the second table.

This makes the tables independent, and they can live on physically separate machines.

If the two tables live physically on the same server, and you want to optimize for that fact, use a JOIN. You will need to modify application code when you split the tables onto separate servers.

You probably also want to visit the section below about making MySQL into a distributed data store, and the second section below about using MySQL as a K/V store. But first, let's address another issue the Wix team grappled with.

MySQL Active/Active Replication

The Wix team is trying to make a system with no single points of failure. This is a laudable goal. However, they choose an Active/Active(/Active) setup to achieve it. This is a mistake (not to pick on the Wix team - I've seen many make this same mistake).

One of the objections they have about auto_increment columns (they call them "serial keys") is that they interfere with active/active replication. This isn't the only problem, nor even the worst, with auto_increment columns. The auto_increment_offset and related parameters trivially solve the problem they're probably alluding to.

Their obsession misses the point in two ways. The first and minor way is that an integer primary key is a MySQL-specific optimization that comes from the fact that index leaf nodes store the entire primary key for each row pointed to, meaning a large primary key uses more memory and I/O. Also, using an incrementing integer keeps rows inserted around the same time clustered together on the I/O medium.

In MySQL, it is most efficient to assign a simple incrementing integer as the primary key for the table. To indicate row uniqueness, use an additional unique key.

The second, more egregious way the point is missed is that Active/Active replication with MySQL creates inconsistencies in your data. Why isn't this more-often noted by MySQL vendors? I will note it now.

MySQL (default asynchronous) Active/Active replication almost guarantees data inconsistency.

How? MySQL is asynchronous replication, and creating an Active/Active setup creates a split-brain problem... on purpose. Even if the network is 100% available, what happens when two clients, each talking to a different master database, are trying to achieve the same thing? For example, both are trying to add the same user, or change the same data about the same user in a different way.

Data inconsistency is nearly guaranteed and, to make matters worse, your application probably won't get errors, so you won't even notice your inconsistencies for a long time. This is a well-studied problem, and the upshot is you, your app, and your data aren't going to be happy.

The proper way to avoid single points of failure while keeping the performance of asynchronous replication is Master/Slave/Slave using GTIDs or a tool like MHA that will allow you to automatically fail over to a Candidate Slave when the Master fails.

Why Master/Slave/Slave? When you fail over from the (failed) Master to one Slave, you'll want the other Slave to build a new Master without downtime.

For a fault-tolerant asynchronous replication cluster, choose Master/Slave/Slave (active/passive/passive).

When the Master fails, fail over to one of the Slaves, then immediately begin building a second slave from the remaining good one. If you automatically fail over to a Slave (itself a dangerous action), you should still alert a human to wake up and come build another slave. If you lose a second machine in this cluster, the rebuild requires some down-time or degraded performance. If you have LVM, XtraBackup, or other instant-snapshot tools available, your performance will be degraded during the rebuild. If you don't have those options, you'll need downtime!

If a Master fails in your three-node replication cluster, immediately rebuild another node, for another loss will result in degraded performance or down-time for the cluster.

Also, it can be tempting to automate the whole setup. If the Master fails, automatically fail-over to a candidate Slave. If the Slave then fails or performs badly, and the Master seems okay, automatically fail-back to the Master. Resist this temptation. You will build a system that spends hours happily failing over and back again while the service that depends on the database cluster is 0% available the entire time. There is a rich history of cases where bright engineers built such a system and regretted it. Don't join their ranks.

Never automate the fail back. Fail-overs from Master to Candidate Slave: automatic is hard but doable. Fail-backs after things have settled down (or the Candidate Slave has also failed): always with human intervention! Also, always alert a human when a fail-over happens so they can fix the cluster state.

Also, it is often bad to leave the old failed Master in a usable state. Ensure it cannot take writes however you must. The safest method is to have the new Master shut off its power from the PDU, though admittedly, I have only rarely seen this level of thoroughness implemented, as it is also disconcerting.

When the old Master fails, be sure it cannot take writes while the new Master is acting Master of the cluster.

In general, implementing an automatic Master failover system is complex, error-prone, and time-consuming. Unless your org has defined a very strict SLA and given you the resources to follow through with a multi-month plan to implement a failover, you probably are better off just making it 100% manual.

Automatic Failover is hard. Weight each of the following for your org:

How often an unplanned failover might happen (in my experience, once every couple years in a reasonably high-volume shop).

The impact of the minutes of downtime while waiting on the human to fail it over (in my experience, 3-5 minutes).

The risk involved with attempting an automatic failover (in my experience: great, potentially creating a multi-hour outage).

Alright, enough about failover, and back to the point of this section. What if your org simply must have Active/Active? There is a way, though you will be losing asynchronous replication, with the added latencies and potential locking issues that come with it.

If you absolutely need Active/Active replication, choose a Galera Cluster solution. Every master is active with synchronous replication.

MySQL as a K/V Store, Pt. 2

Handler Socket (and the memcached extension)

Continuing from the above section about using MySQL as a K/V store, let's take it a bit further.

You're already abandoning the relational and transactional portions of MySQL. Why not abandon the SQL parsing overhead as well? There's a low-level interface into MySQL that allows you to write and read rows without any SQL at all called Handler Socket. Using it you can achieve some amazing read/write throughputs. The benchmarks from that 2012 project have it performing about 1.5x faster than memcached.

If raw handler socket coding is too confusing for you, but memcached is something you understand, you can get almost exactly the same phenomenal performance using the MySQL memcached extension, which makes MySQL look and act exactly like memcached, but with persistence.

If you're using MySQL as a K/V store, strongly consider abandoning the SQL parsing portion and directly storing/retrieving values. The memcached protocol support can dramatically simplify this.

Making MySQL Distributed

One of the easiest ways to make a distributed datastore backed by MySQL is to combine the MySQL memcached extension, mentioned earlier, with a memcached routing mechanism like Facebook's McRouter. With just these two tools, you can build some seriously powerful distributed NoSQL clusters. You also are given some tools for creating redundant data storage, and so may do away with MySQL's replication entirely. Now you have a highly-available highly-performant distributed K/V store.

But if you want to go with something a little more traditional, there is the YouTube Vitess project that will make a big fault-tolerant sharded distributed database backed by MySQL for you. And it's written in Go, so you have your excuse to play with golang!

If you need an architecture that's a little more traditional, you can always use the Tumblr Jetpants tools to build a large fault-tolerant sharded MySQL cluster. I've worked with it before, and it was a pleasure.

Keep in mind these projects were not easy to build. Not even the apparently-simple ones. One group of engineers I worked with mistook this for an easy problem and tasked a couple junior engineers to building a distributed MySQL-backed database in 6 months. The upshot? 1 year of wasted engineering resources, several months of wasted operational resources, an unstable, unworkable cluster, and eventually the entire codebase completely scrapped.

This rule-of-thumb shall be a warning...

If you want a MySQL distributed datastore, use an established technology. Do not write your own. Down that road lies disaster and untold pain. If you must travel the road, go prepared. Several of your most talented engineers will need a couple of years to travel it.

MySQL as a Document Store

Recently MySQL added the capability of storing JSON-based documents.

This means you can get the stability, speed, and scalability of MySQL without the common data loss experienced when using MangoDB which, though arguably the fastest database available as of this writing in 2016, leaves a bit to be desired in its persistence layer.

The past 8 years have shown that document stores tend to be long on PR talent and short on tech talent. This leads us to our rule-of-thumb:

When choosing a document store, ignore hype. Look for stability, uptime, and scalability. Look for proof of these elements.

Unfortunately, that means in this particular case, MySQL isn't the obvious win. The JSON capability is pretty new. I'll be paying attention over the next year for evidence that MySQL's JSON document store feature is fast and stable. If history predicts the future, we have great things to look forward to. In the meantime, trust, but verify.

Replicating only Some Databases/Tables

For complex technical reasons, there are multiple ways to try to replicate only part of a database to a slave. You want to ignore most of them and stick to the wild methods.

For replicating only some of a database, stick to e.g. the replicate-wild-ignore-table option.

IN and NOT IN vs Inner and Outer JOINs

SQL is a set-based language and there are two natural ways to solve each of the two types of problems "get all the rows from one table where there is (or is not) a matching set of rows in another table."

You want all the rows from one table where the IDs are in the second? Here are the two ways to do that:

SELECT things FROM table1 WHERE table1.id IN (SELECT id from table2);
SELECT things FROM table1 INNER JOIN table2 on table1.id = table2.id;

Similarly, the two ways to get all the rows from one table where the IDs are not in the second:

SELECT things FROM table1 WHERE table1.id NOT IN (SELECT id from table2);
SELECT things FROM table1 LEFT JOIN table2 on table1.id = table2.id WHERE table2.id IS NULL;

And here is where the problems begin. The first syntax (using IN or NOT IN clauses) is more intuitive to every single person I know. It's more intuitive to me, and I see it used by Engineers everywhere all the time. I suspect it's universally more clear to humans.

But MySQL traditionally (and to some extent today) is more efficient at planning and executing the second query. That is to say, when you have a choice to do an IN or NOT IN clause, but could possibly reformulate it as a JOIN (INNER JOIN for IN clauses, OUTER JOIN for NOT IN clauses), you should. This is the go-to advice of MySQL experts currently.

It looks like MySQL 5.7 improves the situation, but you should not count on this until you've seen benchmarks. I hope one day we can finally retire this go-to piece of advice, because IN and NOT IN are so much more intuitive to humans. I expect this to take several years. I'd be surprised if it's fixed before 2020.

Replace IN and NOT IN clauses with JOINs when possible.

Choosing Oracle, Percona, Maria

Here's the quick history of MySQL. First, there was MySQL AB, founded by Monty Widenius and David Axmark. They made MySQL good. Sun Microsystems bought MySQL AB, and a few of the core engineers split off to form their own company, Percona. Oracle bought Sun, and Monty split off to form his own company, MariaDB. An ASCII diagram is required at this point:

MySQL AB --> SUN --> Oracle
 \--> Percona \--> MariaDB

Percona are several of the earliest and best MySQL engineers, and as far as an alternative MySQL goes, it's been the #1 choice for 10 years. MariaDB, although founded by the first MySQL fellow, is actually a more recent addition to the game.

Percona patches Oracle's MySQL to get better performance. Maria tends to add entirely new features to Oracle's MySQL.

If you really want to live on the bleeding edge, and forego support entirely, there are other projects, like Webscale MySQL headed up by some of the biggest names on the internet. They provide patches of code that help if your throughput is measured in petabytes instead of terabytes.

"Never got fired for choosing ______" safety comes from Oracle. Better be lunching with your CFO every day.

Some of the best MySQL engineers that ever existed with reasonable support terms come from Percona.

The special sauce MariaDB brings to the table (mainly, the Aria engine), along with a name that will never give up on MySQL might be your thing.

If you want to dedicate engineers to supporting your own MySQL variant, Webscale MySQL is a great choice, and there are many others.

Thanks!

Special thanks to my friends of the digg diaspora, at Krux and elsewhere, for their input to topics covered, for proof-reading, and generally tolerating my obsessions over technical correctness. Without them I don't think this would ever have been written.

  • Rob Coli
  • Paul Lathrop
  • Jorge Gallegos

And all the other names that are left out due to my short-term memory.

Tim Ellis

Doing data that's bigger than most anyone else's data since 1999. Analytics. Front-end databases. Acronym soup.

place Los Angeles create