To do their job, our business analysts need to run database queries across data from several database servers. To make this possible, we built a node that runs several MySQL instances, each replicating one of our production database systems. Alongside these "replication instances", another MySQL instance combines all of them into a single "front end" using MySQL's FEDERATED storage engine. This engine does not store any data locally but fetches it from a remote server over a standard client connection.
When logged in to this "federated database", as we usually call it, the BAs have access to all the data in what appears to be a single MySQL instance.
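A front-end table on the federated instance might be defined roughly like this; the connection details, column layout, and credentials are made up for illustration:

```sql
-- On the "front end" instance: a FEDERATED table that stores no data
-- itself but forwards every query to the corresponding replication
-- instance via the CONNECTION string (scheme://user:pass@host:port/db/table).
CREATE TABLE customers (
  id   INT NOT NULL,
  name VARCHAR(255),
  PRIMARY KEY (id)
) ENGINE=FEDERATED
  CONNECTION='mysql://ba_user:secret@127.0.0.1:3307/shopdb/customers';
```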
While this works reasonably well, there is room for improvement. First of all, the federated engine is rather slow. When executing a seemingly simple query like
select * from customers order by id desc limit 20;
to get the 20 latest subscriptions, the engine fetches the entire table from the "backend server"; the actual data selection (i.e. taking the last 20 rows) happens only after all the table data has been transferred to the federated machine. To mitigate this, we run the replication and federated instances on the same physical machine, so that at least the data does not have to travel over an actual network. And of course, the overhead of running 8 instances of mysqld on the same server does not help performance either.
So last month we stumbled upon MariaDB's multi-source replication, which might, in theory, remove the federated engine from the setup entirely. We decided to give it a go, hoping to improve query performance: the engine behaviour described above would be replaced by a single MariaDB instance that replicates multiple production databases and serves as the "access point" database at the same time.
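With multi-source replication, each production system becomes a named replication connection on the single MariaDB instance. A minimal sketch of such a setup, with placeholder connection names, hosts, credentials, and binlog coordinates:

```sql
-- One named connection per production master (MariaDB 10.0+ syntax).
CHANGE MASTER 'shop' TO
  MASTER_HOST='shop-db.example.com',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',
  MASTER_LOG_FILE='mysql-bin.000042',
  MASTER_LOG_POS=4;

CHANGE MASTER 'billing' TO
  MASTER_HOST='billing-db.example.com',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',
  MASTER_LOG_FILE='mysql-bin.000017',
  MASTER_LOG_POS=4;

-- Start all named connections and check their state.
START ALL SLAVES;
SHOW ALL SLAVES STATUS\G
```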
This sounded quite promising and largely worked within a matter of hours. But there was one showstopper we could not get out of the way.
Every now and then, our BAs request new data to be added to the BA setup: they need a new replication or a new table, and the setup has to be adjusted.
While setting up a new replication into a single MySQL instance is made really easy by Percona's xtrabackup tool, it becomes a different story in a multi-source MariaDB setup.
To set up a new replication, three things need to happen:
- Create a snapshot of the original data that includes the replication information
- Import this snapshot on the destination machine
- Configure the slave information on the destination machine
For small to medium-sized data sets, this can simply be done with mysqldump. But once you get into hundreds of millions of rows, it becomes a very long-running task: importing a mysqldump of that size easily takes a week or two, even on a modern machine. Percona's xtrabackup tool is far more efficient here. It does not dump the data into text statements but creates a binary copy of the data that can essentially be copied into the data directory of the destination machine. The catch is that xtrabackup replaces the InnoDB system tablespace. As a result, the destination database instance "forgets" about any InnoDB tablespaces other than the ones just imported via xtrabackup. While you can still see that such an "old" table seemingly exists, you cannot access its data:
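For the small-to-medium case, the three steps map onto mysqldump roughly like this; the host names are placeholders:

```shell
# 1. Snapshot with the binlog coordinates recorded in the dump
#    (--master-data=2 writes them as a commented CHANGE MASTER statement).
mysqldump --single-transaction --master-data=2 \
  --host=shop-db.example.com shopdb > shopdb.sql

# 2. Import the snapshot on the destination machine.
mysql --host=analytics-db.example.com shopdb < shopdb.sql

# 3. Look up the coordinates for configuring the slave.
grep 'CHANGE MASTER' shopdb.sql   # shows MASTER_LOG_FILE / MASTER_LOG_POS
```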
mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| a              |
+----------------+
1 row in set (0.00 sec)

mysql> select * from a;
ERROR 1932 (42S02): Table 'test.a' doesn't exist in engine
In a setup that sports replications from 8 (and growing) production systems, this is a little "suboptimal".
There is a way to import the old tablespaces though.
- Before you import anything, run
flush tables $table for export;
for every existing table in every existing database
- Move the resulting .cfg and existing .ibd files out of the data directory
- Create a backup of the table definition (show create table) of every existing table in every existing database
- Delete all existing tables
- Use xtrabackup to import the new data
- Run all previously saved „create table“ statements
- Discard all newly created tablespaces
alter table $table discard tablespace;
- Move all the .cfg and .ibd files back to their data directory
- Import the old tablespaces
alter table $table import tablespace;
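For a single table, the rescue sequence looks roughly like this; `test.a` is an illustrative table name, and the file moves happen on the filesystem in between the statements:

```sql
-- 1. Before the xtrabackup import: make the existing tablespace portable.
FLUSH TABLES test.a FOR EXPORT;    -- writes a.cfg next to a.ibd
-- ...move a.cfg and a.ibd out of the data directory, then:
UNLOCK TABLES;

SHOW CREATE TABLE test.a;          -- save this definition for later
DROP TABLE test.a;

-- 2. After the xtrabackup import: recreate the table and re-attach its data.
-- (Run the saved CREATE TABLE statement here.)
ALTER TABLE test.a DISCARD TABLESPACE;
-- ...move a.cfg and a.ibd back into the data directory, then:
ALTER TABLE test.a IMPORT TABLESPACE;
```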
But if you seriously want to automate this in a way that guarantees all data is intact after the procedure, I can only applaud your enthusiasm.
So for now, we have decided to stick with our old setup. It may not be the fastest, but it is easily extensible and well understood.