To do their job, our business analysts need to run database queries across data from several database servers. To make this possible, we built a node that runs several MySQL instances, each replicating one of our production database systems. Alongside these "replication instances", another MySQL instance combines all of them into a single "front end" using MySQL's FEDERATED storage engine. This engine does not store any data locally but fetches it from a remote server over a standard client connection.
When logged in to this "federated database", as we usually call it, the BAs have access to all the data in what appears to be a single MySQL instance.
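A front-end table on the federated instance might be defined roughly like this; the connection details, column layout, and credentials are made up for illustration:

```sql
-- On the "front end" instance: a FEDERATED table that stores no data
-- itself but forwards every query to the corresponding replication
-- instance via the CONNECTION string (scheme://user:pass@host:port/db/table).
CREATE TABLE customers (
  id   INT NOT NULL,
  name VARCHAR(255),
  PRIMARY KEY (id)
) ENGINE=FEDERATED
  CONNECTION='mysql://ba_user:secret@127.0.0.1:3307/shopdb/customers';
```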
While this works reasonably well, there is room for improvement. First of all, the federated engine is rather slow. When executing a seemingly simple query like
select * from customers order by id desc limit 20;
to get the 20 latest subscriptions, the engine fetches the entire table from the "backend server"; the actual data selection (i.e. taking the last 20 rows) happens only after all the table data has been transferred to the federated machine. To mitigate this, we run the replication and federated instances on the same physical machine, so that at least the data does not have to travel over an actual network. And of course, the overhead of running 8 instances of mysqld on the same server does not help performance either.
So last month we stumbled upon MariaDB's multi-source replication, which might, in theory, remove the federated engine from the setup entirely. We decided to give it a go, hoping to improve query performance: the engine behaviour described above would be replaced by a single MariaDB instance that replicates multiple production databases and serves as the "access point" database at the same time.
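With multi-source replication, each production system becomes a named replication connection on the single MariaDB instance. A minimal sketch of such a setup, with placeholder connection names, hosts, credentials, and binlog coordinates:

```sql
-- One named connection per production master (MariaDB 10.0+ syntax).
CHANGE MASTER 'shop' TO
  MASTER_HOST='shop-db.example.com',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',
  MASTER_LOG_FILE='mysql-bin.000042',
  MASTER_LOG_POS=4;

CHANGE MASTER 'billing' TO
  MASTER_HOST='billing-db.example.com',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',
  MASTER_LOG_FILE='mysql-bin.000017',
  MASTER_LOG_POS=4;

-- Start all named connections and check their state.
START ALL SLAVES;
SHOW ALL SLAVES STATUS\G
```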
This sounded quite promising and largely worked within a matter of hours. But there was one showstopper we could not get out of the way.
Every now and then, our BAs request new data to be added to the BA setup: they need a new replication or a new table, and the setup has to be adjusted.
While setting up a new replication into a single MySQL instance is made really easy by Percona's xtrabackup tool, it becomes a different story in a multi-source MariaDB setup.
To set up a new replication, three things need to happen:
- Create a snapshot of the original data that includes the replication information
- Import this snapshot on the destination machine
- Configure the slave information on the destination machine
For small to medium-sized data sets, this can simply be done with mysqldump. But once you get into hundreds of millions of rows, it becomes a very long-running task: importing a mysqldump of that size easily takes a week or two, even on a modern machine. Percona's xtrabackup tool is far more efficient here. It does not dump the data into text statements but creates a binary copy of the data that can essentially be copied into the data directory of the destination machine. The catch is that xtrabackup replaces the InnoDB system tablespace. As a result, the destination database instance "forgets" about any InnoDB tablespaces other than the ones just imported via xtrabackup. While you can still see that such an "old" table seemingly exists, you cannot access its data:
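For the small-to-medium case, the three steps map onto mysqldump roughly like this; the host names are placeholders:

```shell
# 1. Snapshot with the binlog coordinates recorded in the dump
#    (--master-data=2 writes them as a commented CHANGE MASTER statement).
mysqldump --single-transaction --master-data=2 \
  --host=shop-db.example.com shopdb > shopdb.sql

# 2. Import the snapshot on the destination machine.
mysql --host=analytics-db.example.com shopdb < shopdb.sql

# 3. Look up the coordinates for configuring the slave.
grep 'CHANGE MASTER' shopdb.sql   # shows MASTER_LOG_FILE / MASTER_LOG_POS
```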
mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| a              |
+----------------+
1 row in set (0.00 sec)

mysql> select * from a;
ERROR 1932 (42S02): Table 'test.a' doesn't exist in engine
In a setup that sports replications from 8 (and growing) production systems, this is a little "suboptimal".
There is a way to import the old tablespaces though.
- Before you import anything, run
flush tables $table for export;
for every existing table in every existing database
- Move the resulting .cfg and existing .ibd files out of the data directory
- Create a backup of the table definition (show create table) of every existing table in every existing database
- Delete all existing tables
- Use xtrabackup to import the new data
- Run all previously saved „create table“ statements
- Discard all newly created tablespaces
alter table $table discard tablespace;
- Move all the .cfg and .ibd files back to their data directory
- Import the old tablespaces
alter table $table import tablespace;
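For a single table, the rescue sequence looks roughly like this; `test.a` is an illustrative table name, and the file moves happen on the filesystem in between the statements:

```sql
-- 1. Before the xtrabackup import: make the existing tablespace portable.
FLUSH TABLES test.a FOR EXPORT;    -- writes a.cfg next to a.ibd
-- ...move a.cfg and a.ibd out of the data directory, then:
UNLOCK TABLES;

SHOW CREATE TABLE test.a;          -- save this definition for later
DROP TABLE test.a;

-- 2. After the xtrabackup import: recreate the table and re-attach its data.
-- (Run the saved CREATE TABLE statement here.)
ALTER TABLE test.a DISCARD TABLESPACE;
-- ...move a.cfg and a.ibd back into the data directory, then:
ALTER TABLE test.a IMPORT TABLESPACE;
```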
But if you seriously want to automate this in a way that guarantees all data is intact after the procedure, I can only applaud your enthusiasm.
So for now, we have decided to stick with our old setup. It may not be the fastest, but it is easily extensible and well understood.