Here at sipgate, we use Percona Monitoring and Management (pmm) to keep track of the health of our database systems. For those unfamiliar with it, pmm is a preconfigured system that uses a grafana instance with a bunch of percona provided dashboards, a VictoriaMetrics data sink and agents that collect data from database servers.
We started using this in November of 2021 and for the longest time, only had about a dozen of database systems configured in pmm. We added more systems over time and recently got to a point where pmm would become unresponsive and it would often show error messages when we tried to configure alerts. In the grafana logs, we could see messages indicating the database was locked. We were very surprised when we found that the grafana instance percona ships in this context uses an sqlite database in the background, which is not considered suitable for a larger grafana setup (our setup currently has 125 MySQL Nodes).
So we started looking into migrating this to a MySQL backend and – fortunately – found, that this is considered a solved problem since there is a migration tool by grafana. Up until importing the created dump file into the MySQL database, this went exactly as advertised, but the import failed due to some timestamp incompatibilities. There were several instances of either „created“ or „updated“ timestamp fields that had to be modified.
ERROR 1292 (22007) at line 5686: Incorrect datetime value: '2022-10-09 12:10:24.59099705+02:00' for column 'created' at row 1
So we edited the timestamp in the sql dump to this format:
Since it was only meta information about when a dashboard was created or updated, we didn’t care about preserving the actual time and instead just made sure the import worked. After this initial fix, the import was – apparently – successful and we could change the grafana configuration to use a MySQL database instead of the default sqlite. The following lines go into /etc/default/pmm-server:
When we started pmm with the MySQL backend for the first time, we saw several error messages in the grafana logs indicating parse errors and some dashboards also failed to load.
logger=sqlDashboardLoader <snip> msg="Error indexing dashboard data" error="readEscapedChar: invalid escape char after \\, error found in #10 byte of ...|\"/(([0-9\\.]+)\\.([0-9|..., bigger context ...|on-Variable-Query\"},\"refresh\":2,\"regex\":\"/(([0-9\\.]+)\\.([0-9\\.]+)\\.([0-9\\.]+)-?([0-9?]+))/\",\"skipUrl|..." dashboardId=61 dashboardSlug=mysql-replication-summary
The indicated dashboard „mysql-replication-summary“ also failed to load on the grafana website. The same error appeared for several other dashboards, for example the MySQL Table Details. So we looked at the data field in the dashboard table and noticed a difference between the dump and the data in the table:
Notice how in the dump, each backslash is preceeded with another backslash while in the MySQL table, there is only one.
The initial fix approach was to replace each backslash with two backslashes, but this resulted in way more errors than before since there are places in the dump where the existing number of backslashes is actually correct. So instead, we had to work out a way to match a string that is a regex, and replace escape characters in that string. There are certainly a number of tools you could try this with, our tool of choice was vim.
The thing is, to match a single backslash in a search string in vim, you have to search for „\\“. So to match two backslashes, it becomes „\\\\“. To make things worse, to write a backslash in the replace part of the command, you have to write „\\“ again. So … in order to match the two backslashes in the dump file, you have to search for „\\\\“ and to replace it with 4 backslashes, you have to replace it with „\\\\\\\\“.
Should you be in the situation of having to fix this problem, here’s what worked for us:
type a colon to enter command mode, then paste this:
and press enter.
Repeat this process with the following lines:
s/ (\[0-9\\\\\./ ([0-9\\\\\\\\./g
This allowed us to import the dump into mysql and get every dashboard working. Looking back at the initial problem of grafana becoming unresponsive and logging about a locked database: This has not been a problem since.