Please note that the following notes apply to the MySQL database that is no longer actively used for telemetry, as of the end of January-2013.

Restoring the complete database

The restoration of the complete database is an iterative and hopefully convergent process. Here are the basic steps of the process.

  1. Retrieve all of the binary logs from tape that haven't already been restored onto tel-test.tucson.lbto.org.
  2. Feed the logs to the DBMS, filtering out any purging related statements that accidentally slipped through.
  3. Backup the updated database to tape.

After working through the steps, time will have elapsed, so more binary logs will have been archived to tape. These will need to be restored as well, so the process repeats.

Retrieving new binary logs from tape

Have Kyle restore binary log files into the directory /bafs on tel-test.tucson.lbto.org. It will actually restore as /bafs/mnt/mysqlbackup.

Feed the logs to the DBMS and filtering purge statements

Before anything else is done, first make sure that the DBMS instance on tel-test is configured for restoration. See the topic TestRestoreSwitching to learn how to do this.

I was supposed to be vigilant and not replicate the purging operations done on the mountain to keep the disk from filling up, but I made a few mistakes, and a few data destroying operations got replicated. I've written a script in /root called find_delete.sh that will find all DELETE, DROP and TRUNCATE operations in the log files.

Unfortunately, not all of these deletion statements are purge-related. Only the ones that apply to tables in the tel_streams database are purge-related. The rest should be replicated. There will be a handful of deletion statements made to tables in the tel_metadata database between between April 27th and 30th that should be replicated.

To find all the deletion statements, the find_delete.sh script should be run. Depending on how many log files there are, the script may take a while to complete. The script will output the names of the log files it has read with any deletion statements found in a log file immediately following the name of the log file. The output should be scanned, and the locations of the deletion statements that should NOT be replicated should be noted. Here's an example of output.

...
/bafs/mnt/mysqlbackup/tel-store-bin.000009
/bafs/mnt/mysqlbackup/tel-store-bin.000010
DELETE FROM ndead_beef ...
DROP PROCEDURE foo ...
TRUNCATE TABLE streams;

/bafs/mnt/mysqlbackup/tel-store-bin.000011
...

The file tel-store-bin.000011 has three deletion statements. Only the first one applies to a table in the tel_streams database. This statement should not be replicated. The other two need to be.

Assume that the batch of logs restored by Kyle runs from tel-store-bin.000001 to tel-store-bin.000100 and that only tel-store-bin.000010 contains deletion statements that should not be replicated. We are free to "feed" tel-store-bin.000001 through tel-store-bin.000009 to the DBMS without modification. After that, we will need to semi-manually feed the statements in tel-store-bin.000010 to the DBMS. Finally, we can feed the rest of the logs to the DBMS without modification.

Feeding a sequence of logs to the DBMS

To feed a sequence of log files to the DBMS, first stop the DBMS. Second, clean out the residual logs from the data directory, /bafs/tel-current/mysql. Third, move the sequence of files from /bafs/mnt/mysqlbackup to /bafs/tel-current/mysql. Fourth, build an index file for them. Fifth, update the file relay-log.info to indicate the correct starting point. Sixth, make sure the mysql user owns everything. Finally, restart the DBMS. For example.

1. Shutdown the DBMS as root using mysqladmin. The password is the normal telemetry root password.

[root@tel-test ~]# mysqladmin --user=root -p shutdown

2. Clean out the residual log files from the DBMS data directory, /bafs/tel-current/mysql.

[root@tel-test ~]# cd /bafs/tel-current/mysql
[root@tel-test mysql]# rm tel-store-bin.*

3. Move tel-store-bin.000001 through tel-store-bin.000009 into the data directory.

[root@tel-test mysql]# mv /bafs/mnt/mysqlbackup/tel-store-bin.00000?  .

4. Build the relay index file, tel-store-bin.index. The relay index file is just a list of the logs, one per line, in the order they should be read.

[root@tel-test mysql]# ls ./tel-store-bin.* > tel-store-bin.index

For some unknown reason, the ./ is required in the above statement.

5. Update the first two lines in the relay information file, relay-log.info. The first line is the log file currently being ingested. The second line is the position in that log file. Edit the log file so the first line is ./tel-store-bin.000001 and the second line is 4. Here's what it should look like.

./tel-store-bin.000001
4
fake-bin-log.000001
...

6. Make sure the mysql user owns everything in the data directory.

[root@tel-test mysql]# chown mysql.mysql *

7. Restart the DBMS.

[root@tel-test mysql]# mysqld_safe &

Monitoring the ingestion progress of the DBMS

Depending on the number of log files, it could take minutes, hours or even weeks for the DBMS to ingest the sequence. To monitor its progress do the following.

1) Connect to the DBMS as root.

[root@tel-test ~]# mysql --user=root -p

2) Check the status of the slave.

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Trying to connect to master
                  Master_Host: fake-host.tucson.lbto.org
                  Master_User: tel_replicator
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: fake-bin-log.000001
          Read_Master_Log_Pos: 4
               Relay_Log_File: tel-store-bin.000009
                Relay_Log_Pos: 812241684
        Relay_Master_Log_File: fake-bin-log.000001
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 666
                   Last_Error: I can't connect to fake-host.tucson.lbto.org
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 812241541
              Relay_Log_Space: 812241684
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 666
                Last_IO_Error: I can't connect to fake-host.tucson.lbto.org
               Last_SQL_Errno: 0
               Last_SQL_Error: 
1 row in set (0.00 sec)

Just ignore the IO error. It is related to there being no master. If an SQL error occurs, that's a real problem. It would be best to contact Tony Edgin for help figuring out how to resolve this.

The DBMS has ingested all of the logs when the Relay_Log_File is the last log file, in our case tel-store-bin.000009, and the Relay_Log_Pos is at the end of the file. The position should be roughly the length of the file. If in doubt, repeat the query. If the position is the same, the DBMS is finished.

Filtering out purged-related deletion statements

When the DBMS has finished ingesting logs tel-store-bin.000001 through tel-store-bin.000009, you can now filter out the deletion statements from tel-store-bin.000010 and feed the rest to the DBMS. Here's a way to do that.

[root@tel-test ~]# cd /bafs/tel-current
[root@tel-test tel-current]# mv ../mnt/mysqlbackup/tel-store-bin.000010 .
[root@tel-test tel-current]# mysqldump tel-store-bin.000010 > log.000010

Now use an editor and remove the three deletion statements from log.000010. Once the statements have been removed, feed log.000010 to the DBMS.

[root@tel-test tel-current]# mysql --user=root --password='???' < log.000010

Once the DBMS is finished ingesting log.000010, delete the files tel-store-bin.000010 and log.000010. They are no longer needed.

Now the remaining logs can be feed to the DBMS. This can be done the same way the first sequence of logs where feed to the DBMS.

Backing up the updated database

Once the current batch of binary log files have been feed to the database, the database should be backed up to tape. Ask Kyle to backup the directory /bafs/tel-current on tel-test. This will take a few days.

-- TonyEdgin - 14 May 2012
Topic revision: r7 - 07 Mar 2013, KelleeSummers
This site is powered by FoswikiCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Foswiki? Send feedback