Please note that the following notes apply to the MySQL database, which is no longer actively used for telemetry as of the end of January 2013.
There should no longer be any problems with replicating the database.

Telemetry Database Troubleshooting Notes

Machines / Logs / Commands

tel1.tucson.lbto.org

This is the replication database (the slave), running mysql in Tucson.
  • account: root (tucson root password)
  • log file: /tel-db/mysql/tel1.tucson.lbto.org.err

mysqld runs here as the mysql user.
We use root as the database user to run mysql from the command line (see the pre-backup script in /tel-db/ for the password). Useful commands:
  • show slave status\G   (\G already ends the statement, so adding ";" after it only produces a harmless "ERROR: No query specified")
  • start slave;
  • show binary logs;
  • purge binary logs to ...
  • change master ...

  • mysql version 5.1.61
  • my.cnf file used is in /tel-db
  • we stopped using the incremental backups (Jan 2013), so the binlogs on the slave (tel1.tucson.lbto.org) are purged weekly to clean up:
    PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 2 DAY);

ovms.mountain.lbto.org

  • user ovmstest (no e, ! at end)
    ovmsTelemetryClient shows the status of, pauses, or resumes telemetry collection (-s, -p, -r; -h for help)
  • root (mountain root password)
    start/stop the daemon: /sbin/service ovms-svc [start|stop|restart|status]

tel-collect.mountain.lbto.org

  • root (mountain root password)
    this is the mountain telemetry db
  • cron jobs here truncate the entire table and rotate the log file

Table of Problems

| Date        | Problem                                                                                   | Solution                                  |
| 2-Dec-2012  | Found Slave_SQL_Running set to No in status; last log message was "Relay log read failure" | See Bad Record in Relay Log notes below   |
| 20-Nov-2012 | tel1 machine crashed on 19-Nov                                                             | See Michele's notes below                 |

Telemetry Clients

Currently, there are three "systems" which provide data to the Telemetry system: OVMS, TCS, and MCSPU.
  • TCS
    Subsystems of the TCS are nearly always sending telemetry data. However, this can be turned off for all but a few bytes (literally) of ECS data by modifying the tcs/etc/tcs.conf file. The variable "collectTelemetry" can be set to false. All subsystems, except for ECS, check this variable to determine whether or not to send telemetry. I note that most subsystems also have an internal variable to decide whether the subsystem wants to send telemetry. If this internal variable exists, then both the "collectTelemetry" and the subsystem variable need to be true for the subsystem to ship telemetry data.
  • MCSPU
    The MCSPU can send an enormous amount of telemetry data about the mount. This stream is typically on, but is currently turned off because of the problems we have been having with telemetry. Tom could not remember exactly how he turns the MCSPU telemetry on and off, so you will have to ask him for details. MCSPU sends telemetry at two different rates: if an axis is ready, it sends at 10 Hz; otherwise the data is only sent at 1 Hz. No rotator data is currently being sent.
  • OVMS
    The OVMS telemetry service typically runs as a daemon, started on reboot with: ovmsTelemetry --daemon
    It can also be controlled with: /sbin/service ovms-svc [start|stop|restart|status]
    The data is always broadcast by the OVMS application (for use by clients such as the ovmsMonitor). When the ovmsTelemetry service is running, telemetry collection can be resumed or paused using the ovmsTelemetryClient. Typically the service is left running, and a cron job pauses and resumes collection with the ovmsTelemetryClient so that it only runs at night (0:00 UT to 14:00 UT; 5:00 pm to 7:00 am local time).
    The cron job looks like this:
    0 00 * * * root ovmsTelemetryClient -r
    0 14 * * * root ovmsTelemetryClient -p
[ovmstest@ovms]$ ovmsTelemetryClient -h
Client for the OVMS Telemetry Interface Service (Ver. 0.6.1)
Usage :  ovmsTelemetryClient [-Option] [<Param1> <Param2> <ParamN>]
  -p (pause) : Pauses the archive of the vibration data into the Telemetry system.
  -r (resume) : Resumes the archive of the vibration data into the Telemetry system.
  -s (getStatus) : Shows the status of the OVMS Telemetry interface.
  -f Freq (setSamplingRateHz) : Sets the Sampling rate to archive the vibration data into Telemetry.
  -x (shutdown) : Shutdown the OVMS Telemetry interface. This may take some seconds...
  -h (help) : Shows the help page.(See also: man ovmsTelemetryClient)
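To make the night-only window easy to check, here is a minimal sketch of the schedule logic, assuming (as the cron entries imply) that the ovms host clock runs on UT. The function names and the words "collecting"/"paused" are hypothetical and are not the output of ovmsTelemetryClient -s; they only say what state the cron entries should have left collection in.

```shell
# ovms_expected_state HOUR_UT: print "collecting" for UT hours 0-13 and
# "paused" for 14-23, mirroring the -r at 00:00 UT and -p at 14:00 UT entries.
# Leading zeros (e.g. "08" from `date -u +%H`) are fine: test(1) reads decimal.
ovms_expected_state() {
    if [ "$1" -ge 0 ] && [ "$1" -lt 14 ]; then
        echo collecting
    else
        echo paused
    fi
}

# ut_to_tucson_hour HOUR_UT: Tucson stays on MST (UTC-7) all year, so the
# local hour is (UT - 7) mod 24, written here as (UT + 17) mod 24.
ut_to_tucson_hour() {
    awk -v h="$1" 'BEGIN { print (h + 17) % 24 }'
}
```

Usage: compare `ovms_expected_state "$(date -u +%H)"` with what `ovmsTelemetryClient -s` reports; `ut_to_tucson_hour 0` and `ut_to_tucson_hour 14` reproduce the 5:00 pm and 7:00 am local times quoted above.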


Detailed Troubleshooting Logs

Restoring Mountain after Power Crash

Tony's notes on Database Recovery - including skipping corrupted replication records on the slave.

tel1 Machine Crash - Michele's Notes

TEL1 crashed and Slave database could not recover due to duplicate entry.
Called Chris Janton to step us through what he would do to fix this problem.

(1) Use the error log on TEL1 in /tel-db/mysql, tel1.tucson.lbto.org.err, to see what happened when the machine crashed and how the replication broke.
121118 19:01:53 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
121118 19:01:53 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'bin_log.009049' at position 1022963402

Note:
I determined:
  • mysql was not running
  • disk 90% full
  • the machine had crashed at ~3:47 am on 19 November 2012
Dan discovered a core file so large that it could not be written in the limited disk space available, so we do not know what caused the system crash. Dan moved the core file to another location to free up disk space, and also redirected the system to write core files to a location with plenty of space.

Stephen told us to resume the backups of the tel1-bin log files. Kyle has a program on Windows that runs the pre-backup.sh and post-backup.sh scripts on TEL1 in /tel-db; the files themselves are backed up from /tel-db/binlog.
We could not start the mysql daemon without enough disk space. Once space was freed up, we could start the daemon and saw the following errors:
121119 12:03:09 [Note] Recovering after a crash using tel1-bin
121119 12:03:13 [ERROR] Error in Log_event::read_log_event(): 'read error',
data_len: 16969, event_type: 2
121119 12:03:13 [Note] Starting crash recovery...
121119 12:03:13 [Note] Crash recovery finished.
121119 12:03:14 [Warning] Neither --relay-log nor --relay-log-index were used;
so replication may break when this MySQL server acts as a slave and has his
hostname changed!! Please use '--relay-log=mysql-relay-bin' to avoid this
problem.
121119 12:03:14 [Note] Slave SQL thread initialized, starting replication in
log 'bin_log.009087' at position 900001397, relay log
'./mysql-relay-bin.003443' position: 900001540
121119 12:03:14 [Note] Event Scheduler: Loaded 0 events
121119 12:03:14 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.61-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source
distribution
121119 12:03:14 [Note] Slave I/O thread: connected to master
'tel_replicator@128.196.248.94:3306',replication started in log
'bin_log.009087' at position 900001397

This is where the problem happened.
121119 12:03:14 [ERROR] Slave SQL: Error 'Duplicate entry '4860038237464393'
for key 'PRIMARY'' on query. Default database: 'tel_streams'. Query: 'INSERT
INTO n75fa434a_7ced_11e1_8503_00a0d1e78d1e VALUES (4860038237464393,....

Note:
The binary logs from the mountain (the master) are copied into the mysql-relay-bin relay logs on tel1, while the tel1-bin log files record the transactions applied to the slave database on tel1. Run "show master status" on the mountain database to see how far along it is: the mountain was on log 9186 and Tucson was on log 9087.
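A quick way to put a number on "how far behind" is to subtract the numeric suffixes of the two binlog names. This is a minimal sketch with a hypothetical helper name; it counts files, not bytes or seconds, since the size of each file depends on the master's max_binlog_size setting.

```shell
# binlog_gap NEWER OLDER: print how many binary log files NEWER is ahead of
# OLDER, using only the numeric suffix after the last "." in each name.
binlog_gap() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        sub(/.*\./, "", a)   # "bin_log.009186" -> "009186"
        sub(/.*\./, "", b)
        print (a + 0) - (b + 0)
    }'
}
```

Usage: feed it the File column from "show master status" on the mountain and Relay_Master_Log_File from tel1; for the crash above, `binlog_gap bin_log.009186 bin_log.009087` gives 99.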

(2) We need to skip over the duplicate entries, but there is no way to tell in advance how many need to be skipped.

121119 12:03:14 [ERROR] Slave SQL: Error 'Duplicate entry '4860038237464393' for key 'PRIMARY'' 
on query. Default database: 'tel_streams'. Query: 'INSERT INTO n75fa434a_7ced_11e1_8503_00a0d1e78d1e 
VALUES (4860038237464393, 0.013915003277361, 

(3) Stop the slave, do the skip, and restart the slave. If there are no more duplicates, then the slave will be replicating again.

mysql> STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE; 

(4) Keep skipping duplicates. Depending upon the tolerance for loss of data, the SQL_SLAVE_SKIP_COUNTER can be set to larger values. We ended up using the value of 1000.
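Because skipping is only safe for the duplicate-key case, it can help to check the error number before skipping. Below is a minimal sketch with a hypothetical function name: it reads SHOW SLAVE STATUS\G output and prints the skip statements only when Last_SQL_Errno is 1062 (MySQL's duplicate-entry error), so a different error such as 1594 is never skipped blindly. It prints SQL rather than running it, and every event it skips is lost from the slave for good.

```shell
# skip_if_duplicate [COUNT]: read `SHOW SLAVE STATUS\G` output on stdin.
# If the SQL thread stopped on a duplicate-key error (Last_SQL_Errno 1062),
# print the statements that skip COUNT events (default 1) and restart the
# slave; otherwise print nothing and return 1.
skip_if_duplicate() {
    count=${1:-1}
    awk -F': *' -v n="$count" '
        { v = $2; sub(/ +$/, "", v) }          # field value, trailing blanks removed
        $1 ~ /^ *Last_SQL_Errno$/ { err = v }
        END {
            if (err != "1062") exit 1
            printf "STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER=%d; START SLAVE;\n", n
        }'
}
```

Usage: `mysql -u root -p -e 'SHOW SLAVE STATUS\G' | skip_if_duplicate 1000`, then paste the printed line into the mysql client if you accept losing those events.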

(5) While skipping duplicates, we got a different error:

121120 11:04:35 [ERROR] Error in Log_event::read_log_event(): 'read error',
data_len: 230, event_type: 2
121120 11:04:35 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error
121120 11:04:35 [ERROR] Slave SQL: Relay log read failure: Could not parse
relay log event entry. The possible reasons are: the master's binary log is
corrupted (you can check this by running 'mysqlbinlog' on the binary log), the
slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on
the relay log), a network problem, or a bug in the master's or slave's MySQL
code. If you want to check the master's binary log or slave's relay log, you
will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Error_code: 1594
121120 11:04:35 [ERROR] Error running query, slave SQL thread aborted. Fix the
problem, and restart the slave SQL thread with "SLAVE START". We stopped at
log 'bin_log.009087' position 900026115

(6) To work around this relay log problem, look up the following values on the TEL1 database. Run

mysql> show slave status\G

and look for "Relay_Master_Log_File" and "Exec_Master_Log_Pos".

Relay_Master_Log_File: bin_log.009087
Exec_Master_Log_Pos: 900026115

(7) Stop the slave (probably already stopped), do a "change master", and start the slave:

mysql> stop slave;
mysql> change master to master_log_file='bin_log.009087', master_log_pos=900026115;
mysql> start slave;

The error log then shows:

121120 11:11:47 [Note] 'CHANGE MASTER TO executed'. Previous state
master_host='128.196.248.94', master_port='3306',
master_log_file='bin_log.009186', master_log_pos='690192728'. New state
master_host='128.196.248.94', master_port='3306',
master_log_file='bin_log.009087', master_log_pos='900026115'.

An alternative, which definitely loses data, is to set the log file to its value + 1 and the position to 0:

mysql> change master to master_log_file='bin_log.009088', master_log_pos=0;
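Steps (6) and (7) boil down to copying two fields from the status output into CHANGE MASTER. Here is a minimal sketch (hypothetical function name) that does the copying, to avoid typos in long log positions. It only prints the statement; you still stop the slave first and run it yourself, and it is only appropriate when the relay log is what is damaged, because the master has to resend everything from that position.

```shell
# change_master_from_status: read `SHOW SLAVE STATUS\G` output on stdin and
# print a CHANGE MASTER TO statement for the last event the SQL thread
# executed (Relay_Master_Log_File / Exec_Master_Log_Pos). Prints nothing and
# returns 1 if either field is missing. It does not run anything itself.
change_master_from_status() {
    awk -F': *' -v q="'" '
        { v = $2; sub(/ +$/, "", v) }          # field value, trailing blanks removed
        $1 ~ /^ *Relay_Master_Log_File$/ { file = v }
        $1 ~ /^ *Exec_Master_Log_Pos$/   { pos = v }
        END {
            if (file == "" || pos == "") exit 1
            printf "CHANGE MASTER TO master_log_file=%s%s%s, master_log_pos=%s;\n", q, file, q, pos
        }'
}
```

Usage: `mysql -u root -p -e 'SHOW SLAVE STATUS\G' | change_master_from_status`. With the values from step (6) this prints the same statement typed in step (7).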

(8) Now continue to move past duplicates.

(9) Another error appears. This time the actual database files were corrupted, because the machine crashed while they were being written: the table we are trying to write to has a bad index, so we need to "repair" that table.

121120 11:21:18 [ERROR] /usr/libexec/mysqld: Incorrect key file for table 
'./tel_streams/n75fa434a_7ced_11e1_8503_00a0d1e78d1e.MYI'; try to repair it
121120 11:21:18 [ERROR] /usr/libexec/mysqld: Incorrect key file for table 
'./tel_streams/n75fa434a_7ced_11e1_8503_00a0d1e78d1e.MYI'; try to repair it
121120 11:21:18 [ERROR] Slave SQL: Error 'Incorrect key file for table 
'./tel_streams/n75fa434a_7ced_11e1_8503_00a0d1e78d1e.MYI'; try to repair it' on 
query. Default database: 'tel_streams'. Query: 
'INSERT INTO n75fa434a_7ced_11e1_8503_00a0d1e78d1e VALUES (48600382

(10) On TEL1 in /tel-db there is a repair script, repair.sh, which repairs ALL the tables in the database. Since we only want to repair one table, copy the myisamchk command at the start of the script (the first 100 characters or so, up to the write buffer size option) and substitute the troublesome table name. Be careful where you run this command: here it is run from /tel-db, and the table files are in /tel-db/mysql/tel_streams, which is why the command line uses the relative path mysql/tel_streams/.

[root@tel1 tel-db]# myisamchk -r --tmpdir=. --key_buffer_size=3G --sort_buffer_size=3G \
    --read_buffer_size=70M --write_buffer_size=70M \
    mysql/tel_streams/n75fa434a_7ced_11e1_8503_00a0d1e78d1e.MYI

Be aware the database index fix can take a long time.

(11A) I found out from Tony that myisamchk should never be run while mysqld is running. In any case, although the "fix" completed for the table in (10), we had another table to fix.

(11B) Now fix the table with the slave stopped and mysqld shut down, which is the proper way to do it.

mysql> stop slave; (This stops the replication.)
mysql> quit
[root@tel1 tel-db]# service mysqld stop
[root@tel1 tel-db]# myisamchk -r -f --tmpdir=. --key_buffer_size=3G \
    --sort_buffer_size=3G --read_buffer_size=70M --write_buffer_size=70M \
    mysql/tel_streams/n75fa434a_7ced_11e1_8503_00a0d1e78d1e.MYI
- recovering (with sort) MyISAM-table
  'mysql/tel_streams/n75fa434a_7ced_11e1_8503_00a0d1e78d1e.MYI'
Data records: 2641700185
- Fixing index 1
Data records: 2641697645  <=== This number counts up as the fix is in progress.

NOTICE the -f option added to the command. When we first ran the command, myisamchk complained about a lack of disk space even though many gigabytes were free, so we forced it to run with "-f".
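Since the order matters (slave stopped, mysqld down, then myisamchk), here is a minimal sketch that prints the sequence for one table so only the table name has to be substituted. The function name is hypothetical; the myisamchk options are the ones used above, and the output assumes it is run from /tel-db on tel1. It prints commands without running them; restart mysqld afterwards as described in (12A).

```shell
# repair_table_cmds TABLE: print, without running them, the commands from
# (11B) for repairing one MyISAM table in tel_streams. The table path is
# relative, so the printed commands must be run from /tel-db.
repair_table_cmds() {
    case $1 in
        ''|*/*|*.MYI)
            echo "usage: repair_table_cmds TABLE   (bare table name, no path or .MYI)" >&2
            return 1 ;;
    esac
    echo "mysql -u root -p -e 'STOP SLAVE'"
    echo "service mysqld stop"
    echo "myisamchk -r -f --tmpdir=. --key_buffer_size=3G --sort_buffer_size=3G --read_buffer_size=70M --write_buffer_size=70M mysql/tel_streams/$1.MYI"
}
```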

(11C) There may be several tables to fix; the error log file will tell you which. In addition to "Incorrect key file for table './tel_streams/blah'; try to repair it", you may see an error such as

121121  8:51:59 [ERROR] Slave SQL: Error 'Table './tel_streams/n552dc4e0_363d_11e0_9eab_00a0d1e349d0' is marked as crashed and
should be repaired' on query. Default database: 'tel_streams'. Query: 'INSERT
INTO n552dc4e0_363d_11e0_9eab_00a0d1e349d0 VALUES (4860038265048762,
-0.78112417459488, -0.8114001750946, -1.9436596632004, -28.85080909729,
48.020385742188, 0.042379379272461, 70.000946044922, 0, 2119, 0,
13.293608665466, 12971.379882812, 36001.01171875, -191330.25,
-2698.9504394531, -2496.9497070312)', Error_code: 145
121121  8:51:59 [Warning] Slave: Table
'./tel_streams/n552dc4e0_363d_11e0_9eab_00a0d1e349d0' is marked as crashed and
should be repaired Error_code: 145
121121  8:51:59 [Warning] Slave: Table 'n552dc4e0_363d_11e0_9eab_00a0d1e349d0'
is marked as crashed and should be repaired Error_code: 1194
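To find every table the error log has complained about, the two messages above can be grepped for. This is a minimal sketch with a hypothetical function name, assuming each message sits on a single line in the raw .err file (the wrapping shown here comes from copying). The log also keeps old entries, so tables already repaired will still be listed.

```shell
# tables_to_repair: read a mysqld error log on stdin and print, once each,
# the tel_streams tables that MySQL reported as needing repair
# ("Incorrect key file for table" or "is marked as crashed").
tables_to_repair() {
    grep -E "Incorrect key file for table|is marked as crashed" |
        sed -n "s|.*'\./tel_streams/\([A-Za-z0-9_]*\).*|\1|p" |
        sort -u
}
```

Usage: `tables_to_repair < /tel-db/mysql/tel1.tucson.lbto.org.err`.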

(11D) So you need to fix yet another table.

[root@tel1 tel-db]# myisamchk -r -f --tmpdir=. --key_buffer_size=3G \
    --sort_buffer_size=3G --read_buffer_size=70M --write_buffer_size=70M \
    mysql/tel_streams/n552dc4e0_363d_11e0_9eab_00a0d1e349d0.MYI
- recovering (with sort) MyISAM-table
  'mysql/tel_streams/n552dc4e0_363d_11e0_9eab_00a0d1e349d0.MYI'
Data records: 150211313
- Fixing index 1
Data records: 150211338

(12A) When myisamchk is done, restart the mysql daemon and restart the slave to get replication going again. Note: you should be able to restart the daemon the same way you stopped it, with

   [root@tel1 tel-db]# service mysqld [start|stop]

However, I could not start mysqld this way, and I do not know why. Maybe Stephen can answer this question. Instead, I started mysqld like this:

   [root@tel1 tel-db]# mysqld_safe --user=mysql &

This works fine. It turns out this command also restarts the slave; if it does not, log into the mysql client and start the slave.

(12B) Starting the slave from the mysql client:
[root@tel1 tel-db]# mysql -u root -p
Enter password:
mysql> start slave;
mysql> show slave status\G

The show slave status output should have "Yes" for both Slave_IO_Running and Slave_SQL_Running. Read_Master_Log_Pos, Relay_Log_Pos, Exec_Master_Log_Pos, and Relay_Log_Space should change between successive runs of "show slave status\G", and Seconds_Behind_Master should be "0" once the slave has caught up. If the slave has been stopped for a while (that is, no replication from the master has been happening), Seconds_Behind_Master will be greater than 0 when the slave is restarted. This number does not count down like a clock: it is the age of the master event the SQL thread is currently executing, so it can drop faster or slower than real time depending on the transactions being applied. In this way, the slave catches up from its lagging position.
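The checks in the paragraph above can be scripted. Below is a minimal sketch with a hypothetical function name; it is not the script behind the daily status email, whose contents are not recorded here. It reads SHOW SLAVE STATUS\G output and reports whether both threads are running and whether the slave has caught up.

```shell
# slave_health: read `SHOW SLAVE STATUS\G` output on stdin and print a one-line
# verdict. Exit status: 0 = both threads running and caught up,
# 1 = running but still behind the master, 2 = a replication thread is stopped.
slave_health() {
    awk -F': *' '
        { v = $2; sub(/ +$/, "", v) }          # field value, trailing blanks removed
        $1 ~ /^ *Slave_IO_Running$/      { io = v }
        $1 ~ /^ *Slave_SQL_Running$/     { sql = v }
        $1 ~ /^ *Seconds_Behind_Master$/ { lag = v }
        $1 ~ /^ *Last_SQL_Errno$/        { err = v }
        END {
            if (io != "Yes" || sql != "Yes") {
                printf "BROKEN: IO=%s SQL=%s Last_SQL_Errno=%s\n", io, sql, err
                exit 2
            }
            if (lag != "0") {
                printf "CATCHING UP: Seconds_Behind_Master=%s\n", lag
                exit 1
            }
            print "OK"
        }'
}
```

Usage: `mysql -u root -p -e 'SHOW SLAVE STATUS\G' | slave_health`. Run it twice a few minutes apart while catching up; a shrinking Seconds_Behind_Master is the healthy sign.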

Bad Record in Relay Log

1. The daily email that checks slave status reported Slave_SQL_Running: No and Seconds_Behind_Master: NULL

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 128.196.248.94
                  Master_User: tel_replicator
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: bin_log.009672
          Read_Master_Log_Pos: 567693013
               Relay_Log_File: mysql-relay-bin.001723
                Relay_Log_Pos: 765801187
        Relay_Master_Log_File: bin_log.009639
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
                        ......
                   Last_Errno: 1594
                   Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the 
master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is 
corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. 
If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 765801044
              Relay_Log_Space: 36001203921
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
                     ......
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1594
               Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the 
master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is 
corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. 
If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
1 row in set (0.00 sec)

ERROR: 
No query specified

2. The error in the /tel-db/mysql/tel1.tucson.lbto.org.err log was:

121129  4:17:03 [Warning] Aborted connection 106 to db: 'tel_metadata' user: 'tel_extractor' host: '150.135.245.18' (Got timeout reading communication packets)
121130  4:16:18 [Warning] Aborted connection 28 to db: 'tel_metadata' user: 'tel_extractor' host: '150.135.245.18' (Got timeout reading communication packets)
121201  6:45:30 [Warning] Aborted connection 122 to db: 'tel_metadata' user: 'tel_extractor' host: '150.135.245.18' (Got timeout reading communication packets)
121201 11:18:42 [ERROR] Error in Log_event::read_log_event(): 'Event too small', data_len: 0, event_type: 2
121201 11:18:42 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error
121201 11:18:42 [ERROR] Slave SQL: Relay log read failure: Could not parse relay log event entry. The possible reasons are: 
the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log 
is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or 
slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names 
by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 1594
121201 11:18:42 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'bin_log.009639' position 765801044
121201 19:02:00 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
121201 19:02:00 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'bin_log.009650' at position 813905374
121202  5:23:02 [Warning] Aborted connection 108 to db: 'tel_metadata' user: 'tel_extractor' host: '150.135.245.18' (Got timeout reading communication packets)

3. Since the relay log (see Relay_Log_File in the slave status above) is huge, we did what the error message suggested and ran mysqlbinlog on it, but with --start-datetime so that only events near the failure time were printed; the error shows up at the end of the output. (We could have used --start-position=765801187 as well.)
This shows that the relay log file is corrupted.
mysqlbinlog --start-datetime="2012-12-01 11:17:50" mysql-relay-bin.001723 
......
SET TIMESTAMP=1354385922/*!*/;
INSERT INTO n0367f7a4_586a_11df_9b0d_00a0d1e349e4 VALUES (4861102757813674, 305, 2, 1, 0, 150, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 -550, 550, -100, 334.47940063477)
/*!*/;
ERROR: Error in Log_event::read_log_event(): 'Event too small', data_len: 0, event_type: 2
ERROR: Could not read entry at offset 765801187: Error in log format or read error.
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
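In both incidents recorded on this page, the offset mysqlbinlog could not read was exactly the slave's Relay_Log_Pos (765801187 in December 2012, 653289595 in January 2013), i.e. the SQL thread stopped right at the damaged spot in the relay log. Here is a minimal sketch (hypothetical function names) for pulling that offset out of mysqlbinlog output and comparing it with Relay_Log_Pos before applying the CHANGE MASTER fix.

```shell
# relay_bad_offset: read mysqlbinlog output on stdin and print the offset from
# its first "Could not read entry at offset N" error (nothing if there is none).
relay_bad_offset() {
    sed -n 's/.*Could not read entry at offset \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# same_failure_point OFFSET RELAY_LOG_POS: succeed only when the unreadable
# offset is exactly where the SQL thread stopped.
same_failure_point() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}
```

Usage: `off=$(mysqlbinlog --start-datetime="2012-12-01 11:17:50" mysql-relay-bin.001723 2>&1 | relay_bad_offset)`, then `same_failure_point "$off" 765801187`. The `2>&1` is there in case your mysqlbinlog writes the ERROR lines to stderr.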


4. With the slave stopped (stop slave;), note the Relay_Master_Log_File and Exec_Master_Log_Pos from the status above and set the master to those values. This removes all relay logs from the slave (including the corrupt one) and restarts replication from exactly where it stopped, by requesting a fresh copy of the binlog from the master.
We did not run RESET SLAVE just before the CHANGE MASTER, although one of the web pages we consulted lists it (RESET SLAVE "makes the slave forget its replication position in the master's binary log" - see http://dev.mysql.com/doc/refman/5.5/en/reset-slave.html).

[root@tel1 tel-db]# mysql -u root -p
Enter password: 
mysql> change master to master_log_file='bin_log.009639', master_log_pos=765801044;
Query OK, 0 rows affected (3.30 sec)

mysql> slave start;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 128.196.248.94
                  Master_User: tel_replicator
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: bin_log.009639
          Read_Master_Log_Pos: 787539587
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 119271
        Relay_Master_Log_File: bin_log.009639
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
         Exec_Master_Log_Pos: 765920074
              Relay_Log_Space: 21738939
                           ................
        Seconds_Behind_Master: 78973
                           ................


28-January-2013
The end of the .err file shows the complaint. Since it is the same error as before, we don't need to do more checking:

130127 22:58:39 [ERROR] Error in Log_event::read_log_event(): 'Event too small', data_len: 0, event_type: 2
130127 22:58:39 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error
130127 22:58:39 [ERROR] Slave SQL: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 1594
130127 22:58:39 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'bin_log.011710' position 653289452


mysqlbinlog --start-datetime="2013-01-27 22:58" mysql-relay-bin.006171

# at 653289215
#130127 22:58:39 server id 1  end_log_pos 653289452     Query   thread_id=143070        exec_time=0     error_code=0
SET TIMESTAMP=1359352719/*!*/;
INSERT INTO n03f7b164_586a_11df_9b0d_00a0d1e349e4 VALUES (4866069554608481, 318, 2, 1, 1792, 150, -0.64929330348969, 258.9075012207, 259.55679321289, -0.91823935508728, -0.2121320515871, 259.68786621094, 259.89999389648, -0.30000001192093, -0.91553395986557, 0.19341887533665, -550, 550, -100, 353.53594970703)
/*!*/;
ERROR: Error in Log_event::read_log_event(): 'Event too small', data_len: 0, event_type: 2
ERROR: Could not read entry at offset 653289595: Error in log format or read error.
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;





mysql> show slave status \G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 128.196.248.94
                  Master_User: tel_replicator
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: bin_log.011725
          Read_Master_Log_Pos: 519918073
               Relay_Log_File: mysql-relay-bin.006171
                Relay_Log_Pos: 653289595
        Relay_Master_Log_File: bin_log.011710
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1594
                   Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 653289452
              Relay_Log_Space: 16626058842
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1594
               Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
1 row in set (0.00 sec)

ERROR: 
No query specified



mysql> stop slave;
Query OK, 0 rows affected (0.03 sec)

mysql> change master to master_log_file='bin_log.011710', master_log_pos=653289452;
Query OK, 0 rows affected (2.47 sec)

mysql> show slave status \G;
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 128.196.248.94
                  Master_User: tel_replicator
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: bin_log.011710
          Read_Master_Log_Pos: 653289452
               Relay_Log_File: mysql-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File: bin_log.011710
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 653289452
              Relay_Log_Space: 106
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
1 row in set (0.00 sec)

ERROR: 
No query specified


The .err log afterwards shows:

130128  9:04:48 [Note] Slave I/O thread exiting, read up to log 'bin_log.011725', position 619950906
130128  9:05:01 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='128.196.248.94', master_port='3306', master_log_file='bin_log.011725', master_log_pos='619950906'. New state master_host='128.196.248.94', master_port='3306', master_log_file='bin_log.011710', master_log_pos='653289452'.

Details on what show slave status reports: http://dev.mysql.com/doc/refman/5.0/en/replication-administration-status.html
Topic revision: r10 - 08 May 2014, KelleeSummers