Project

General

Profile

Bug #221

Updated by Daniel Curtis about 9 years ago

{{>toc}} 

 If you have set up MySQL replication, you probably know this problem: sometimes there are invalid MySQL queries which cause the replication to not work anymore. In this short guide I explain how you can repair the replication on the MySQL slave without the need to set it up from scratch again. 

 I do not issue any guarantee that this will work for you! 

 h1. 1 Identifying The Problem 

 To find out whether replication is/isn't working and what has caused to stop it, you can take a look at the logs. On Debian, for example, MySQL logs to @/var/log/syslog@: 
 <pre> 
 grep mysql /var/log/syslog 
 </pre> 
 > May 29 09:56:08 http2 mysqld[1380]: 080529 9:56:08 [ERROR] Slave: Error 'Table 'mydb.taggregate_temp_1212047760' doesn't exist' on query. Default database: 'mydb'. Query: 'UPDATE thread AS thread,taggregate_temp_1212047760 AS aggregate 
 > May 29 09:56:08 http2 mysqld[1380]: ^ISET thread.views = thread.views + aggregate.views 
 > May 29 09:56:08 http2 mysqld[1380]: ^IWHERE thread.threadid = aggregate.threadid', Error_code: 1146 
 > May 29 09:56:08 http2 mysqld[1380]: 080529 9:56:08 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.001079' position 203015142 

 You can see what query caused the error, and at what log position the replication stopped. 
 To verify that the replication is really not working, log in to MySQL: 
 <pre> 
 mysql -u root -p 
 </pre> 

 If one of *Slave_IO_Running* or *Slave_SQL_Running* is set to *+No+*, then the +replication is broken+: 
 <pre> 
 mysql> SHOW SLAVE STATUS \G 
 </pre> 
 > *************************** 1. row *************************** 
 > Slave_IO_State: Waiting for master to send event 
 > Master_Host: 1.2.3.4 
 > Master_User: slave_user 
 > Master_Port: 3306 
 > Connect_Retry: 60 
 > Master_Log_File: mysql-bin.001079 
 > Read_Master_Log_Pos: 269214454 
 > Relay_Log_File: slave-relay.000130 
 > Relay_Log_Pos: 100125935 
 > Relay_Master_Log_File: mysql-bin.001079 
 > Slave_IO_Running: Yes 
 > Slave_SQL_Running: No 
 > Replicate_Do_DB: mydb 
 > Replicate_Ignore_DB: 
 > Replicate_Do_Table: 
 > Replicate_Ignore_Table: 
 > Replicate_Wild_Do_Table: 
 > Replicate_Wild_Ignore_Table: 
 > Last_Errno: 1146 
 > Last_Error: Error 'Table 'mydb.taggregate_temp_1212047760' doesn't exist' on query. Default database: 'mydb'.  
 > Query: 'UPDATE thread AS thread,taggregate_temp_1212047760 AS aggregate 
 > SET thread.views = thread.views + aggregate.views 
 > WHERE thread.threadid = aggregate.threadid' 
 > Skip_Counter: 0 
 > Exec_Master_Log_Pos: 203015142 
 > Relay_Log_Space: 166325247 
 > Until_Condition: None 
 > Until_Log_File: 
 > Until_Log_Pos: 0 
 > Master_SSL_Allowed: No 
 > Master_SSL_CA_File: 
 > Master_SSL_CA_Path: 
 > Master_SSL_Cert: 
 > Master_SSL_Cipher: 
 > Master_SSL_Key: 
 > Seconds_Behind_Master: NULL 
 > 1 row in set (0.00 sec) 
 
 h1. 2 - Repairing The Replication 

 Just to go sure, we stop the slave: 
 <pre> 
 mysql> STOP SLAVE; 
 </pre> 

 Fixing the problem is actually quite easy. We tell the slave to simply skip the invalid SQL query: 
 <pre> 
 mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; 
 </pre> 

 This tells the slave to skip one query (which is the invalid one that caused the replication to stop). +If you'd like to skip two queries+, you'd use SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 2; instead and so on. 

 Start the slave again: 
 <pre> 
 mysql> START SLAVE; 
 </pre> 

 Check if replication is working again: 
 <pre> 
 mysql> SHOW SLAVE STATUS \G 
 </pre> 
 > *************************** 1. row *************************** 
 > Slave_IO_State: Waiting for master to send event 
 > Master_Host: 1.2.3.4 
 > Master_User: slave_user 
 > Master_Port: 3306 
 > Connect_Retry: 60 
 > Master_Log_File: mysql-bin.001079 
 > Read_Master_Log_Pos: 447560366 
 > Relay_Log_File: slave-relay.000130 
 > Relay_Log_Pos: 225644062 
 > Relay_Master_Log_File: mysql-bin.001079 
 > Slave_IO_Running: Yes 
 > Slave_SQL_Running: Yes 
 > Replicate_Do_DB: mydb 
 > Replicate_Ignore_DB: 
 > Replicate_Do_Table: 
 > Replicate_Ignore_Table: 
 > Replicate_Wild_Do_Table: 
 > Replicate_Wild_Ignore_Table: 
 > Last_Errno: 0 
 > Last_Error: 
 > Skip_Counter: 0 
 > Exec_Master_Log_Pos: 447560366 
 > Relay_Log_Space: 225644062 
 > Until_Condition: None 
 > Until_Log_File: 
 > Until_Log_Pos: 0 
 > Master_SSL_Allowed: No 
 > Master_SSL_CA_File: 
 > Master_SSL_CA_Path: 
 > Master_SSL_Cert: 
 > Master_SSL_Cipher: 
 > Master_SSL_Key: 
 > Seconds_Behind_Master: 0 
 > 1 row in set (0.00 sec) 

 As you see, both *Slave_IO_Running* and *Slave_SQL_Running* are set to *+Yes+* now. 
 <pre> 
 mysql> quit; 
 </pre> 

 Check the log again: 
 <pre> 
 grep mysql /var/log/syslog 
 </pre> 
 > May 29 09:56:08 http2 mysqld[1380]: 080529 9:56:08 [ERROR] Slave: Error 'Table 'mydb.taggregate_temp_1212047760' doesn't exist' on query. Default database: 'mydb'. Query: 'UPDATE thread AS thread,taggregate_temp_1212047760 AS aggregate 
 > May 29 09:56:08 http2 mysqld[1380]: ^ISET thread.views = thread.views + aggregate.views 
 > May 29 09:56:08 http2 mysqld[1380]: ^IWHERE thread.threadid = aggregate.threadid', Error_code: 1146 
 > May 29 09:56:08 http2 mysqld[1380]: 080529 9:56:08 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.001079' position 203015142 
 > May 29 11:42:13 http2 mysqld[1380]: 080529 11:42:13 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.001079' at position 203015142, relay log '/var/lib/mysql/slave-relay.000130' position: 100125935 

 The last line says that replication has started again, and if you see no errors after that line, everything is ok. 

 h1. 3 - Resync a Master-Slave replication  

 This is the full step-by-step procedure to resync a master-slave replication from scratch: 

 h2. On the master: 

 <pre> 
 RESET MASTER; 
 FLUSH TABLES WITH READ LOCK; 
 SHOW MASTER STATUS; 
 </pre> 

 And +copy the values of the result of the last command+ somewhere. 

 Wihtout closing the connection to the client (because it would release the read lock) issue the command to get a dump of the master: 
 <pre> 
 mysqldump -uroot -p --all-databases > /a/path/mysqldump.sql 
 </pre> 

 Now you can release the lock, even if the dump hasn't end. To do it perform the following command in the mysql client: 
 <pre> 
 UNLOCK TABLES; 
 </pre> 

 Now copy the dump file to the slave using scp or your preferred tool. 

 h2. On the slave: 

 Open a connection to mysql and type: 
 <pre> 
 STOP SLAVE; 
 </pre> 

 Load master's data dump with this console command: 
 <pre> 
 mysql -uroot -p < mysqldump.sql 
 </pre> 

 Sync slave and master logs: 
 <pre> 
 RESET SLAVE; 
 CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=98; 
 </pre> 

 Where the values of the above fields are the ones you copied before. 

 Start the slave: 
 <pre> 
 START SLAVE; 
 </pre> 

 And verify that everything is working again, if you type 
 <pre> 
 SHOW SLAVE STATUS; 
 </pre> 
 > Slave_IO_Running: Yes 
 > Slave_SQL_Running: Yes 

 h1. Use /etc/hosts lookups 

 Rather than using DNS to provide IP lookups, I have taken advantage of puppet to manage the /etc/hosts file across each host. This was due to a comment on StackOverflow regarding MySQL lag: 
 > I have MySQL experience for 15 years and any sort of delays when connecting till now were 100% DNS related issues. Both client and server try to resolve names, possibly forward and reverse. If something is wrong in DNS setup, question goes out and never gets responded. Therefore delay until timeout occurs and next best option is taken. I would recommend write both sides IP addresses written into /etc/hosts with names to make sure DNS problems are opted out. 

 h1. Resources 

 * http://stackoverflow.com/questions/10943583/how-to-diagnose-intermittently-slow-php-to-mysql-connection-speeds-on-local-netw 
 * http://www.howtoforge.com/how-to-repair-mysql-replication 
 * http://stackoverflow.com/questions/2366018/how-to-re-sync-the-mysql-db-if-master-and-slave-have-different-database-incase-o

Back