1.背景概述
在一次主從復(fù)制架構(gòu)中,由于主節(jié)點(diǎn)binlog損壞,導(dǎo)致從節(jié)點(diǎn)無法正常同步數(shù)據(jù),只能重做從節(jié)點(diǎn);因此使用MySQL 8.0.17開始提供的clone技術(shù)進(jìn)行恢復(fù),恢復(fù)后的2天都發(fā)生了主從報(bào)錯(cuò)數(shù)據(jù)沖突。
通過解析binlog發(fā)現(xiàn),同一時(shí)刻主從節(jié)點(diǎn)都在執(zhí)行同一條語句,因此詢問業(yè)務(wù)是否在主從節(jié)點(diǎn)都執(zhí)行了定時(shí)任務(wù),業(yè)務(wù)回復(fù)定時(shí)任務(wù)只在主節(jié)點(diǎn)執(zhí)行。
最后排查發(fā)現(xiàn),克隆后的從節(jié)點(diǎn)的定時(shí)任務(wù)也會(huì)是開啟的狀態(tài),因此同一時(shí)刻,主從節(jié)點(diǎn)同時(shí)執(zhí)行定時(shí)任務(wù),導(dǎo)致主從報(bào)錯(cuò),最終將從節(jié)點(diǎn)的定時(shí)任務(wù)關(guān)閉后解決此問題。
2.問題復(fù)現(xiàn)
本次測(cè)試基于 GreatSQL 8.0.32-24
?
?
greatsql>?SELECT?VERSION(); +-----------+ |?VERSION()?| +-----------+ |?8.0.32-24?| +-----------+ 1?row?in?set?(0.00?sec)
?
?
1.搭建一套主從架構(gòu)
略
2.創(chuàng)建event
?
?
greatsql>?create?database?test; greatsql>?use?test; greatsql>?CREATE?TABLE?`test`?( ?`id`?int(11)?unsigned?NOT?NULL?AUTO_INCREMENT?COMMENT?'ID', ?`now`?datetime?DEFAULT?NULL?COMMENT?'時(shí)間', ?PRIMARY?KEY?(`id`) ); greatsql>?CREATE?EVENT?event_test? ON?SCHEDULE?EVERY?1?MINUTE ON?COMPLETION?PRESERVE? ENABLE? COMMENT?'每隔1分鐘向test表插入記錄' DO?INSERT?INTO?test?VALUES(NULL,?now());
?
?
3.查看event狀態(tài)
主節(jié)點(diǎn),默認(rèn)情況下event狀態(tài)為 ENABLED
?
?
greatsql>?show?events; +------+------------+---------+-----------+-----------+------------+----------------+----------------+---------------------+------+---------+------------+----------------------+----------------------+--------------------+ |?Db??|?Name????|?Definer?|?Time?zone?|?Type????|?Execute?at?|?Interval?value?|?Interval?field?|?Starts????????|?Ends?|?Status??|?Originator?|?character_set_client?|?collation_connection?|?Database?Collation?| +------+------------+---------+-----------+-----------+------------+----------------+----------------+---------------------+------+---------+------------+----------------------+----------------------+--------------------+ |?test?|?event_test?|?root@%??|?SYSTEM???|?RECURRING?|?NULL????|?1????????|?MINUTE?????|?2023-10-12?17:11:14?|?NULL?|?ENABLED?|??????1?|?utf8mb4????????|?utf8mb4_unicode_ci??|?utf8mb4_unicode_ci?| +------+------------+---------+-----------+-----------+------------+----------------+----------------+---------------------+------+---------+------------+----------------------+----------------------+--------------------+ 1?row?in?set?(0.00?sec)
?
?
從節(jié)點(diǎn),默認(rèn)情況下event狀態(tài)為 SLAVESIDE_DISABLED
?
?
greatsql>?show?events; +------+------------+---------+-----------+-----------+------------+----------------+----------------+---------------------+------+--------------------+------------+----------------------+----------------------+--------------------+ |?Db??|?Name????|?Definer?|?Time?zone?|?Type????|?Execute?at?|?Interval?value?|?Interval?field?|?Starts????????|?Ends?|?Status???????|?Originator?|?character_set_client?|?collation_connection?|?Database?Collation?| +------+------------+---------+-----------+-----------+------------+----------------+----------------+---------------------+------+--------------------+------------+----------------------+----------------------+--------------------+ |?test?|?event_test?|?root@%??|?SYSTEM???|?RECURRING?|?NULL????|?1????????|?MINUTE?????|?2023-10-12?17:11:14?|?NULL?|?SLAVESIDE_DISABLED?|??????1?|?utf8mb4????????|?utf8mb4_unicode_ci??|?utf8mb4_unicode_ci?| +------+------------+---------+-----------+-----------+------------+----------------+----------------+---------------------+------+--------------------+------------+----------------------+----------------------+--------------------+ 1?row?in?set?(0.00?sec)
?
?
4.查看數(shù)據(jù)
?
?
greatsql>?select?*?from?test.test; +----+---------------------+ |?id?|?now?????????| +----+---------------------+ |??1?|?2023-08-08?1639?| |??2?|?2023-08-08?1639?| |??3?|?2023-08-08 1639?| +----+---------------------+ 3?rows?in?set?(0.00?sec)
?
?
5.從節(jié)點(diǎn)進(jìn)行克隆
# 安裝克隆插件,主從節(jié)點(diǎn)都需要
?
?
greatsql>?install?plugin?clone?soname?'mysql_clone.so';
?
?
# 從節(jié)點(diǎn)進(jìn)行clone
?
?
greatsql>?set?global?clone_valid_donor_list='172.17.137.162:6001'; greatsql>?clone?instance?from?root@'172.17.137.162':6001?identified?by?'greatsql';
?
?
6.重新建立主從復(fù)制
?
?
greatsql>?change?master?to?master_user='root',master_password='greatsql',master_host='172.17.137.162',master_port=6001,master_auto_position=1; Query?OK,?0?rows?affected,?7?warnings?(0.04?sec) greatsql>?start?slave; Query?OK,?0?rows?affected,?1?warning?(0.04?sec)
?
?
7.查看主從狀態(tài)
?
?
greatsql>?show?slave?statusG ***************************?1.?row?*************************** ???????????????Slave_IO_State:?Waiting?for?source?to?send?event ??????????????????Master_Host:?172.17.137.162 ??????????????????Master_User:?root ??????????????????Master_Port:?6001 ????????????????Connect_Retry:?60 ??????????????Master_Log_File:?binlog.000001 ??????????Read_Master_Log_Pos:?2959 ???????????????Relay_Log_File:?relaylog.000002 ????????????????Relay_Log_Pos:?395 ????????Relay_Master_Log_File:?binlog.000001 ?????????????Slave_IO_Running:?Yes ????????????Slave_SQL_Running:?No ??????????????Replicate_Do_DB:? ??????????Replicate_Ignore_DB:? ???????????Replicate_Do_Table:? ???????Replicate_Ignore_Table:? ??????Replicate_Wild_Do_Table:? ??Replicate_Wild_Ignore_Table:? ???????????????????Last_Errno:?1062 ???????????????????Last_Error:?Coordinator?stopped?because?there?were?error(s)?in?the?worker(s).?The?most?recent?failure?being:?Worker?1?failed?executing?transaction?'e8bf88f9-2acd-11ee-a98a-00163e605c74:8'?at?master?log?binlog.000001,?end_log_pos?2606.?See?error?log?and/or?performance_schema.replication_applier_status_by_worker?table?for?more?details?about?this?failure?or?others,?if?any. ?????????????????Skip_Counter:?0 ??????????Exec_Master_Log_Pos:?2307 ??????????????Relay_Log_Space:?1242 ??????????????Until_Condition:?None ???????????????Until_Log_File:? greatsql>?select?*?from?performance_schema.replication_applier_status_by_worker?limit?1G ***************************?1.?row?*************************** ???????????????????????????????????????????CHANNEL_NAME:? ??????????????????????????????????????????????WORKER_ID:?1 ??????????????????????????????????????????????THREAD_ID:?NULL ??????????????????????????????????????????SERVICE_STATE:?OFF ??????????????????????????????????????LAST_ERROR_NUMBER:?1062 ?????????????????????????????????????LAST_ERROR_MESSAGE:?Worker?1?failed?executing?transaction?'e8bf88f9-2acd-11ee-a98a-00163e605c74:8'?at?master?log?binlog.000001,?end_log_pos?2606;?Could?not?execute?Write_rows?event?on?table?test.test;?Duplicate?entry?'5'?for?key?'test.PRIMARY',?Error_code:?1062;?handler?error?HA_ERR_FOUND_DUPP_KEY;?the?event's?master?log?FIRST,?end_log_pos?2606 ???????????????????????????????????LAST_ERROR_TIMESTAMP:?2023-08-08?1639.033240 ???????????????????????????????LAST_APPLIED_TRANSACTION:? ?????LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP:?0000-00-00?0000.000000 ????LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP:?0000-00-00?0000.000000 ?????????LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP:?0000-00-00?0000.000000 ???????????LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP:?0000-00-00?0000.000000 ???????????????????????????????????APPLYING_TRANSACTION:?e8bf88f9-2acd-11ee-a98a-00163e605c74:8 ?????????APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP:?2023-08-08?1645.795753 ????????APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP:?2023-08-08?1645.795753 ?????????????APPLYING_TRANSACTION_START_APPLY_TIMESTAMP:?2023-08-08?1639.032510 ?????????????????LAST_APPLIED_TRANSACTION_RETRIES_COUNT:?0 ???LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER:?0 ??LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:? LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP:?0000-00-00?0000.000000 ?????????????????????APPLYING_TRANSACTION_RETRIES_COUNT:?0 ???????APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER:?0 ??????APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:? ????APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP:?0000-00-00?0000.000000 1?row?in?set?(0.00?sec)
?
?
可以看到從節(jié)點(diǎn)報(bào)錯(cuò)發(fā)生了主鍵沖突。
8.查看從節(jié)點(diǎn)定時(shí)任務(wù)狀態(tài)
當(dāng)前從節(jié)點(diǎn)定時(shí)任務(wù)狀態(tài)為 ENABLED
?
?
greatsql>?show?events; +------+------------+---------+-----------+-----------+------------+----------------+----------------+---------------------+------+---------+------------+----------------------+----------------------+--------------------+ |?Db??|?Name????|?Definer?|?Time?zone?|?Type????|?Execute?at?|?Interval?value?|?Interval?field?|?Starts????????|?Ends?|?Status??|?Originator?|?character_set_client?|?collation_connection?|?Database?Collation?| +------+------------+---------+-----------+-----------+------------+----------------+----------------+---------------------+------+---------+------------+----------------------+----------------------+--------------------+ |?test?|?event_test?|?root@%??|?SYSTEM???|?RECURRING?|?NULL????|?1????????|?MINUTE?????|?2023-08-08?15:58:45?|?NULL?|?ENABLED?|??????1?|?utf8mb4????????|?utf8mb4_unicode_ci??|?utf8mb4_unicode_ci?| +------+------------+---------+-----------+-----------+------------+----------------+----------------+---------------------+------+---------+------------+---------------------+----------------------+--------------------+ 1?row?in?set?(0.00?sec)
?
?
可以看到由于從節(jié)點(diǎn)的定時(shí)任務(wù)也執(zhí)行了,從節(jié)點(diǎn)寫入數(shù)據(jù),導(dǎo)致主鍵沖突。
9.故障解決
?
?
greatsql>?alter?event?event_test?DISABLE; Query?OK,?0?rows?affected?(0.01?sec)
?
?
關(guān)閉從節(jié)點(diǎn)的定時(shí)任務(wù)event,然后跳過主鍵沖突的報(bào)錯(cuò),最后重新啟動(dòng)主從復(fù)制。
3.總結(jié)
1.如果主庫(kù)有定時(shí)任務(wù),通過clone的方式搭建從庫(kù),在從庫(kù)恢復(fù)之后需要關(guān)閉定時(shí)任務(wù),避免主從同時(shí)執(zhí)行定時(shí)任務(wù)導(dǎo)致主從故障。
2.克隆時(shí),如果捐贈(zèng)節(jié)點(diǎn)有主從復(fù)制信息,則克隆后的接收節(jié)點(diǎn)也會(huì)克隆此復(fù)制信息,并在克隆完成自動(dòng)重啟實(shí)例后,自動(dòng)啟動(dòng)復(fù)制;避免此問題可以在接收節(jié)點(diǎn)的配置文件中增加 skip-slave-start,避免節(jié)點(diǎn)重啟后自動(dòng)啟動(dòng)復(fù)制。
編輯:黃飛
?
評(píng)論