Implementation of the Redis FAILOVER command

Preface

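In my earlier Sentinel notes I mentioned a way to optimize failover: use the FAILOVER command introduced in Redis 6.2.0 to perform the switch, which is faster and safer and does not lose data.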

This note records how it is implemented.

The FAILOVER command

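Note that this is neither the cluster failover nor the Sentinel failover. It is the FAILOVER command introduced in Redis 6.2.0, which accepts the target replica's ip:port, a timeout, an optional FORCE flag, and so on.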
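
To make the options concrete, here is a minimal sketch of a helper that formats such a command line. It assumes the syntax given in the Redis documentation, FAILOVER [TO host port [FORCE]] [ABORT] [TIMEOUT milliseconds]. The function name build_failover_cmd is hypothetical and not part of Redis, and ABORT is left out for brevity.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of Redis): formats a FAILOVER command line.
 * Assumed syntax, per the Redis docs:
 *   FAILOVER [TO host port [FORCE]] [ABORT] [TIMEOUT milliseconds]
 * host may be NULL (let Redis pick a replica); timeout_ms <= 0 means no timeout.
 * Returns the number of characters written, or -1 on bad input or overflow. */
int build_failover_cmd(char *buf, size_t len, const char *host, int port,
                       long timeout_ms, int force)
{
    int n;

    /* FORCE is only accepted together with both a target and a timeout. */
    if (force && (host == NULL || timeout_ms <= 0)) return -1;

    if (host != NULL && timeout_ms > 0)
        n = snprintf(buf, len, "FAILOVER TO %s %d%s TIMEOUT %ld",
                     host, port, force ? " FORCE" : "", timeout_ms);
    else if (host != NULL)
        n = snprintf(buf, len, "FAILOVER TO %s %d", host, port);
    else if (timeout_ms > 0)
        n = snprintf(buf, len, "FAILOVER TIMEOUT %ld", timeout_ms);
    else
        n = snprintf(buf, len, "FAILOVER");

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The resulting string can be sent to the current master through any client. Rejecting FORCE without a target and a timeout mirrors the restriction described in the documentation.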

The implementation proceeds as follows.

Original master

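  1. On receiving the command, the master moves its state machine to FAILOVER_WAIT_FOR_SYNC and pauses writes (write commands are still accepted from clients but not processed).
  2. It waits until the designated replica has caught up, i.e. replica->repl_ack_off == server.master_repl_offset, and then switches to FAILOVER_IN_PROGRESS.
  3. It then opens a replication connection to that replica and begins a partial resynchronization, following the standard handshake steps. When it reaches the point of sending PSYNC, it checks whether the state machine is FAILOVER_IN_PROGRESS and, if so, appends the FAILOVER argument: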
    if (server.failover_state == FAILOVER_IN_PROGRESS) {
        reply = sendCommand(conn,"PSYNC",psync_replid,psync_offset,"FAILOVER",NULL);
    } else {
        reply = sendCommand(conn,"PSYNC",psync_replid,psync_offset,NULL);
    }
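  4. On receiving +CONTINUE [replid], it moves its previous ID into replid2 and sets the returned new ID as server.replid. Finally, it disconnects all of its former replicas so that they learn about the replid change: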
    if (!strncmp(reply,"+CONTINUE",9)) {
        /* Check the new replication ID advertised by the master. If it
         * changed, we need to set the new ID as primary ID, and set
         * secondary ID as the old master ID up to the current offset, so
         * that our sub-slaves will be able to PSYNC with us after a
         * disconnection. */
        char *start = reply+10;
        char *end = reply+9;
        while(end[0] != '\r' && end[0] != '\n' && end[0] != '\0') end++;
        if (end-start == CONFIG_RUN_ID_SIZE) {
            char new[CONFIG_RUN_ID_SIZE+1];
            memcpy(new,start,CONFIG_RUN_ID_SIZE);
            new[CONFIG_RUN_ID_SIZE] = '\0';

            if (strcmp(new,server.cached_master->replid)) {
                /* Master ID changed. */
                serverLog(LL_WARNING,"Master replication ID changed to %s",new);

                /* Set the old ID as our ID2, up to the current offset+1. */
                memcpy(server.replid2,server.cached_master->replid,
                    sizeof(server.replid2));
                server.second_replid_offset = server.master_repl_offset+1;

                /* Update the cached master ID and our own primary ID to the
                 * new one. */
                memcpy(server.replid,new,sizeof(server.replid));
                memcpy(server.cached_master->replid,new,sizeof(server.replid));

                /* Disconnect all the sub-slaves: they need to be notified. */
                disconnectSlaves();
            }
        }
        ...
    }
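
The state transitions on the original master can be sketched as a small stand-alone model. This is a simplified illustration, not the Redis implementation: the state names match the ones quoted above, but the toy_master struct and the toy_* functions are hypothetical, and timeouts, FORCE and ABORT are omitted.

```c
#include <stddef.h>
#include <stdio.h>

/* State names as used in the Redis source quoted above. */
typedef enum {
    NO_FAILOVER = 0,
    FAILOVER_WAIT_FOR_SYNC,
    FAILOVER_IN_PROGRESS
} toy_failover_state;

/* Hypothetical, heavily reduced view of the master's state. */
typedef struct {
    toy_failover_state state;
    int writes_paused;             /* 1 while client writes are held back */
    long long master_repl_offset;  /* the master's replication offset */
    long long target_ack_off;      /* offset acknowledged by the target replica */
} toy_master;

/* Step 1: FAILOVER received, start waiting and pause writes. */
void toy_failover_begin(toy_master *m)
{
    m->state = FAILOVER_WAIT_FOR_SYNC;
    m->writes_paused = 1;
}

/* Step 2: called periodically; advance once the target has caught up. */
void toy_failover_tick(toy_master *m)
{
    if (m->state == FAILOVER_WAIT_FOR_SYNC &&
        m->target_ack_off == m->master_repl_offset)
        m->state = FAILOVER_IN_PROGRESS;
}

/* Step 3: format the PSYNC line, adding FAILOVER only while in progress.
 * Returns the number of characters written, or -1 on overflow. */
int toy_format_psync(const toy_master *m, const char *replid,
                     long long offset, char *buf, size_t len)
{
    int n = snprintf(buf, len, "PSYNC %s %lld%s", replid, offset,
                     m->state == FAILOVER_IN_PROGRESS ? " FAILOVER" : "");
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Because writes are paused in step 1, master_repl_offset stops advancing, so the equality check in step 2 is guaranteed to become true once the replica has acknowledged everything.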

Original replica

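  1. On receiving PSYNC [replid] [repl_offset] FAILOVER, it disconnects from the original master, generates a new replid, disconnects all of its own replicas (so that they notice the replid change), and finally promotes itself to master: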
    /* Check if this is a failover request to a replica with the same replid and
     * become a master if so. */
    if (c->argc > 3 && !strcasecmp(c->argv[0]->ptr,"psync") &&
        !strcasecmp(c->argv[3]->ptr,"failover"))
    {
        serverLog(LL_WARNING, "Failover request received for replid %s.",
            (unsigned char *)c->argv[1]->ptr);
        if (!server.masterhost) {
            addReplyError(c, "PSYNC FAILOVER can't be sent to a master.");
            return;
        }

        if (!strcasecmp(c->argv[1]->ptr,server.replid)) {
            replicationUnsetMaster();
            sds client = catClientInfoString(sdsempty(),c);
            serverLog(LL_NOTICE,
                "MASTER MODE enabled (failover request from '%s')",client);
            sdsfree(client);
        } else {
            addReplyError(c, "PSYNC FAILOVER replid must match my replid.");
            return;
        }
    }

    /* Cancel replication, setting the instance as a master itself. */
    void replicationUnsetMaster(void) {
        if (server.masterhost == NULL) return; /* Nothing to do. */
        ...
        /* Clear masterhost first, since the freeClient calls
         * replicationHandleMasterDisconnection which can attempt to re-connect. */
        sdsfree(server.masterhost);
        server.masterhost = NULL;
        if (server.master) freeClient(server.master);
        replicationDiscardCachedMaster();
        cancelReplicationHandshake(0);
        /* When a slave is turned into a master, the current replication ID
         * (that was inherited from the master at synchronization time) is
         * used as secondary ID up to the current offset, and a new replication
         * ID is created to continue with a new replication history. */
        shiftReplicationId();
        /* Disconnecting all the slaves is required: we need to inform slaves
         * of the replication ID change (see shiftReplicationId() call). However
         * the slaves will be able to partially resync with us, so it will be
         * a very fast reconnection. */
        disconnectSlaves();
        server.repl_state = REPL_STATE_NONE;
        ...
    }
  2. Because this version of Redis supports PSYNC2, it replies with +CONTINUE [replid]. Note that this is the new replid!
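
The exchange between the two nodes can be sketched end to end in a few lines. This is a toy model under stated assumptions, not the Redis code: toy_replica, toy_handle_psync_failover and toy_parse_continue are hypothetical names, the new ID is passed in rather than randomly generated, and the replid is compared case-sensitively for simplicity.

```c
#include <stdio.h>
#include <string.h>

#define TOY_ID_SIZE 40  /* a Redis replication ID is 40 characters long */

/* Hypothetical, reduced view of the target replica's replication state. */
typedef struct {
    int is_replica;                 /* 1 while replicating from a master */
    char replid[TOY_ID_SIZE + 1];   /* primary replication ID */
    char replid2[TOY_ID_SIZE + 1];  /* secondary (previous) replication ID */
    long long master_repl_offset;
    long long second_replid_offset;
} toy_replica;

/* Replica side: handle "PSYNC <replid> <offset> FAILOVER".
 * new_replid stands in for the random ID Redis would generate.
 * Returns 0 and writes "+CONTINUE <new_replid>\r\n" on success, -1 otherwise. */
int toy_handle_psync_failover(toy_replica *r, const char *req_replid,
                              const char *new_replid, char *reply, size_t len)
{
    if (!r->is_replica) return -1;                      /* sent to a master */
    if (strcmp(req_replid, r->replid) != 0) return -1;  /* replid must match */
    if (strlen(new_replid) != TOY_ID_SIZE) return -1;
    if (len < TOY_ID_SIZE + 13) return -1;              /* reply would not fit */

    /* Keep the old history reachable through the secondary ID. */
    memcpy(r->replid2, r->replid, sizeof(r->replid2));
    r->second_replid_offset = r->master_repl_offset + 1;
    memcpy(r->replid, new_replid, TOY_ID_SIZE + 1);
    r->is_replica = 0;                                  /* promoted to master */

    snprintf(reply, len, "+CONTINUE %s\r\n", r->replid);
    return 0;
}

/* Old-master side: copy the ID out of "+CONTINUE <id>\r\n" into out, which
 * must hold TOY_ID_SIZE + 1 bytes. Returns 0 on success, -1 if malformed. */
int toy_parse_continue(const char *reply, char *out)
{
    if (strncmp(reply, "+CONTINUE ", 10) != 0) return -1;
    const char *start = reply + 10;
    size_t n = strcspn(start, "\r\n");  /* length up to the line terminator */
    if (n != TOY_ID_SIZE) return -1;
    memcpy(out, start, n);
    out[n] = '\0';
    return 0;
}
```

The old master compares the parsed ID with the one it sent. Since the two differ, it takes the branch shown earlier that saves the old ID as replid2 and adopts the new one.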