Redis Sentinel Notes

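Preface

Our team recently extended Sentinel with a "failover safe" capability: the master itself performs the failover, and Sentinel no longer issues SLAVEOF NO ONE and related commands. I think this is a good approach.

Since I had never read the Sentinel source carefully, I took this opportunity to go through the whole flow and write these notes.

The Redis version is open-source 7.0.15.

Logic notes

A summary of the main logic first, for quick reference.

At startup

At startup, Sentinel reads its configuration file. Below is a complete runtime configuration file, including the automatically generated part: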

daemonize yes
pidfile "./sentinel.pid"
logfile "./sentinel.log"
loglevel notice
dir "/data/huangjiegang/running-test/sentinel/12001"
protected-mode no
maxclients 4064
port 12001
sentinel monitor server-11102 127.0.0.1 11102 2
sentinel down-after-milliseconds server-11102 20000
sentinel failover-timeout server-11102 60000
sentinel monitor server-11101 127.0.0.1 11101 2
sentinel down-after-milliseconds server-11101 20000
sentinel failover-timeout server-11101 60000

# Generated by CONFIG REWRITE
latency-tracking-info-percentiles 50 99 99.9
user default on nopass ~* &* +@all
sentinel myid ff835e7c7c614a33fe72eae2e7b37fe4638a8876
sentinel takeover-az-enabled no
sentinel takeover-az none
sentinel config-epoch server-11102 0
sentinel leader-epoch server-11102 0
sentinel config-epoch server-11101 4
sentinel leader-epoch server-11101 4
sentinel current-epoch 4

sentinel known-sentinel server-11102 127.0.0.1 12002 88f04da1e3e5828d0cbfe9574476be8b8e4da1ab

sentinel known-sentinel server-11101 127.0.0.1 12002 88f04da1e3e5828d0cbfe9574476be8b8e4da1ab

sentinel known-replica server-11102 127.0.0.1 11202

sentinel known-replica server-11101 127.0.0.1 11201

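The main thing to look at during startup is sentinel monitor, which defines the masters to monitor.

  1. After the configuration file is parsed, Sentinel opens a command connection and a Pub/Sub connection to each master. The Pub/Sub connection subscribes to the __sentinel__:hello channel.
  2. Every 2 seconds each Sentinel publishes its own ip:port information to that channel, so the Sentinels discover each other. Between Sentinels only a command connection is established.
  3. A master's replicas are discovered through the INFO command, which is sent every 10 seconds; the reply lists all replicas of that master.
  4. Once the replicas are known, Sentinel opens the same two connections to each of them and subscribes to the same channel. INFO is sent to masters and replicas, and PING is sent to masters, replicas, and Sentinels. In the 7.0 source (sentinelSendPeriodicCommands) the hello message is likewise published to all three kinds of instances, not only to masters.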
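
The periodic commands above can be sketched as a small "what is due now" check. This is only a simplified illustration, not the Sentinel code: the type and function names are made up for this note, and the periods are the default values mentioned above (the real sentinelSendPeriodicCommands also looks at pending commands, link state, and down-after-milliseconds).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Default periods in milliseconds (hypothetical constant names). */
enum { INFO_PERIOD_MS = 10000, PING_PERIOD_MS = 1000, PUBLISH_PERIOD_MS = 2000 };

typedef enum { KIND_MASTER, KIND_REPLICA, KIND_SENTINEL } instance_kind;

/* Last time each periodic command was sent to one monitored instance. */
typedef struct {
    instance_kind kind;
    long long last_info_ms;
    long long last_ping_ms;
    long long last_pub_ms;
} instance_timers;

typedef struct { bool info, ping, hello; } due_commands;

/* Returns which periodic commands are due for this instance at now_ms. */
due_commands commands_due(const instance_timers *t, long long now_ms) {
    assert(t != NULL);
    due_commands due = { false, false, false };
    /* INFO goes to masters and replicas only, never to other Sentinels. */
    if (t->kind != KIND_SENTINEL && now_ms - t->last_info_ms > INFO_PERIOD_MS)
        due.info = true;
    /* PING goes to every kind of instance. */
    if (now_ms - t->last_ping_ms > PING_PERIOD_MS) due.ping = true;
    /* The hello publish is checked on its own 2 second period. */
    if (now_ms - t->last_pub_ms > PUBLISH_PERIOD_MS) due.hello = true;
    return due;
}
```

For a Sentinel peer, commands_due never reports INFO as due, which matches the fact that only a command connection is kept between Sentinels.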

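On failure

  1. On every cron tick, Sentinel checks whether each monitored instance is subjectively down (SDOWN).
  2. Once a master is found to be subjectively down, Sentinel starts periodically sending is-master-down-by-addr to every other Sentinel monitoring that master, to collect their subjective-down judgments.
  3. As soon as one Sentinel sees, in some tick, that enough Sentinels agree to reach the objective-down (ODOWN) quorum, it enters the failover state machine and immediately sends an is-master-down-by-addr request that also asks the other Sentinels for their vote.
  4. If it wins a majority, it continues through the failover state machine. Note that the is-master-down-by-addr requests do not stop until the failover completes. This keeps the other Sentinels from starting a new election, because a Sentinel that receives the request resets its own election timer.
  5. Other Sentinels that also entered the state machine leave it after about 10 seconds, and because of the leader's periodic is-master-down-by-addr they do not start an election in the next epoch. They learn about the master change from the leader's hello messages, which eventually triggers +switch-master on them as well.
  6. The leader does the following in order: it selects the best replica, sends it SLAVEOF NO ONE, learns from INFO that the promotion has completed, and finally makes the old master a replica of the new master and emits +switch-master.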
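
The two thresholds in steps 3 and 4 are easy to mix up, so here is a minimal sketch of both checks. The function names are made up for this note; the arithmetic follows my reading of sentinelCheckObjectivelyDown and sentinelGetLeader in 7.0 (the Sentinel's own SDOWN view counts as one agreement, and a leader needs both a majority of all voters and at least quorum votes).

```c
#include <assert.h>
#include <stdbool.h>

/* ODOWN check: our own SDOWN view counts as 1, the rest comes from the
 * is-master-down-by-addr replies of the other Sentinels. */
bool is_objectively_down(bool self_sdown, int peers_reporting_down, int quorum) {
    assert(quorum > 0 && peers_reporting_down >= 0);
    if (!self_sdown) return false;
    return 1 + peers_reporting_down >= quorum;
}

/* Election check: the winner needs a majority of all voters (the known
 * peer Sentinels plus ourselves) and also at least 'quorum' votes. */
bool wins_election(int votes, int known_peer_sentinels, int quorum) {
    assert(known_peer_sentinels >= 0 && quorum > 0);
    int voters = known_peer_sentinels + 1;
    int majority = voters / 2 + 1;
    return votes >= majority && votes >= quorum;
}
```

With the configuration shown earlier (quorum 2, one known peer Sentinel) both Sentinels must agree for ODOWN, and the candidate needs both votes to become leader.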

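Failover safe

One optimization idea is to add a new Sentinel command for a "safe" failover, which uses the FAILOVER command of a single Redis instance so that the master itself performs the failover.

The Sentinel leader's actual procedure then becomes:

  1. Select a replica.
  2. Call failover to slave_host port ... timeout force on the master.
  3. Wait for the INFO reply to learn whether the failover has completed.

The benefit is that the FAILOVER command lets the old master attach directly to the new master without a full resync, because FAILOVER pauses writes and waits for the replica to catch up on the replication stream before the roles are switched.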
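
As a rough illustration of step 2, the command text could be assembled like this. This is a sketch under assumptions, not our internal implementation: build_safe_failover_cmd is a hypothetical helper, and the argument order follows the documented syntax of the open-source FAILOVER command (FAILOVER TO host port FORCE TIMEOUT milliseconds), where FORCE is only accepted together with TO and TIMEOUT.

```c
#include <assert.h>
#include <stdio.h>

/* Formats "FAILOVER TO <host> <port> FORCE TIMEOUT <ms>" into buf.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if the buffer is too small. Hypothetical helper for illustration. */
int build_safe_failover_cmd(char *buf, size_t buflen,
                            const char *host, int port, long long timeout_ms) {
    assert(buf != NULL && host != NULL);
    int n = snprintf(buf, buflen, "FAILOVER TO %s %d FORCE TIMEOUT %lld",
                     host, port, timeout_ms);
    if (n < 0 || (size_t)n >= buflen) return -1; /* error or truncated */
    return n;
}
```

The leader would send the resulting string to the current master rather than to the selected replica, which is the main difference from the SLAVEOF NO ONE flow.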

Source code

The timer function (effectively Sentinel's main function)

/* ======================== SENTINEL timer handler ==========================
 * This is the "main" our Sentinel, being sentinel completely non blocking
 * in design.
 * -------------------------------------------------------------------------- */

/* Perform scheduled operations for the specified Redis instance. */
void sentinelHandleRedisInstance(sentinelRedisInstance *ri) {
    /* ========== MONITORING HALF ============ */
    /* Every kind of instance */
    // (Re)connect to every known instance (master, replica, or Sentinel) whose link is down
    sentinelReconnectInstance(ri);
    // Send the periodic commands: INFO, PING, and the hello channel publish
    sentinelSendPeriodicCommands(ri);

    /* ============== ACTING HALF ============= */
    ...
    /* Every kind of instance */
    sentinelCheckSubjectivelyDown(ri);
    ...
    /* Only masters */
    if (ri->flags & SRI_MASTER) {
        // Check whether the master is objectively down
        sentinelCheckObjectivelyDown(ri);
        // If objectively down, start the failover state machine
        if (sentinelStartFailoverIfNeeded(ri))
            // Immediately send a request asking for votes
            sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_ASK_FORCED);
        // Main flow of the state machine
        sentinelFailoverStateMachine(ri);
        // Periodically send is-master-down-by-addr, for two purposes:
        // 1. collect the number of subjective-down reports
        // 2. reset the election timer of the other Sentinels
        sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_NO_FLAGS);
    }
}

The state machine is quite clear:

void sentinelFailoverStateMachine(sentinelRedisInstance *ri) {
    serverAssert(ri->flags & SRI_MASTER);

    if (!(ri->flags & SRI_FAILOVER_IN_PROGRESS)) return;

    switch(ri->failover_state) {
        case SENTINEL_FAILOVER_STATE_WAIT_START:
            // The leader passes straight through this state; a non-leader
            // stays here for about 10 seconds and then aborts the state machine.
            sentinelFailoverWaitStart(ri);
            break;
        case SENTINEL_FAILOVER_STATE_SELECT_SLAVE:
            // Select the replica to promote
            sentinelFailoverSelectSlave(ri);
            break;
        case SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE:
            // Send SLAVEOF NO ONE
            sentinelFailoverSendSlaveOfNoOne(ri);
            break;
        case SENTINEL_FAILOVER_STATE_WAIT_PROMOTION:
            // Wait for the result reported by INFO
            sentinelFailoverWaitPromotion(ri);
            break;
        case SENTINEL_FAILOVER_STATE_RECONF_SLAVES:
            // Hand the old master's replicas over to the new master, one by one
            sentinelFailoverReconfNextSlave(ri);
            break;
    }
}
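
To make the transitions explicit, here is a toy model of how the state advances. It is only a sketch: the enum and function names are invented for this note, each step is reduced to one of three outcomes, and it assumes (from my reading of sentinelAbortFailover) that a failover can only be aborted up to the WAIT_PROMOTION state.

```c
#include <assert.h>

/* Toy copy of the failover states, in the order they are visited. */
typedef enum {
    FAILOVER_NONE,
    WAIT_START,
    SELECT_SLAVE,
    SEND_SLAVEOF_NOONE,
    WAIT_PROMOTION,
    RECONF_SLAVES,
    UPDATE_CONFIG
} failover_state;

/* Outcome of running the handler of the current state once. */
typedef enum { STEP_PENDING, STEP_DONE, STEP_ABORT } step_outcome;

/* Returns the state after one timer tick. */
failover_state advance(failover_state s, step_outcome o) {
    assert(s <= UPDATE_CONFIG);
    /* Nothing to do when idle; UPDATE_CONFIG is handled outside the switch. */
    if (s == FAILOVER_NONE || s == UPDATE_CONFIG) return s;
    if (o == STEP_PENDING) return s;
    if (o == STEP_ABORT)
        /* Assumption: aborting is only possible before replicas are reconfigured. */
        return s <= WAIT_PROMOTION ? FAILOVER_NONE : s;
    return (failover_state)(s + 1);
}
```

A non-leader that times out in WAIT_START corresponds to advance(WAIT_START, STEP_ABORT), which drops back to FAILOVER_NONE.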

Once the states above have completed, the failover reaches its final state, SENTINEL_FAILOVER_STATE_UPDATE_CONFIG, which is handled in sentinelHandleDictOfRedisInstances:

void sentinelTimer(void) {
    ...
    sentinelHandleDictOfRedisInstances(sentinel.masters);
    ...
}

void sentinelHandleDictOfRedisInstances(dict *instances) {
    dictIterator *di;
    dictEntry *de;
    sentinelRedisInstance *switch_to_promoted = NULL;

    /* There are a number of things we need to perform against every master. */
    di = dictGetIterator(instances);
    while((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *ri = dictGetVal(de);

        sentinelHandleRedisInstance(ri);
        if (ri->flags & SRI_MASTER) {
            sentinelHandleDictOfRedisInstances(ri->slaves);
            sentinelHandleDictOfRedisInstances(ri->sentinels);
            if (ri->failover_state == SENTINEL_FAILOVER_STATE_UPDATE_CONFIG) {
                switch_to_promoted = ri;
            }
        }
    }
    if (switch_to_promoted)
        sentinelFailoverSwitchToPromotedSlave(switch_to_promoted);
    dictReleaseIterator(di);
}

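This ends up calling sentinelFailoverSwitchToPromotedSlave(), which emits the +switch-master event and re-points the monitored master entry at the promoted replica, with the old master attached to the new master as a replica.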

/* This function is called when the slave is in
 * SENTINEL_FAILOVER_STATE_UPDATE_CONFIG state. In this state we need
 * to remove it from the master table and add the promoted slave instead. */
void sentinelFailoverSwitchToPromotedSlave(sentinelRedisInstance *master) {
    sentinelRedisInstance *ref = master->promoted_slave ?
                                 master->promoted_slave : master;

    sentinelEvent(LL_WARNING,"+switch-master",master,"%s %s %d %s %d",
        master->name, announceSentinelAddr(master->addr), master->addr->port,
        announceSentinelAddr(ref->addr), ref->addr->port);

    sentinelResetMasterAndChangeAddress(master,ref->addr->hostname,ref->addr->port);
}

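A non-leader Sentinel learns about the master change from the leader's hello message, and that is what triggers +switch-master on it. In other words, a non-leader ignores the master change it sees through INFO and waits for the leader to announce it via hello.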

void sentinelProcessHelloMessage(char *hello, int hello_len) {
    ...
    /* Update master info if received configuration is newer. */
    if (si && master->config_epoch < master_config_epoch) {
        master->config_epoch = master_config_epoch;
        if (master_port != master->addr->port ||
            !sentinelAddrEqualsHostname(master->addr, token[5]))
        {
            sentinelAddr *old_addr;

            sentinelEvent(LL_WARNING,"+config-update-from",si,"%@");
            sentinelEvent(LL_WARNING,"+switch-master",
                master,"%s %s %d %s %d",
                master->name,
                announceSentinelAddr(master->addr), master->addr->port,
                token[5], master_port);

            old_addr = dupSentinelAddr(master->addr);
            sentinelResetMasterAndChangeAddress(master, token[5], master_port);
            sentinelCallClientReconfScript(master,
                SENTINEL_OBSERVER,"start",
                old_addr,master->addr);
            releaseSentinelAddr(old_addr);
        }
    }
    ...
}
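
The decision in that function can be reproduced in isolation. The sketch below assumes the hello payload is the eight comma-separated fields used by 7.0 (sentinel ip, sentinel port, run ID, current epoch, master name, master ip, master port, master config epoch), which is why the master address is token[5] above. The struct and function names are invented for this note, and the address comparison is a plain string compare rather than sentinelAddrEqualsHostname.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Master-related fields carried by one hello message (hypothetical struct). */
typedef struct {
    char master_name[64];
    char master_ip[64];
    int master_port;
    unsigned long long master_config_epoch;
} hello_master_info;

/* Parses the eight comma-separated hello fields; returns false on a bad payload. */
bool parse_hello(const char *hello, hello_master_info *out) {
    assert(hello != NULL && out != NULL);
    char sentinel_ip[64], runid[64];
    int sentinel_port;
    unsigned long long current_epoch;
    int n = sscanf(hello, "%63[^,],%d,%63[^,],%llu,%63[^,],%63[^,],%d,%llu",
                   sentinel_ip, &sentinel_port, runid, &current_epoch,
                   out->master_name, out->master_ip,
                   &out->master_port, &out->master_config_epoch);
    return n == 8;
}

/* True when the hello carries a newer config epoch and a different master address. */
bool should_switch_master(const hello_master_info *h, const char *cur_ip,
                          int cur_port, unsigned long long cur_config_epoch) {
    if (h->master_config_epoch <= cur_config_epoch) return false;
    return h->master_port != cur_port || strcmp(h->master_ip, cur_ip) != 0;
}
```

A hello with the same config epoch never causes a switch here, which is why the leader has to bump the config epoch of the master before its announcement is accepted by the other Sentinels.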