Patroni: a start-time discrepancy causes a member to show as stopped

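This article walks through a time-difference problem that Patroni hits when checking the state of PostgreSQL, and shows two ways to resolve it: adjusting the time-difference threshold, or restarting PostgreSQL.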

Environment

Service | Version
:------ | :------
PostgreSQL | PostgreSQL 9.6.8
Patronictl | patronictl version 1.5.5
Etcd | etcdctl version 2.2.5

Check the Patroni status

root@localhost:~# patronictl -c /data/scripts/patroni_postgresql.yml list "9.6/main"
+----------+---------+--------------+------+---------+----+-----------+
| Cluster  |  Member |     Host     | Role |  State  | TL | Lag in MB |
+----------+---------+--------------+------+---------+----+-----------+
| 9.6/main | pgsql_1 | 192.168.1.1 |      | running |  7 |           |
| 9.6/main | pgsql_2 | 192.168.1.2 |      | stopped |    |   unknown |
+----------+---------+--------------+------+---------+----+-----------+

The second node, 192.168.1.2, is the primary, yet after startup its state is shown as stopped.

Check the Patroni log

tail -f /data/logs/patroni.log
# Key log lines:
2020-04-03 14:40:35,434 INFO: Lock owner: None; I am pgsql_2
2020-04-03 14:40:35,524 INFO: PAUSE: postgres is not running
2020-04-03 14:40:45,433 INFO: Process 77888 is not postmaster, too much difference between PID file start time 1565058216.95 and process start time 1565058213
2020-04-03 14:40:45,434 INFO: Process 77888 is not postmaster, too much difference between PID file start time 1565058216.95 and process start time 1565058213
2020-04-03 14:40:45,434 WARNING: Postgresql is not running.
2020-04-03 14:40:45,434 INFO: Lock owner: None; I am pgsql_2
2020-04-03 14:40:45,523 INFO: PAUSE: postgres is not running
2020-04-03 14:40:55,435 INFO: Process 77888 is not postmaster, too much difference between PID file start time 1565058216.95 and process start time 1565058213
2020-04-03 14:40:55,435 INFO: Process 77888 is not postmaster, too much difference between PID file start time 1565058216.95 and process start time 1565058213
2020-04-03 14:40:55,435 WARNING: Postgresql is not running.

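The log shows that the process start time and the start time recorded in the PID file differ too much, so Patroni concludes that PostgreSQL is not running, even though PostgreSQL has in fact started normally. In other words, the state check fails purely because of the time gap. So how large must the gap be before the process is treated as not running?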

Compute the difference with Python

root@localhost:~# python
Python 2.7.12 (default, Nov 12 2018, 14:36:49) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> print abs(1565058216.95 - 1565058213)                  
3.95000004768		

The two values differ by about 3.95 seconds.
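
The same arithmetic can be reproduced in Python 3 directly from the log line. This is only a minimal sketch: the regular expression and the helper name `start_time_gap` are illustrative assumptions and are not part of Patroni.

```python
import re

# Log line copied from the Patroni log above.
LOG_LINE = (
    "2020-04-03 14:40:45,433 INFO: Process 77888 is not postmaster, too much "
    "difference between PID file start time 1565058216.95 and process start "
    "time 1565058213"
)

# Illustrative pattern (an assumption, not a Patroni API): capture the two
# numbers that follow the words "start time" in the message.
START_TIME_RE = re.compile(r"start time (\d+(?:\.\d+)?)")


def start_time_gap(line):
    """Return the absolute gap in seconds between the two logged start times."""
    first, second = (float(value) for value in START_TIME_RE.findall(line))
    return abs(first - second)


gap = start_time_gap(LOG_LINE)
print("gap: %.2f seconds" % gap)  # gap: 3.95 seconds
```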

Look at the source code

def _is_postmaster_process(self):
        try:
            start_time = int(self._postmaster_pid.get('start_time', 0))
            if start_time and abs(self.create_time() - start_time) > 3:  # this is the key line
                logger.info('Process %s is not postmaster, too much difference between PID file start time %s and '
                            'process start time %s', self.pid, self.create_time(), start_time)
                return False
        except ValueError:
            logger.warning('Garbage start time value in pid file: %r', self._postmaster_pid.get('start_time'))

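The check is explicit: if the start time recorded in the PID file and the process start time differ by more than 3 seconds, that is, when abs(self.create_time() - start_time) > 3 holds, the function returns False and the member is reported in the default stopped state instead of running.
Note: the logger.info call passes its arguments in the opposite order to the message text. The first value printed is actually the process start time, and the second is the start time from the PID file.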
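
To see how the threshold plays out with the values from the log, the comparison can be isolated in a small standalone function. This is a sketch under stated assumptions: `looks_like_postmaster` and `MAX_START_TIME_GAP` are hypothetical names introduced here, and the function covers only the start-time comparison from the excerpt, not anything else the real method may check.

```python
# Threshold hard-coded in the excerpt above, in seconds.
MAX_START_TIME_GAP = 3


def looks_like_postmaster(process_create_time, pid_file_start_time,
                          max_gap=MAX_START_TIME_GAP):
    """Mimic the start-time comparison from the excerpt.

    Hypothetical helper for illustration; it is not Patroni's API.
    Returns False when the PID file records a start time and it differs
    from the process creation time by more than max_gap seconds.
    """
    start_time = int(pid_file_start_time)
    if start_time and abs(process_create_time - start_time) > max_gap:
        return False
    return True


# Values taken from the log: process creation time, then PID file start time.
print(looks_like_postmaster(1565058216.95, 1565058213))             # False
print(looks_like_postmaster(1565058216.95, 1565058213, max_gap=4))  # True
```

With a gap of about 3.95 seconds the check fails at a threshold of 3 and passes at a threshold of 4.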

Compare the PID file and process times

# Process start time: 10:23:36
root@localhost:~# ps -eo pid,lstart  | grep 77888      
 77888 Tue Aug  6 10:23:36 2019

# PID file creation time: 10:23:33
root@localhost:~# ls --full-time | grep pid      
-rw------- 1 postgres postgres    87 2019-08-06 10:23:33.597849566 +0800 postmaster.pid	

# Start time recorded inside the PID file
root@localhost:~# cat postmaster.pid
77888
/data/pg9.6/main
1565058213  # this timestamp converts to 2019-08-06 10:23:33 (UTC+8)
5432
/var/run/postgresql
0.0.0.0
  5432001    786435
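
The timestamp conversion can be checked with a few lines of Python 3. This is a rough sketch: the helper `parse_pid_file` is a hypothetical name, and the UTC+8 offset is an assumption taken from the +0800 shown in the ls output.

```python
from datetime import datetime, timedelta, timezone

# Contents of postmaster.pid as shown above (inline annotation removed).
PID_FILE_TEXT = """77888
/data/pg9.6/main
1565058213
5432
/var/run/postgresql
0.0.0.0
  5432001    786435
"""

# Assumption: the host clock is UTC+8, matching the +0800 in the ls output.
LOCAL_TZ = timezone(timedelta(hours=8))


def parse_pid_file(text):
    """Return (pid, start_time) read from the first and third lines."""
    lines = text.splitlines()
    return int(lines[0]), int(lines[2])


pid, start_time = parse_pid_file(PID_FILE_TEXT)
started_at = datetime.fromtimestamp(start_time, tz=LOCAL_TZ)
print(pid, started_at.strftime("%Y-%m-%d %H:%M:%S"))  # 77888 2019-08-06 10:23:33
```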

Solution 1

Edit postmaster.py and change the 3 to 4, then start Patroni. Once it has started successfully, change the value back.

if start_time and abs(self.create_time() - start_time) > 3:
# change to
if start_time and abs(self.create_time() - start_time) > 4:
# then start patroni
root@localhost:~# patronictl -c /data/scripts/patroni_postgresql.yml list "9.6/main"
+----------+---------+--------------+------+---------+----+-----------+
| Cluster  |  Member |     Host     | Role |  State  | TL | Lag in MB |
+----------+---------+--------------+------+---------+----+-----------+
| 9.6/main | pgsql_1 | 192.168.1.1 |       | running |  8 |    0.0    |
| 9.6/main | pgsql_2 | 192.168.1.2 | Leader| running |  8 |    0.0    |
+----------+---------+--------------+------+---------+----+-----------+

After a successful start, revert the change made to postmaster.py above.

Solution 2

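Restart PostgreSQL so that the PID file is regenerated and the two times agree. Patroni is paused first so that it does not interfere with the manual restart; note that the final listing still shows maintenance mode on, so the cluster still needs to be resumed afterwards. If the workload can tolerate a restart, this is the recommended method: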

root@localhost:~# patronictl -c /data/scripts/patroni_postgresql.yml pause "9.6/main" --wait
root@localhost:~# /etc/init.d/postgresql stop 
root@localhost:~# /etc/init.d/postgresql start	
root@localhost:~# patronictl -c /data/scripts/patroni_postgresql.yml list "9.6/main"
+----------+---------+--------------+------+---------+----+-----------+
| Cluster  |  Member |     Host     | Role |  State  | TL | Lag in MB |
+----------+---------+--------------+------+---------+----+-----------+
| 9.6/main | pgsql_1 | 192.168.1.1 |       | running |  8 |    0.0    |
| 9.6/main | pgsql_2 | 192.168.1.2 | Leader| running |  8 |    0.0    |
+----------+---------+--------------+------+---------+----+-----------+
 Maintenance mode: on