68,354
社区成员
数据库版本 | IP | 端口 | |
---|---|---|---|
主库 | 13.2 | 192.168.60.190 | 5432 |
备库 | 13.2 | 192.168.60.191 | 5433 |
postgres=# select name,setting from pg_settings where name in('synchronous_commit','synchronous_standby_names');
name | setting
---------------------------+---------
synchronous_commit | on
synchronous_standby_names | pgs1
(2 rows)
postgres=#
备库
postgres=# select name,setting from pg_settings where name in('synchronous_commit','synchronous_standby_names','primary_conninfo'); name | setting ---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------- primary_conninfo | application_name=pgs1 user=replixs password=replixs channel_binding=disable host=192.168.60.190 port=5432 sslmode=disable sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any synchronous_commit | on synchronous_standby_names | (3 rows) postgres=#
停止数据库
[postgres@lyp ~]$ pg_ctl stop waiting for server to shut down.... done server stopped [postgres@lyp ~]$
添加standby.signal文件
[postgres@lyp ~]$ touch /pgsql/data/standby.signal [postgres@lyp ~]$ ll /pgsql/data/standby.signal -rw-rw-r--. 1 postgres postgres 0 Jan 10 18:55 /pgsql/data/standby.signal [postgres@lyp ~]$ chmod 600 /pgsql/data/standby.signal [postgres@lyp ~]$ ll /pgsql/data/standby.signal -rw-------. 1 postgres postgres 0 Jan 10 18:55 /pgsql/data/standby.signal [postgres@lyp ~]$
echo "primary_conninfo = 'user=replixs password=replixs channel_binding=disable host=192.168.60.191 port=5433 sslmode=disable sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any'" >> /pgsql/data/postgresql.conf
[postgres@pgs1 ~]$ pg_ctl stop waiting for server to shut down........ done server stopped [postgres@pgs1 ~]$
[postgres@pgs1 data]$ pwd /pgsql/data [postgres@pgs1 data]$ rm standby.signal [postgres@pgs1 data]$ ll standbt.signal ls: cannot access standbt.signal: No such file or directory [postgres@pgs1 data]$
由于备库的primary_conninfo参数是写到postgresql.auto.conf文件中,所以只需要对postgresql.auto.conf文件进行修改即可。
[postgres@pgs1 data]$ sed -i 's/primary_conninfo/#primary_conninfo/g' /pgsql/data/postgresql.auto.conf [postgres@pgs1 data]$ more /pgsql/data/postgresql.auto.conf # Do not edit this file manually! # It will be overwritten by the ALTER SYSTEM command. #primary_conninfo = 'application_name=pgs1 user=replixs password=replixs channel_binding=disable host=192.168.60.190 port=5432 sslmode=disable sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any' [postgres@pgs1 data]$
[postgres@pgs1 ~]$ pg_ctl start waiting for server to start....2022-01-10 18:36:19.022 CST [21549] LOG: redirecting log output to logging collector process 2022-01-10 18:36:19.022 CST [21549] HINT: Future log output will appear in directory "pg_log". done server started [postgres@pgs1 ~]$
[postgres@lyp data]$ cd [postgres@lyp ~]$ pg_ctl start waiting for server to start....2022-01-10 18:36:27.897 CST [114279] LOG: redirecting log output to logging collector process 2022-01-10 18:36:27.897 CST [114279] HINT: Future log output will appear in directory "pg_log". done server started [postgres@lyp ~]$
postgres=# select pg_is_in_recovery(); -[ RECORD 1 ]-----+-- pg_is_in_recovery | f postgres=# select * from pg_stat_replication ; -[ RECORD 1 ]----+------------------------------ pid | 21600 usesysid | 41760 usename | replixs application_name | walreceiver client_addr | 192.168.60.190 client_hostname | client_port | 44062 backend_start | 2022-01-10 18:39:38.857844+08 backend_xmin | state | streaming sent_lsn | 3/30000150 write_lsn | 3/30000150 flush_lsn | 3/30000150 replay_lsn | 3/30000150 write_lag | flush_lag | replay_lag | sync_priority | 0 sync_state | async reply_time | 2022-01-10 18:43:09.082354+08 postgres=#
postgres=# select pg_is_in_recovery(); -[ RECORD 1 ]-----+-- pg_is_in_recovery | t postgres=# select * from pg_stat_wal_receiver ; -[ RECORD 1 ]---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- pid | 114548 status | streaming receive_start_lsn | 3/30000000 receive_start_tli | 1 written_lsn | 3/30000150 flushed_lsn | 3/30000150 received_tli | 1 last_msg_send_time | 2022-01-10 18:43:09.084184+08 last_msg_receipt_time | 2022-01-10 18:43:09.082758+08 latest_end_lsn | 3/30000150 latest_end_time | 2022-01-10 18:39:38.859994+08 slot_name | sender_host | 192.168.60.191 sender_port | 5433 conninfo | user=replixs password=******** channel_binding=disable dbname=replication host=192.168.60.191 port=5433 fallback_application_name=walreceiver sslmode=disable sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any postgres=#
关闭主库,模拟主库故障
[postgres@pgs1 ~]$ pg_ctl stop waiting for server to shut down.... done server stopped [postgres@pgs1 ~]$
postgres=# \df pg_promote List of functions -[ RECORD 1 ]-------+----------------------------------------------------------- Schema | pg_catalog Name | pg_promote Result data type | boolean Argument data types | wait boolean DEFAULT true, wait_seconds integer DEFAULT 60 Type | func postgres=#
wait: 表示是否等待备库的 promotion 完成或者 wait_seconds 秒之后返回成功,默认值为 true。
wait_seconds: 等待时间,单位秒,默认 60
postgres=# select pg_promote(true,60); pg_promote ------------ t (1 row) postgres=# \x Expanded display is on. postgres=# select * from pg_stat_wal_receiver ; (0 rows) postgres=#
postgres=# select pg_is_in_recovery(); pg_is_in_recovery ------------------- f (1 row) postgres=# \q [postgres@lyp data]$ pg_controldata pg_control version number: 1300 Catalog version number: 202007201 Database system identifier: 7018940754420329522 Database cluster state: in production pg_control last modified: Mon 10 Jan 2022 06:56:47 PM CST Latest checkpoint location: 3/32000140 Latest checkpoint's REDO location: 3/32000108 Latest checkpoint's REDO WAL file: 000000020000000300000032 Latest checkpoint's TimeLineID: 2 Latest checkpoint's PrevTimeLineID: 2 Latest checkpoint's full_page_writes: on Latest checkpoint's NextXID: 0:3138 Latest checkpoint's NextOID: 41770 Latest checkpoint's NextMultiXactId: 1 Latest checkpoint's NextMultiOffset: 0 Latest checkpoint's oldestXID: 478 Latest checkpoint's oldestXID's DB: 1 Latest checkpoint's oldestActiveXID: 3138 Latest checkpoint's oldestMultiXid: 1 Latest checkpoint's oldestMulti's DB: 1 Latest checkpoint's oldestCommitTsXid:0 Latest checkpoint's newestCommitTsXid:0 Time of latest checkpoint: Mon 10 Jan 2022 06:56:47 PM CST Fake LSN counter for unlogged rels: 0/3E8 Minimum recovery ending location: 0/0 Min recovery ending loc's timeline: 0 Backup start location: 0/0 Backup end location: 0/0 End-of-backup record required: no wal_level setting: replica wal_log_hints setting: off max_connections setting: 100 max_worker_processes setting: 8 max_wal_senders setting: 10 max_prepared_xacts setting: 0 max_locks_per_xact setting: 64 track_commit_timestamp setting: off Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 8192 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Maximum size of a TOAST chunk: 1996 Size of a large-object chunk: 2048 Date/time type storage: 64-bit integers Float8 argument passing: by value Data page checksum version: 0 Mock authentication nonce: 3161aad288eeb24b3becf08e703608f791077bf17ba039a3ae396a4f20e09fd6 [postgres@lyp data]$
[postgres@pgs1 data]$ touch standby.signal [postgres@pgs1 data]$ ll standby.signal -rw-rw-r--. 1 postgres postgres 0 Jan 10 19:04 standby.signal [postgres@pgs1 data]$ chmod 660 standby.signal [postgres@pgs1 data]$ ll standby.signal -rw-rw----. 1 postgres postgres 0 Jan 10 19:04 standby.signal [postgres@pgs1 data]$
echo "primary_conninfo = 'application_name=pgs1 user=replixs password=replixs channel_binding=disable host=192.168.60.190 port=5432 sslmode=disable sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any'" >> postgresql.conf
[postgres@pgs1 data]$ pg_ctl restart waiting for server to shut down.... done server stopped waiting for server to start....2022-01-10 19:05:11.247 CST [21933] LOG: redirecting log output to logging collector process 2022-01-10 19:05:11.247 CST [21933] HINT: Future log output will appear in directory "pg_log". done server started [postgres@pgs1 data]$
[postgres@pgs1 pg_log]$ psql psql (13.2) Type "help" for help. postgres=# select pg_is_in_recovery(); pg_is_in_recovery ------------------- t (1 row) postgres=# select * from pg_stat_wal_receiver ; pid | status | receive_start_lsn | receive_start_tli | written_lsn | flushed_lsn | received_tli | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time | slot_name | sender_host | sender_port | conninfo -----+--------+-------------------+-------------------+-------------+-------------+--------------+--------------------+-----------------------+----------------+-----------------+-----------+-------------+-------------+---------- (0 rows) postgres=#
2022-01-10 19:11:31.588 CST,,,22216,,61dc1463.56c8,1,,2022-01-10 19:11:31 CST,,0,FATAL,XX000,"could not start WAL streaming: ERROR: requested starting point 3/33000000 on timeline 1 is not in this server's history DETAIL: This server's history forked from timeline 1 at 3/320000D8.",,,,,,,,,"","walreceiver" 2022-01-10 19:11:31.588 CST,,,21935,,61dc12e7.55af,82,,2022-01-10 19:05:11 CST,1/0,0,LOG,00000,"new timeline 2 forked off current database system timeline 1 before current recovery point 3/330000A0",,,,,,,,,"","startup" 2022-01-10 19:11:36.603 CST,,,22217,,61dc1468.56c9,1,,2022-01-10 19:11:36 CST,,0,FATAL,XX000,"could not start WAL streaming: ERROR: requested starting point 3/33000000 on timeline 1 is not in this server's history DETAIL: This server's history forked from timeline 1 at 3/320000D8.",,,,,,,,,"","walreceiver" 2022-01-10 19:11:36.603 CST,,,21935,,61dc12e7.55af,83,,2022-01-10 19:05:11 CST,1/0,0,LOG,00000,"new timeline 2 forked off current database system timeline 1 before current recovery point 3/330000A0",,,,,,,,,"","startup"
发生这样的错误的原因是:备用服务器在能够赶上主服务器之前已经升级,导致现在主服务器不能充当备用服务器的角色。
在使用pg_rewind时,应备份目标实例。pg_rewind会直接覆盖目标库的文件,如果rewind失败,那么可能目标库无法启动。
[postgres@pgs1 ~]$ pg_rewind --help pg_rewind resynchronizes a PostgreSQL cluster with another copy of the cluster. Usage: pg_rewind [OPTION]... Options: -c, --restore-target-wal use restore_command in target configuration to retrieve WAL files from archives -D, --target-pgdata=DIRECTORY existing data directory to modify --source-pgdata=DIRECTORY source data directory to synchronize with --source-server=CONNSTR source server to synchronize with -n, --dry-run stop before modifying anything -N, --no-sync do not wait for changes to be written safely to disk -P, --progress write progress messages -R, --write-recovery-conf write configuration for replication (requires --source-server) --debug write a lot of debug messages --no-ensure-shutdown do not automatically fix unclean shutdown -V, --version output version information, then exit -?, --help show this help, then exit Report bugs to <pgsql-bugs@lists.postgresql.org>. PostgreSQL home page: <https://www.postgresql.org/> [postgres@pgs1 ~]$
pg_rewind 前提条件:
1、full_page_writes --数据库默认为on
2、wal_log_hints 设置成 on 或者 PG 在初始化时开启 checksums 功能
[postgres@pgs1 data]$ echo 'wal_log_hints = on' >> postgresql.conf [postgres@pgs1 data]$ more postgresql.conf|grep wal_log_hints #wal_log_hints = off # also do full page writes of non-critical updates wal_log_hints = on [postgres@pgs1 data]$
[postgres@pgs1 data]$ pg_checksums -D /pgsql/data/ --enable Checksum operation completed Files scanned: 2097 Blocks scanned: 6938 pg_checksums: syncing data directory pg_checksums: updating control file Checksums enabled in cluster [postgres@pgs1 data]$
[postgres@pgs1 data]$ pg_rewind --target-pgdata=/pgsql/data --source-server='host=192.168.60.190 port=5432 user=lxs dbname=postgres password=lxs' pg_rewind: servers diverged at WAL location 3/320000D8 on timeline 1 pg_rewind: rewinding from last common checkpoint at 3/32000028 on timeline 1 pg_rewind: Done! [postgres@pgs1 data]$
[postgres@pgs1 data]$ touch standby.signal [postgres@pgs1 data]$ ll standby.signal -rw-rw-r--. 1 postgres postgres 0 Jan 10 19:04 standby.signal [postgres@pgs1 data]$ chmod 660 standby.signal [postgres@pgs1 data]$ ll standby.signal -rw-rw----. 1 postgres postgres 0 Jan 10 19:04 standby.signal [postgres@pgs1 data]$
echo "primary_conninfo = 'application_name=pgs1 user=replixs password=replixs channel_binding=disable host=192.168.60.190 port=5432 sslmode=disable sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any'" >> postgresql.conf
[postgres@pgs1 ~]$ pg_ctl restart waiting for server to shut down.... done server stopped waiting for server to start....2022-01-10 19:37:35.911 CST [22780] LOG: redirecting log output to logging collector process 2022-01-10 19:37:35.911 CST [22780] HINT: Future log output will appear in directory "pg_log". done server started [postgres@pgs1 ~]$
postgres=# select pg_is_in_recovery(); pg_is_in_recovery ------------------- t (1 row) postgres=# select * from pg_stat_wal_receiver ; pid | status | receive_start_lsn | receive_start_tli | written_lsn | flushed_lsn | received_tli | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time | slot_name | sender_host | sender_port | conninfo -----+--------+-------------------+-------------------+-------------+-------------+--------------+--------------------+-----------------------+----------------+-----------------+-----------+-------------+-------------+---------- (0 rows) postgres=#
2022-01-10 19:38:00.953 CST,,,22795,,61dc1a98.590b,1,,2022-01-10 19:38:00 CST,,0,LOG,00000,"started streaming WAL from primary at 3/37000000 on timeline 2",,,,,,,,,"","walreceiver" 2022-01-10 19:38:00.953 CST,,,22795,,61dc1a98.590b,2,,2022-01-10 19:38:00 CST,,0,FATAL,XX000,"could not receive data from WAL stream: ERROR: requested starting point 3/37000000 is ahead of the WAL flush position of this server 3/3401D930",,,,,,,,,"","walreceiver" 2022-01-10 19:38:05.957 CST,,,22800,,61dc1a9d.5910,1,,2022-01-10 19:38:05 CST,,0,LOG,00000,"started streaming WAL from primary at 3/37000000 on timeline 2",,,,,,,,,"","walreceiver" 2022-01-10 19:38:05.957 CST,,,22800,,61dc1a9d.5910,2,,2022-01-10 19:38:05 CST,,0,FATAL,XX000,"could not receive data from WAL stream: ERROR: requested starting point 3/37000000 is ahead of the WAL flush position of this server 3/3401D930",,,,,,,,,"","walreceiver" 2022-01-10 19:38:10.961 CST,,,22801,,61dc1aa2.5911,1,,2022-01-10 19:38:10 CST,,0,LOG,00000,"started streaming WAL from primary at 3/37000000 on timeline 2",,,,,,,,,"","walreceiver" 2022-01-10 19:38:10.961 CST,,,22801,,61dc1aa2.5911,2,,2022-01-10 19:38:10 CST,,0,FATAL,XX000,"could not receive data from WAL stream: ERROR: requested starting point 3/37000000 is ahead of the WAL flush position of this server 3/3401D930",,,,,,,,,"","walreceiver"
最终未能解决问题,无奈,重建备库吧。
[postgres@pgs1 data]$ pwd /pgsql/data [postgres@pgs1 data]$ ls -lrt total 92 drwx------. 2 postgres postgres 6 Jan 10 08:19 pg_twophase drwx------. 2 postgres postgres 6 Jan 10 08:19 pg_snapshots drwx------. 2 postgres postgres 6 Jan 10 08:19 pg_serial drwx------. 2 postgres postgres 6 Jan 10 08:19 pg_notify drwx------. 4 postgres postgres 36 Jan 10 08:19 pg_multixact drwx------. 2 postgres postgres 6 Jan 10 08:19 pg_dynshmem drwx------. 2 postgres postgres 6 Jan 10 08:19 pg_commit_ts drwx------. 9 postgres postgres 93 Jan 10 08:19 base drwx------. 2 postgres postgres 18 Jan 10 08:19 pg_xact -rw-------. 1 postgres postgres 3 Jan 10 08:19 PG_VERSION drwx------. 2 postgres postgres 6 Jan 10 08:19 pg_tblspc drwx------. 2 postgres postgres 6 Jan 10 08:19 pg_replslot drwx------. 5 postgres postgres 62 Jan 10 08:19 pg_walminer drwxrwxr-x. 4 postgres postgres 42 Jan 10 08:19 archive drwx------. 4 postgres postgres 68 Jan 10 19:50 pg_logical drwx------. 2 postgres postgres 6 Jan 10 19:50 pg_stat_tmp drwx------. 3 postgres postgres 4096 Jan 10 19:53 pg_wal drwx------. 2 postgres postgres 6 Jan 10 19:53 pg_subtrans drwx------. 2 postgres postgres 6 Jan 10 19:53 pg_stat drwx------. 2 postgres postgres 4096 Jan 10 19:53 global -rwxrwxr-x. 1 postgres postgres 150 Jan 10 19:53 autoarch.sh -rw-------. 1 postgres postgres 1636 Jan 10 19:53 pg_ident.conf -rw-------. 1 postgres postgres 4898 Jan 10 19:53 pg_hba.conf -rw-------. 1 postgres postgres 88 Jan 10 19:54 postgresql.auto.conf -rw-------. 1 postgres postgres 177 Jan 10 19:54 backup_label -rw-rw-r--. 1 postgres postgres 0 Jan 10 19:54 standby.signal -rw-------. 1 postgres postgres 29399 Jan 10 19:55 postgresql.conf drwx------. 2 postgres postgres 8192 Jan 10 19:55 pg_log -rw-------. 1 postgres postgres 47 Jan 10 19:55 current_logfiles -rw-------. 1 postgres postgres 70 Jan 10 19:55 postmaster.pid -rw-------. 1 postgres postgres 28 Jan 10 19:55 postmaster.opts [postgres@pgs1 data]$ rm -rf * [postgres@pgs1 data]$ ls -la total 0 drwx------. 2 postgres postgres 6 Jan 10 21:32 . drwxr-xr-x. 3 postgres postgres 18 Jan 10 08:04 .. [postgres@pgs1 data]$
[postgres@pgs1 data]$ pg_basebackup -h 192.168.60.190 -p 5432 -U replixs -l bk20220110 -Fp -Xs -D /pgsql/data/ -R -P -v Password: pg_basebackup: initiating base backup, waiting for checkpoint to complete pg_basebackup: checkpoint completed pg_basebackup: write-ahead log start point: 3/35000028 on timeline 2 pg_basebackup: starting background WAL receiver pg_basebackup: created temporary replication slot "pg_basebackup_126611" 489410/489410 kB (100%), 1/1 tablespace pg_basebackup: write-ahead log end point: 3/35000100 pg_basebackup: waiting for background process to finish streaming ... pg_basebackup: syncing data to disk ... pg_basebackup: renaming backup_manifest.tmp to backup_manifest pg_basebackup: base backup completed [postgres@pgs1 data]$ [postgres@pgs1 data]$ pg_ctl start waiting for server to start....2022-01-10 21:33:53.095 CST [25373] LOG: redirecting log output to logging collector process 2022-01-10 21:33:53.095 CST [25373] HINT: Future log output will appear in directory "pg_log". done server started [postgres@pgs1 data]$
[postgres@pgs1 ~]$ psql psql (13.2) Type "help" for help. postgres=# select pg_is_in_recovery(); pg_is_in_recovery ------------------- t (1 row) postgres=# \x Expanded display is on. postgres=# select * from pg_stat_wal_receiver ; -[ RECORD 1 ]---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- pid | 25379 status | streaming receive_start_lsn | 3/36000000 receive_start_tli | 2 written_lsn | 3/36000060 flushed_lsn | 3/36000060 received_tli | 2 last_msg_send_time | 2022-01-10 21:34:23.272726+08 last_msg_receipt_time | 2022-01-10 21:34:23.275409+08 latest_end_lsn | 3/36000060 latest_end_time | 2022-01-10 21:33:53.192662+08 slot_name | sender_host | 192.168.60.190 sender_port | 5432 conninfo | user=replixs password=******** channel_binding=disable dbname=replication host=192.168.60.190 port=5432 fallback_application_name=walreceiver sslmode=disable sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any postgres=#