I have a setup with 1 pgpool, 1 primary PostgreSQL server and 2 standby PostgreSQL servers, and I am seeing the following problems:
Problem 1: pgpool is unable to detect the primary and reports all nodes as standby (see the cross-check after the output below):
[root@ip-172-22-3-228 data]# sudo -u postgres psql -h 172.21.3.41 -p 5432 -x -c "show pool_nodes;"
Password:
-[ RECORD 1 ]------------
node_id | 0
hostname | 172.21.3.229
port | 5432
status | 3
lb_weight | 0.333333
role | standby
select_cnt | 0
-[ RECORD 2 ]------------
node_id | 1
hostname | 172.21.2.88
port | 5432
status | 2
lb_weight | 0.333333
role | standby
select_cnt | 0
-[ RECORD 3 ]------------
node_id | 2
hostname | 172.22.3.228
port | 5432
status | 0
lb_weight | 0.333333
role | standby
select_cnt | 0
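As far as I can tell, in this output status 2 means pgpool considers the node up and in use, status 3 means it has marked the node as detached/down, and 0 means the slot has not been initialized yet. As a cross-check of pgpool's own view of each backend, the PCP interface can also be queried. A minimal sketch, assuming a PCP user named postgres is defined in pcp.conf, that 172.21.3.41 is the pgpool host, and pcp_port = 9898 as in the configuration below (option names per the pgpool 3.5 pcp tools, see pcp_node_info --help):

# Ask pgpool for its status of node 0; repeat with -n 1 and -n 2
pcp_node_info -h 172.21.3.41 -p 9898 -U postgres -n 0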
Here is the output of pg_is_in_recovery() on all nodes, which correctly shows which one is the primary and which ones are standbys:
[root@ip-172-22-3-228 data]# sudo -u postgres psql -h 172.21.3.229 -p 5432 -x -c "select pg_is_in_recovery();"
Password:
-[ RECORD 1 ]-----+--
pg_is_in_recovery | f
[root@ip-172-22-3-228 data]# sudo -u postgres psql -h 172.21.2.88 -p 5432 -x -c "select pg_is_in_recovery();"
Password:
-[ RECORD 1 ]-----+--
pg_is_in_recovery | t
[root@ip-172-22-3-228 data]# sudo -u postgres psql -h 172.22.3.228 -p 5432 -x -c "select pg_is_in_recovery();"
Password:
-[ RECORD 1 ]-----+--
pg_is_in_recovery | t
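For completeness, the roles can also be confirmed from the primary's side by listing its streaming replication clients. A small additional check, not part of the diagnostics above, run against the node that returned f (both standby IPs should appear as client_addr):

sudo -u postgres psql -h 172.21.3.229 -p 5432 -x -c "select client_addr, state, sync_state from pg_stat_replication;"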
Problem 2: pgpool only opens a persistent connection to the single standby that has status 2.
Here are the pgpool logs (see the note after them):
2016-12-18 17:16:41: pid 24793: DEBUG: loading hba configuration
2016-12-18 17:16:41: pid 24793: DETAIL: loading file :"/etc/pgpool-II/pool_hba.conf" for client authentication configuration file
2016-12-18 17:16:41: pid 24793: LOG: reading status file: 0 th backend is set to down status
2016-12-18 17:16:41: pid 24793: LOG: reading status file: 2 th backend is set to down status
2016-12-18 17:16:41: pid 24793: DEBUG: pool_coninfo_size: num_init_children (20) * max_pool (10) * MAX_NUM_BACKENDS (128) * sizeof(ConnectionInfo) (136) = 3481600 bytes requested for shared memory
2016-12-18 17:16:41: pid 24793: DEBUG: ProcessInfo: num_init_children (20) * sizeof(ProcessInfo) (32) = 640 bytes requested for shared memory
2016-12-18 17:16:41: pid 24793: DEBUG: Request info are: sizeof(POOL_REQUEST_INFO) 5224 bytes requested for shared memory
2016-12-18 17:16:41: pid 24793: DEBUG: Recovery management area: sizeof(int) 4 bytes requested for shared memory
2016-12-18 17:16:41: pid 24793: LOG: Setting up socket for 0.0.0.0:5432
2016-12-18 17:16:41: pid 24793: LOG: Setting up socket for :::5432
2016-12-18 17:16:41: pid 24794: DEBUG: initializing backend status
2016-12-18 17:16:41: pid 24793: LOG: pgpool-II successfully started. version 3.5.4 (ekieboshi)
2016-12-18 17:16:41: pid 24793: LOG: find_primary_node: checking backend no 0
2016-12-18 17:16:41: pid 24793: LOG: find_primary_node: checking backend no 1
2016-12-18 17:16:41: pid 24795: DEBUG: initializing backend status
2016-12-18 17:16:41: pid 24796: DEBUG: initializing backend status
2016-12-18 17:16:41: pid 24797: DEBUG: initializing backend status
2016-12-18 17:16:41: pid 24798: DEBUG: initializing backend status
2016-12-18 17:16:41: pid 24799: DEBUG: initializing backend status
2016-12-18 17:16:41: pid 24793: DEBUG: pool_read: read 13 bytes from backend 0
2016-12-18 17:16:41: pid 24793: DEBUG: authenticate kind = 5
2016-12-18 17:16:41: pid 24793: DEBUG: pool_write: to backend: 0 kind:p
2016-12-18 17:16:41: pid 24800: DEBUG: initializing backend status
2016-12-18 17:16:41: pid 24801: DEBUG: initializing backend status
2016-12-18 17:16:41: pid 24802: DEBUG: initializing backend status
2016-12-18 17:16:41: pid 24793: DEBUG: pool_read: read 326 bytes from backend 0
2016-12-18 17:16:41: pid 24793: DEBUG: authenticate kind = 0
2016-12-18 17:16:41: pid 24793: DEBUG: authenticate backend: key data received
2016-12-18 17:16:41: pid 24793: DEBUG: authenticate backend: transaction state: I
2016-12-18 17:16:41: pid 24793: DEBUG: do_query: extended:0 query:"SELECT pg_is_in_recovery()"
2016-12-18 17:16:41: pid 24793: DEBUG: pool_write: to backend: 0 kind:Q
2016-12-18 17:16:41: pid 24803: DEBUG: initializing backend status
2016-12-18 17:16:41: pid 24793: DEBUG: pool_read: read 75 bytes from backend 0
2016-12-18 17:16:41: pid 24793: DEBUG: do_query: kind: 'T'
2016-12-18 17:16:41: pid 24793: DEBUG: do_query: received ROW DESCRIPTION ('T')
2016-12-18 17:16:41: pid 24793: DEBUG: do_query: row description: num_fileds: 1
2016-12-18 17:16:41: pid 24793: DEBUG: do_query: kind: 'D'
2016-12-18 17:16:41: pid 24793: DEBUG: do_query: received DATA ROW ('D')
2016-12-18 17:16:41: pid 24793: DEBUG: do_query: kind: 'C'
2016-12-18 17:16:41: pid 24793: DEBUG: do_query: received COMMAND COMPLETE ('C')
2016-12-18 17:16:41: pid 24793: DEBUG: do_query: kind: 'Z'
2016-12-18 17:16:41: pid 24793: DEBUG: do_query: received READY FOR QUERY ('Z')
2016-12-18 17:16:41: pid 24793: DEBUG: pool_write: to backend: 0 kind:X
2016-12-18 17:16:41: pid 24793: DEBUG: find_primary_node: 1 node is standby
2016-12-18 17:16:41: pid 24793: LOG: find_primary_node: checking backend no 2
2016-12-18 17:16:41: pid 24793: DEBUG: find_primary_node: no primary node found
2016-12-18 17:16:41: pid 24804: DEBUG: initializing backend status
2016-12-18 17:16:41: pid 24793: DEBUG: starting health check
2016-12-18 17:16:41: pid 24793: DEBUG: health check: clearing alarm
2016-12-18 17:16:41: pid 24793: DEBUG: doing health check against database:postgres user:postgres
2016-12-18 17:16:41: pid 24793: DEBUG: Backend DB node 0 status is 3
2016-12-18 17:16:41: pid 24793: DEBUG: Backend DB node 1 status is 2
2016-12-18 17:16:41: pid 24793: DEBUG: Trying to make persistent DB connection to backend node 1 having status 2
2016-12-18 17:16:41: pid 24805: DEBUG: initializing backend status
2016-12-18 17:16:41: pid 24806: DEBUG: initializing backend status
2016-12-18 17:16:41: pid 24807: DEBUG: initializing backend status
2016-12-18 17:16:41: pid 24808: DEBUG: initializing backend status
2016-12-18 17:16:41: pid 24809: DEBUG: initializing backend status
2016-12-18 17:16:41: pid 24810: DEBUG: initializing backend status
2016-12-18 17:16:41: pid 24793: DEBUG: pool_read: read 13 bytes from backend 0
2016-12-18 17:16:41: pid 24793: DEBUG: authenticate kind = 5
2016-12-18 17:16:41: pid 24793: DEBUG: pool_write: to backend: 0 kind:p
2016-12-18 17:16:41: pid 24811: DEBUG: initializing backend status
2016-12-18 17:16:41: pid 24812: DEBUG: initializing backend status
2016-12-18 17:16:41: pid 24793: DEBUG: pool_read: read 318 bytes from backend 0
2016-12-18 17:16:41: pid 24793: DEBUG: authenticate kind = 0
2016-12-18 17:16:41: pid 24793: DEBUG: authenticate backend: key data received
2016-12-18 17:16:41: pid 24793: DEBUG: authenticate backend: transaction state: I
2016-12-18 17:16:41: pid 24793: DEBUG: persistent DB connection to backend node 1 having status 2 is successful
2016-12-18 17:16:41: pid 24793: DEBUG: pool_write: to backend: 0 kind:X
2016-12-18 17:16:41: pid 24793: DEBUG: Backend DB node 2 status is 3
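The two lines near the top of the log, "reading status file: 0 th backend is set to down status" and "reading status file: 2 th backend is set to down status", show that pgpool restored nodes 0 and 2 as down from its saved status file, which would explain why find_primary_node only queries node 1 (a standby, hence "no primary node found") and why the health check only makes its persistent connection to node 1, the single backend with status 2. The cached state can be inspected directly; a minimal sketch, assuming logdir = '/var/log/pgpool' as in the configuration below (in pgpool 3.4 and later this is a plain-text file with one status word per backend, in node-id order):

cat /var/log/pgpool/pgpool_status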
Does anyone have an idea of what could be wrong here? Here is the pgpool configuration:
# ----------------------------
# pgPool-II configuration file
# ----------------------------
#------------------------------------------------------------------------------
# CONNECTIONS
#------------------------------------------------------------------------------
# - pgpool Connection Settings -
listen_addresses = '*'
# Host name or IP address to listen on:
# '*' for all, '' for no TCP/IP connections
# (change requires restart)
port = 5432
# Port number
# (change requires restart)
socket_dir = '/var/run/postgresql'
# Unix domain socket path
# The Debian package defaults to
# /var/run/postgresql
# (change requires restart)
listen_backlog_multiplier = 2
# Set the backlog parameter of listen(2) to
# num_init_children * listen_backlog_multiplier.
# (change requires restart)
serialize_accept = on
# whether to serialize accept() call to avoid thundering herd problem
# (change requires restart)
# - pgpool Communication Manager Connection Settings -
pcp_listen_addresses = '*'
# Host name or IP address for pcp process to listen on:
# '*' for all, '' for no TCP/IP connections
# (change requires restart)
pcp_port = 9898
# Port number for pcp
# (change requires restart)
pcp_socket_dir = '/var/run/postgresql'
# Unix domain socket path for pcp
# The Debian package defaults to
# /var/run/postgresql
# (change requires restart)
# - Backend Connection Settings -
backend_hostname0 = '172.21.3.229'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/pgsql/9.6/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '172.21.2.88'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/9.6/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_hostname2 = '172.22.3.228'
backend_port2 = 5432
backend_weight2 = 1
backend_data_directory2 = '/var/lib/pgsql/9.6/data'
backend_flag2 = 'ALLOW_TO_FAILOVER'
# - Authentication -
enable_pool_hba = on
# Use pool_hba.conf for client authentication
pool_passwd = 'pool_passwd'
# File name of pool_passwd for md5 authentication.
# "" disables pool_passwd.
# (change requires restart)
authentication_timeout = 60
# Delay in seconds to complete client authentication
# 0 means no timeout.
# - SSL Connections -
ssl = off
# Enable SSL support
# (change requires restart)
#ssl_key = './server.key'
# Path to the SSL private key file
# (change requires restart)
#ssl_cert = './server.cert'
# Path to the SSL public certificate file
# (change requires restart)
#ssl_ca_cert = ''
# Path to a single PEM format file
# containing CA root certificate(s)
# (change requires restart)
#ssl_ca_cert_dir = ''
# Directory containing CA root certificate(s)
# (change requires restart)
#------------------------------------------------------------------------------
# POOLS
#------------------------------------------------------------------------------
# - Concurrent session and pool size -
num_init_children = 20
# Number of concurrent sessions allowed
# (change requires restart)
max_pool = 10
# Number of connection pool caches per connection
# (change requires restart)
# - Life time -
child_life_time = 300
# Pool exits after being idle for this many seconds
child_max_connections = 0
# Pool exits after receiving that many connections
# 0 means no exit
connection_life_time = 0
# Connection to backend closes after being idle for this many seconds
# 0 means no close
client_idle_limit = 0
# Client is disconnected after being idle for that many seconds
# (even inside an explicit transactions!)
# 0 means no disconnection
#------------------------------------------------------------------------------
# LOGS
#------------------------------------------------------------------------------
# - Where to log -
log_destination = 'stderr,syslog'
# Where to log
# Valid values are combinations of stderr,
# and syslog. Default to stderr.
# - What to log -
print_timestamp = on # Print timestamp on each line
# (change requires restart)
log_connections = on
# Log connections
log_hostname = on
# Hostname will be shown in ps status
# and in logs if connections are logged
log_statement = on
# Log all statements
log_per_node_statement = on
# Log all statements
# with node and backend information
log_standby_delay = 'none'
# Log standby delay
# Valid values are combinations of always,
# if_over_threshold, none
# - Syslog specific -
syslog_facility = 'LOCAL0'
# Syslog local facility. Default to LOCAL0
syslog_ident = 'pgpool'
# Syslog program identification string
# Default to 'pgpool'
# - Debug -
debug_level = 1
# Debug message verbosity level
# 0 means no message, 1 or more mean verbose
#log_error_verbosity = default # terse, default, or verbose messages
#client_min_messages = notice # values in order of decreasing detail:
# debug5
# debug4
# debug3
# debug2
# debug1
# log
# notice
# warning
# error
#log_min_messages = warning # values in order of decreasing detail:
# debug5
# debug4
# debug3
# debug2
# debug1
# info
# notice
# warning
# error
# log
# fatal
# panic
#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------
pid_file_name = '/var/run/pgpool/pgpool.pid'
# PID file name
# (change requires restart)
logdir = '/var/log/pgpool'
# Directory of pgPool status file
# (change requires restart)
#------------------------------------------------------------------------------
# CONNECTION POOLING
#------------------------------------------------------------------------------
connection_cache = on
# Activate connection pools
# (change requires restart)
# Semicolon separated list of queries
# to be issued at the end of a session
# The default is for 8.3 and later
reset_query_list = 'ABORT; DISCARD ALL'
# The following one is for 8.2 and before
#reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'
#------------------------------------------------------------------------------
# REPLICATION MODE
#------------------------------------------------------------------------------
replication_mode = off
# Activate replication mode
# (change requires restart)
replicate_select = off
# Replicate SELECT statements
# when in replication mode
# replicate_select is higher priority than
# load_balance_mode.
insert_lock = on
# Automatically locks a dummy row or a table
# with INSERT statements to keep SERIAL data
# consistency
# Without SERIAL, no lock will be issued
lobj_lock_table = ''
# When rewriting lo_creat command in
# replication mode, specify table name to
# lock
# - Degenerate handling -
replication_stop_on_mismatch = off
# On disagreement with the packet kind
# sent from backend, degenerate the node
# which is most likely "minority"
# If off, just force to exit this session
failover_if_affected_tuples_mismatch = off
# On disagreement with the number of affected
# tuples in UPDATE/DELETE queries, then
# degenerate the node which is most likely
# "minority".
# If off, just abort the transaction to
# keep the consistency
#------------------------------------------------------------------------------
# LOAD BALANCING MODE
#------------------------------------------------------------------------------
load_balance_mode = off
# Activate load balancing mode
# (change requires restart)
ignore_leading_white_space = on
# Ignore leading white spaces of each query
white_function_list = ''
# Comma separated list of function names
# that don't write to database
# Regexp are accepted
black_function_list = 'nextval,setval'
# Comma separated list of function names
# that write to database
# Regexp are accepted
database_redirect_preference_list = ''
# comma separated list of pairs of database and node id.
# example: postgres:primary,mydb[0-4]:1,mydb[5-9]:2'
# valid for streaming replication mode only.
app_name_redirect_preference_list = ''
# comma separated list of pairs of app name and node id.
# example: 'psql:primary,myapp[0-4]:1,myapp[5-9]:standby'
# valid for streaming replication mode only.
allow_sql_comments = off
# if on, ignore SQL comments when judging if load balance or
# query cache is possible.
# If off, SQL comments effectively prevent the judgment
# (pre 3.4 behavior).
#------------------------------------------------------------------------------
# MASTER/SLAVE MODE
#------------------------------------------------------------------------------
master_slave_mode = on
# Activate master/slave mode
# (change requires restart)
master_slave_sub_mode = 'stream'
# Master/slave sub mode
# Valid values are combinations slony or
# stream. Default is slony.
# (change requires restart)
# - Streaming -
sr_check_period = 10
# Streaming replication check period
# Disabled (0) by default
sr_check_user = 'replication_user'
# Streaming replication check user
# This is necessary even if you disable
# streaming replication delay check with
# sr_check_period = 0
sr_check_password = 'replication_pass'
# Password for streaming replication check user
sr_check_database = 'replication_db'
# Database name for streaming replication check
delay_threshold = 0
# Threshold before not dispatching query to standby node
# Unit is in bytes
# Disabled (0) by default
# - Special commands -
follow_master_command = ''
# Executes this command after master failover
# Special values:
# %d = node id
# %h = Host name
# %p = port number
# %D = database cluster path
# %m = new master node id
# %H = hostname of the new master node
# %M = old master node id
# %P = old primary node id
# %r = new master port number
# %R = new master database cluster path
# %% = '%' character
#------------------------------------------------------------------------------
# HEALTH CHECK
#------------------------------------------------------------------------------
health_check_period = 5
# Health check period
# Disabled (0) by default
health_check_timeout = 20
# Health check timeout
# 0 means no timeout
health_check_user = 'postgres'
# Health check user
health_check_password = 'postgres'
# Password for health check user
health_check_database = 'postgres'
# Database name for health check. If '', tries 'postgres' first, then 'template1'
health_check_max_retries = 2
# Maximum number of times to retry a failed health check before giving up.
health_check_retry_delay = 1
# Amount of time to wait (in seconds) between retries.
connect_timeout = 10000
# Timeout value in milliseconds before giving up to connect to backend.
# Default is 10000 ms (10 second). Flaky network user may want to increase
# the value. 0 means no timeout.
# Note that this value is not only used for health check,
# but also for ordinary connection to backend.
#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------
failover_command = ''
# Executes this command at failover
# Special values:
# %d = node id
# %h = Host name
# %p = port number
# %D = database cluster path
# %m = new master node id
# %H = hostname of the new master node
# %M = old master node id
# %P = old primary node id
# %r = new master port number
# %R = new master database cluster path
# %% = '%' character
failback_command = ''
# Executes this command at failback.
# Special values:
# %d = node id
# %h = Host name
# %p = port number
# %D = database cluster path
# %m = new master node id
# %H = hostname of the new master node
# %M = old master node id
# %P = old primary node id
# %r = new master port number
# %R = new master database cluster path
# %% = '%' character
fail_over_on_backend_error = on
# Initiates failover when reading/writing to the
# backend communication socket fails
# If set to off, pgpool will report an
# error and disconnect the session.
#search_primary_node_timeout = 10
# Timeout in seconds to search for the
# primary node when a failover occurs.
# 0 means no timeout, keep searching
# for a primary node forever.
#------------------------------------------------------------------------------
# ONLINE RECOVERY
#------------------------------------------------------------------------------
recovery_user = 'postgres'
# Online recovery user
recovery_password = 'postgres'
# Online recovery password
recovery_1st_stage_command = ''
# Executes a command in first stage
recovery_2nd_stage_command = ''
# Executes a command in second stage
recovery_timeout = 90
# Timeout in seconds to wait for the
# recovering node's postmaster to start up
# 0 means no wait
client_idle_limit_in_recovery = 0
# Client is disconnected after being idle
# for that many seconds in the second stage
# of online recovery
# 0 means no disconnection
# -1 means immediate disconnection
#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------
# - Enabling -
use_watchdog = off
# Activates watchdog
# (change requires restart)
# -Connection to up stream servers -
trusted_servers = ''
# trusted server list which are used
# to confirm network connection
# (hostA,hostB,hostC,...)
# (change requires restart)
ping_path = '/bin'
# ping command path
# (change requires restart)
# - Watchdog communication Settings -
wd_hostname = ''
# Host name or IP address of this watchdog
# (change requires restart)
wd_port = 9000
# port number for watchdog service
# (change requires restart)
wd_priority = 1
# priority of this watchdog in leader election
# (change requires restart)
wd_authkey = ''
# Authentication key for watchdog communication
# (change requires restart)
wd_ipc_socket_dir = '/var/run/postgresql'
# Unix domain socket path for watchdog IPC socket
# The Debian package defaults to
# /var/run/postgresql
# (change requires restart)
# - Virtual IP control Setting -
delegate_IP = ''
# delegate IP address
# If this is empty, virtual IP never bring up.
# (change requires restart)
if_cmd_path = '/sbin'
# path to the directory where if_up/down_cmd exists
# (change requires restart)
if_up_cmd = 'ip addr add $_IP_$/24 dev eth0 label eth0:0'
# startup delegate IP command
# (change requires restart)
if_down_cmd = 'ip addr del $_IP_$/24 dev eth0'
# shutdown delegate IP command
# (change requires restart)
arping_path = '/usr/sbin'
# arping command path
# (change requires restart)
arping_cmd = 'arping -U $_IP_$ -w 1'
# arping command
# (change requires restart)
# - Behavior on escalation Setting -
clear_memqcache_on_escalation = on
# Clear all the query cache on shared memory
# when standby pgpool escalate to active pgpool
# (= virtual IP holder).
# This should be off if client connects to pgpool
# not using virtual IP.
# (change requires restart)
wd_escalation_command = ''
# Executes this command at escalation on new active pgpool.
# (change requires restart)
wd_de_escalation_command = ''
# Executes this command when master pgpool resigns from being master.
# (change requires restart)
# - Lifecheck Setting -
# -- common --
wd_monitoring_interfaces_list = '' # Comma separated list of interfaces names to monitor.
# if any interface from the list is active the watchdog will
# consider the network is fine
# 'any' to enable monitoring on all interfaces except loopback
# '' to disable monitoring
wd_lifecheck_method = 'heartbeat'
# Method of watchdog lifecheck ('heartbeat' or 'query' or 'external')
# (change requires restart)
wd_interval = 10
# lifecheck interval (sec) > 0
# (change requires restart)
# -- heartbeat mode --
wd_heartbeat_port = 9694
# Port number for receiving heartbeat signal
# (change requires restart)
wd_heartbeat_keepalive = 2
# Interval time of sending heartbeat signal (sec)
# (change requires restart)
wd_heartbeat_deadtime = 30
# Deadtime interval for heartbeat signal (sec)
# (change requires restart)
heartbeat_destination0 = 'Host0_ip1'
# Host name or IP address of destination 0
# for sending heartbeat signal.
# (change requires restart)
heartbeat_destination_port0 = 9694
# Port number of destination 0 for sending
# heartbeat signal. Usually this is the
# same as wd_heartbeat_port.
# (change requires restart)
heartbeat_device0 = ''
# Name of NIC device (such like 'eth0')
# used for sending/receiving heartbeat
# signal to/from destination 0.
# This works only when this is not empty
# and pgpool has root privilege.
# (change requires restart)
#heartbeat_destination1 = 'Host0_ip2'
#heartbeat_destination_port1 = 9694
#heartbeat_device1 = ''
# -- query mode --
wd_life_point = 3
# lifecheck retry times
# (change requires restart)
wd_lifecheck_query = 'SELECT 1'
# lifecheck query to pgpool from watchdog
# (change requires restart)
wd_lifecheck_dbname = 'template1'
# Database name connected for lifecheck
# (change requires restart)
wd_lifecheck_user = 'postgres'
# watchdog user monitoring pgpools in lifecheck
# (change requires restart)
wd_lifecheck_password = 'postgres'
# Password for watchdog user in lifecheck
# (change requires restart)
# - Other pgpool Connection Settings -
#other_pgpool_hostname0 = 'Host0'
# Host name or IP address to connect to for other pgpool 0
# (change requires restart)
#other_pgpool_port0 = 5432
# Port number for other pgpool 0
# (change requires restart)
#other_wd_port0 = 9000
# Port number for other watchdog 0
# (change requires restart)
#other_pgpool_hostname1 = 'Host1'
#other_pgpool_port1 = 5432
#other_wd_port1 = 9000
I have a resolution for this here: http://www.pgpool.net/mantisbt/view.php?id=274
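For the record, the fix suggested by the status-file lines in the log (independently of the linked ticket) is either to re-attach the nodes that pgpool has cached as down, or to restart pgpool and discard the saved pgpool_status file. A rough sketch, assuming PCP credentials are configured and that pgpool is started and stopped directly on its host:

# Option 1: re-attach the backends pgpool believes are down (node ids 0 and 2)
pcp_attach_node -h 172.21.3.41 -p 9898 -U postgres -n 0
pcp_attach_node -h 172.21.3.41 -p 9898 -U postgres -n 2

# Option 2: stop pgpool and restart it with -D (--discard-status) so the old pgpool_status is ignored
pgpool -m fast stop
pgpool -D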