Optimizing PostgreSQL: Practical Techniques and Best Practices for Efficient Read/Write Splitting
Background
PostgreSQL has long been ahead of the curve in both scale-up and scale-out. For example:

- Multi-master technology

Oleg and the Postgres Professional team have contributed heavily to the PostgreSQL cluster project, particularly on distributed transaction performance and election algorithms.
https://github.com/postgrespro/postgres_cluster
Their UDR and BDR are approaching maturity.
https://2ndquadrant.com/en/resources/bdr/

- Sharding

PostgreSQL already had a very mature sharding solution ten years ago: PL/Proxy, open-sourced by Skype.
https://git.postgresql.org/gitweb/?p=skytools.git;a=summary
《阿里云 ApsaraDB for PostgreSQL 试用报告 - 2 教你RDS PG的水平分库》 https://yq.aliyun.com/articles/203
《阿里云 ApsaraDB for PostgreSQL 试用报告 - 3 水平分库 vs 单机 性能》 https://yq.aliyun.com/articles/204
《阿里云 ApsaraDB for PostgreSQL 试用报告 - 4 水平分库 之 节点扩展》 https://yq.aliyun.com/articles/205
《PostgreSQL 最佳实践 - 水平分库(基于plproxy)》 https://yq.aliyun.com/articles/59372

- NewSQL

PostgreSQL is far ahead here as well: Postgres-XL, Postgres-XC, CitusDB, and FDW-based approaches are all maturing. True to the PostgreSQL community's usual style, a feature merged into core is a mature one; FDW-based sharding, for example, is already in core, and 9.6 brought big improvements, including pushdown of sorts, WHERE clauses, and joins, which execute on the shard nodes whenever the conditions allow.

- Read/write splitting

The most mature read/write splitting solution today is the pgpool-II middleware. If the application does not want to choose the target node itself, it can use pgpool-II as a middleware layer.

- Multi-core parallelism

PostgreSQL is one of the very few open-source databases with multi-core parallel query execution, and on some large queries the performance gain is dramatic. It can serve as a small data warehouse: it also ships the machine-learning library MADlib and supports server-side languages such as PL/Python and PL/R, which is enough to support small analytical workloads (TB scale).
《开源数据库 PostgreSQL 攻克并行计算难题》 https://yq.aliyun.com/articles/44655
《PostgreSQL 并行计算 - 助力实时精准营销应用》 https://yq.aliyun.com/articles/44649
《PostgreSQL 9.6 并行计算 优化器算法浅析》 https://yq.aliyun.com/articles/59180
《PostgreSQL 并行计算 在 xfs, ext4 下的表现》 https://yq.aliyun.com/articles/53985

- GPU parallelism

PostgreSQL's open interfaces (the custom scan provider) make it easy to extend the data-access path, and that is how GPU parallel computing is implemented. Judging from PG-Strom's benchmarks, the OLAP performance gain is substantial: the GPU's compute power and high memory bandwidth are used to accelerate OLAP queries.
https://wiki.postgresql.org/wiki/PGStrom
https://github.com/pg-strom/devel

In China, however, these technologies see little promotion; few people know about them at all, let alone know them well.

This article focuses on read/write splitting: how to implement it for PostgreSQL using the pgpool-II middleware together with PostgreSQL streaming replication.
Three pgpool-II load-balancing modes

pgpool's load balancing can be implemented on top of three modes:

1. replication
2. stream
3. slony-I

Replication mode
Of the three, replication mode is the least recommended. In this mode pgpool itself performs the replication, dispatching each write from the front end to every backend: an INSERT, for example, is executed on all backend databases. If the backends return inconsistent results, pgpool either degenerates (detaches) the minority of backends (majority wins), or fails the whole transaction, as controlled by replication_stop_on_mismatch and failover_if_affected_tuples_mismatch.
This is a real trap, because pgpool says nothing about how to repair the diverged data.
So this mode is strongly discouraged.
A similar product: http://www.continuent.com/
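To make the mismatch risk concrete, here is a hypothetical statement (the audit_log table is invented for illustration). Under replication mode pgpool sends the identical statement text to every backend, and volatile functions are evaluated independently on each node:

```sql
-- Runs verbatim on every backend under pgpool replication mode.
-- now() and random() are evaluated separately on each node, so the
-- stored rows can silently diverge, and pgpool offers no repair path.
INSERT INTO audit_log (created_at, token) VALUES (now(), random());
```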
Stream mode

Stream mode uses PostgreSQL's own streaming replication feature. It is the most complete of the three modes, and the recommended one.

Whatever the mode, load balancing must decide which SQL statements may be dispatched to a standby node. Only read-only queries qualify, and function calls need special attention, because some functions modify the database, nextval for example, or user-defined functions containing statements other than SELECT.
SELECTs inside a transaction also deserve care; routing them to a standby is not advisable. Cursor-based updates are another case to watch.
pgpool's policy is that any query containing a function call goes only to the master, unless the function appears in white_function_list.
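pgpool's real parser is far more thorough, but the routing rules above can be sketched as a toy classifier. The function name and regexes here are illustrative, not pgpool's code; the blacklist mirrors the black_function_list used later in this article:

```python
import re

# Functions that write even inside a SELECT (see black_function_list below).
BLACK_FUNCTIONS = {"currval", "lastval", "nextval", "setval"}

def route(query: str, in_transaction: bool = False) -> str:
    """Return 'master' or 'standby' for one SQL statement, roughly
    following the load-balancing rules described above."""
    q = query.strip().rstrip(";").lower()
    # Only plain SELECTs are candidates for load balancing.
    if not q.startswith("select"):
        return "master"
    # SELECT ... FOR UPDATE / FOR SHARE takes row locks: master only.
    if re.search(r"\bfor\s+(update|share)\b", q):
        return "master"
    # SELECTs inside an explicit transaction stay on the master.
    if in_transaction:
        return "master"
    # Any blacklisted function call forces the master.
    called = set(re.findall(r"([a-z_][a-z0-9_]*)\s*\(", q))
    if called & BLACK_FUNCTIONS:
        return "master"
    return "standby"

print(route("SELECT * FROM t WHERE id = 1"))          # standby
print(route("SELECT nextval('seq')"))                 # master
print(route("UPDATE t SET x = 1"))                    # master
print(route("SELECT * FROM t", in_transaction=True))  # master
```

This also shows why white_function_list matters: without it, pgpool cannot tell a read-only user-defined function from a writing one, so any function call is pessimistically sent to the master.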
A simple configuration walkthrough

1. Set up streaming replication: one primary and one standby, listening on ports 1999 and 2000.

Stop the database
pg93@db-172-16-3-39-> pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
Edit pg_hba.conf
pg93@db-172-16-3-39-> cd $PGDATA
pg93@db-172-16-3-39-> ll
total 4.0K
drwx------ 16 pg93 pg93 4.0K May 13 11:16 pg_root
cd pg_root
vi pg_hba.conf
host all all 0.0.0.0/0 md5
host replication postgres 172.16.3.39/32 md5
Create the standby
Copy the data files
pg93@db-172-16-3-39-> cp -r pg_root pg_root_2000
pg93@db-172-16-3-39-> mv pg_root pg_root_1999
pg93@db-172-16-3-39-> ll
total 8.0K
drwx------ 16 pg93 pg93 4.0K May 13 11:16 pg_root_1999
drwx------ 16 pg93 pg93 4.0K May 13 11:17 pg_root_2000
Edit postgresql.conf and change the listening port
pg93@db-172-16-3-39-> cd pg_root_2000
pg93@db-172-16-3-39-> vi postgresql.conf
port = 2000
pg93@db-172-16-3-39-> grep "^[a-z]" postgresql.conf
listen_addresses = '0.0.0.0' # what IP address(es) to listen on;
port = 2000 # (change requires restart)
max_connections = 500 # (change requires restart)
unix_socket_directories = '.' # comma-separated list of directories
unix_socket_permissions = 0700 # begin with 0 to use octal notation
password_encryption = on
tcp_keepalives_idle = 60 # TCP_KEEPIDLE, in seconds;
tcp_keepalives_interval = 10 # TCP_KEEPINTVL, in seconds;
tcp_keepalives_count = 10 # TCP_KEEPCNT;
shared_buffers = 2048MB # min 128kB
maintenance_work_mem = 512MB # min 1MB
max_stack_depth = 8MB # min 100kB
shared_preload_libraries = 'pg_stat_statements' # (change requires restart)
vacuum_cost_delay = 10 # 0-100 milliseconds
vacuum_cost_limit = 10000 # 1-10000 credits
bgwriter_delay = 10ms # 10-10000ms between rounds
wal_level = hot_standby # minimal, archive, or hot_standby
synchronous_commit = off # synchronization level;
wal_sync_method = fdatasync # the default is the first option
wal_buffers = 16384kB # min 32kB, -1 sets based on shared_buffers
wal_writer_delay = 10ms # 1-10000 milliseconds
checkpoint_segments = 256 # in logfile segments, min 1, 16MB each
archive_mode = on # allows archiving to be done
archive_command = '/bin/date' # command to use to archive a logfile segment
max_wal_senders = 32 # max number of walsender processes
wal_keep_segments = 512 # in logfile segments, 16MB each; 0 disables
hot_standby = on # "on" allows queries during recovery
max_standby_archive_delay = 300s # max delay before canceling queries
max_standby_streaming_delay = 300s # max delay before canceling queries
wal_receiver_status_interval = 1s # send replies at least this often
hot_standby_feedback = on # send info from standby to prevent
random_page_cost = 1.5 # same scale as above
effective_cache_size = 8192MB
log_destination = 'csvlog' # Valid values are combinations of
logging_collector = on # Enable capturing of stderr and csvlog
log_directory = 'pg_log' # directory where log files are written,
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern,
log_file_mode = 0600 # creation mode for log files,
log_truncate_on_rotation = on # If on, an existing log file with the
log_rotation_age = 1d # Automatic rotation of logfiles will
log_rotation_size = 10MB # Automatic rotation of logfiles will
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose # terse, default, or verbose messages
log_lock_waits = on # log lock waits >= deadlock_timeout
log_statement = 'ddl' # none, ddl, mod, all
log_timezone = 'PRC'
track_activity_query_size = 4096 # (change requires restart)
autovacuum = on # Enable autovacuum subprocess? 'on'
log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions and
autovacuum_freeze_max_age = 1500000000 # maximum XID age before forced vacuum
datestyle = 'iso, mdy'
timezone = 'PRC'
lc_messages = 'C' # locale for system error message
lc_monetary = 'C' # locale for monetary formatting
lc_numeric = 'C' # locale for number formatting
lc_time = 'C' # locale for time formatting
default_text_search_config = 'pg_catalog.english'
deadlock_timeout = 1s
pg_stat_statements.max = 1000
pg_stat_statements.track = all
Create recovery.conf (on the primary the same file is kept as recovery.done)
pg93@db-172-16-3-39-> cp $PGHOME/share/recovery.conf.sample ./
pg93@db-172-16-3-39-> mv recovery.conf.sample recovery.conf
vi recovery.conf
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=172.16.3.39 port=1999 user=postgres keepalives_idle=60'
pg93@db-172-16-3-39-> cp recovery.conf ../pg_root_1999/recovery.done
pg93@db-172-16-3-39-> vi ../pg_root_1999/recovery.done
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=172.16.3.39 port=2000 user=postgres keepalives_idle=60'
Configure the connection password file
vi ~/.pgpass
172.16.3.39:1999:*:postgres:postgres
172.16.3.39:2000:*:postgres:postgres
chmod 400 ~/.pgpass
Start the primary and the standby, then confirm that streaming replication is working
pg93@db-172-16-3-39-> pg_ctl start -D /pgdata/digoal/1921/data03/pg93/pg_root_1999
pg93@db-172-16-3-39-> pg_ctl start -D /pgdata/digoal/1921/data03/pg93/pg_root_2000
pg93@db-172-16-3-39-> psql -h 127.0.0.1 -p 1999
Password:
psql (9.3.1)
Type "help" for help.
digoal=# \du
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------+-----------
digoal | Superuser | {}
postgres | Superuser, Create role, Create DB, Replication | {}
digoal=# \q
pg93@db-172-16-3-39-> psql -h 127.0.0.1 -p 2000
Password:
psql (9.3.1)
Type "help" for help.
digoal=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)
ps -ewf|grep stream
postgres: wal sender process postgres 172.16.3.39(21679) streaming 2/E4
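Besides grepping for the walsender process, replication health can also be checked from SQL. The view and functions below exist in PostgreSQL 9.3 (they use the pre-9.4 `xlog` naming):

```sql
-- On the primary: one row per connected standby.
SELECT client_addr, state, sent_location, replay_location
FROM pg_stat_replication;

-- On the standby: replication lag in bytes between what has been
-- received and what has been replayed.
SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(),
                             pg_last_xlog_replay_location());
```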
2. Install pgpool
[root@db-172-16-3-150 soft_bak]# wget http://www.pgpool.net/download.php?f=pgpool-II-3.3.3.tar.gz
[root@db-172-16-3-150 soft_bak]# mv download.php\?f\=pgpool-II-3.3.3.tar.gz pgpool.tar.gz
[root@db-172-16-3-150 soft_bak]# tar -zxvf pgpool.tar.gz
[root@db-172-16-3-150 soft_bak]# cd pgpool-II-3.3.3/
[root@db-172-16-3-150 pgpool-II-3.3.3]#
Note: the PostgreSQL version that pgpool is built against should ideally match the backend database version, e.g. both 9.3.1.
[root@db-172-16-3-150 pgpool-II-3.3.3]# ./configure --prefix=/opt/pgpool3.3.3 --with-pgsql=/home/pg93/pgsql9.3.1
[root@db-172-16-3-150 pgpool-II-3.3.3]# gmake && gmake install
3. Configure pgpool
[root@db-172-16-3-150 etc]# cd /opt/pgpool3.3.3/etc/
[root@db-172-16-3-150 etc]# cp pool_hba.conf.sample pool_hba.conf
[root@db-172-16-3-150 etc]# cp pgpool.conf.sample-stream pgpool.conf
[root@db-172-16-3-150 etc]# cp pcp.conf.sample pcp.conf
[root@db-172-16-3-150 etc]# vi pgpool.conf
listen_addresses = '0.0.0.0'
port = 9999
socket_dir = '/tmp'
pcp_port = 9898
pcp_socket_dir = '/tmp'
backend_hostname0 = '172.16.3.39'
backend_port0 = 1999
backend_weight0 = 1
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '172.16.3.39'
backend_port1 = 2000
backend_weight1 = 1
backend_flag1 = 'ALLOW_TO_FAILOVER'
enable_pool_hba = on
pool_passwd = 'pool_passwd'
authentication_timeout = 60
ssl = off
num_init_children = 32
max_pool = 4
child_life_time = 300
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 0
log_destination = 'syslog'
print_timestamp = on
log_connections = on
log_hostname = off
# The next two parameters are useful for debugging; they show how SQL statements are balanced.
log_statement = on
log_per_node_statement = on
log_standby_delay = 'always'
syslog_facility = 'LOCAL0'
syslog_ident = 'pgpool'
debug_level = 0
pid_file_name = '/var/run/pgpool/pgpool.pid'
logdir = '/tmp'
connection_cache = on
reset_query_list = 'ABORT; DISCARD ALL'
replication_mode = off
replicate_select = off
insert_lock = off
lobj_lock_table = ''
replication_stop_on_mismatch = off
failover_if_affected_tuples_mismatch = off
load_balance_mode = on
ignore_leading_white_space = on
white_function_list = ''
black_function_list = 'currval,lastval,nextval,setval'
master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 2
sr_check_user = 'srcheck'
sr_check_password = 'SRCHECK'
delay_threshold = 1024000
follow_master_command = ''
parallel_mode = off
pgpool2_hostname = ''
health_check_period = 0
health_check_timeout = 20
health_check_user = 'nobody'
health_check_password = ''
health_check_max_retries = 0
health_check_retry_delay = 1
failover_command = ''
failback_command = ''
fail_over_on_backend_error = on
search_primary_node_timeout = 10
recovery_user = 'nobody'
recovery_password = ''
recovery_1st_stage_command = ''
recovery_2nd_stage_command = ''
recovery_timeout = 90
client_idle_limit_in_recovery = 0
use_watchdog = off
trusted_servers = ''
ping_path = '/bin'
wd_hostname = ''
wd_port = 9000
wd_authkey = ''
delegate_IP = ''
ifconfig_path = '/sbin'
if_up_cmd = 'ifconfig eth0:0 inet $_IP_$ netmask 255.255.255.0'
if_down_cmd = 'ifconfig eth0:0 down'
arping_path = '/usr/sbin' # arping command path
arping_cmd = 'arping -U $_IP_$ -w 1'
clear_memqcache_on_escalation = on
wd_escalation_command = ''
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
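The delay_threshold above is measured in WAL bytes: pgpool's sr_check periodically compares the primary's current WAL location with each standby's replay location, and stops balancing queries to a standby whose lag exceeds the threshold. A minimal sketch of that comparison, assuming the 'high/low' hex LSN text format PostgreSQL reports (function names here are illustrative, not pgpool internals):

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '2/E4A01230' to an absolute
    byte position: the part before the slash is the high 32 bits,
    the part after it the low 32 bits."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def standby_lagging(primary_lsn: str, standby_lsn: str,
                    delay_threshold: int) -> bool:
    """True if the standby's replay position trails the primary's
    current WAL position by more than delay_threshold bytes, which is
    roughly the condition that removes a standby from load balancing."""
    lag = lsn_to_bytes(primary_lsn) - lsn_to_bytes(standby_lsn)
    return lag > delay_threshold

# With delay_threshold = 1024000 as configured above:
# lag here is 0x100000 = 1048576 bytes, above the threshold.
print(standby_lagging("2/E4A01230", "2/E4901230", 1024000))  # True
```

This is the same arithmetic the server-side function pg_xlog_location_diff performs; setting delay_threshold = 0 disables the check entirely.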