作者介绍
唐成: 网名 osdba,《PostgreSQL修炼之道:从小工到专家》的作者,中启乘数科技公司联合创始人,从业20年,从事PostgreSQL数据库超过10年,拥有十几年数据库、操作系统、存储领域的工作经验,历任过网易研究院技术专家、阿里巴巴高级数据库专家,从事过阿里巴巴PostgreSQL、Greenplum数据库的架构设计和运维。做过数个百TB以上的Greenplum集群的维护和扩容工作,解决过很多PostgreSQL、Greenplum方面的疑难杂症。
1. 绑定变量窥视的原理说明
Oracle DBA都知道,绑定变量窥视功能是Oracle数据库的一个特性,自ORACLE9i版本开始引入,是可以通过参数数“_optim_peek_user_binds”来控制是否开启,默认是开启,即为TRUE。这就意味着,第一次以变量的方式执行某类SQL时,会生成第一个执行计划,后续执行该类SQL语句,即使变量的传入值不同,但因变量窥视的作用,依然会沿用第一次SQL语句执行时生成的执行计划,这种特性非常适用于业务表数据分布比较均匀的场景,执行计划比较稳定。但对于数据分布严重倾斜的业务表,可能会出现错误的执行计划,在极端情况下,会引发严重的性能问题。。
当”_optim_peek_user_binds”参数设置为FALSE。即,将绑定变量窥视参数特性禁用。那么已经执行过的某类值的执行计划将不会发生变化,一旦传入某个新值时,优化器会自动根据被访问对象的统计、直方图等信息,产生它认为效率最高、成本最低的执行计划。也就是说,在特性关闭的情况下,该类SQL语句可能会产生更优的执行计划。
所以为了让系统的性能不至于大起大落,在很多用户那里会关闭绑定变量窥视的功能。
那么PostgreSQL数据库在绑定变量的执行计划这一块的行为是什么呢?
PostgreSQL数据库的行为有一些复杂:
- 前5次执行时,每次都会根据实际传入的实际绑定变量新生成执行计划进行执行,即每次都是硬解析,同时会记录这5次的执行计划;
- 当第6次开始执行时,会生成一个通用的执行计划(generic plan),同时与前5次的执行计划进行比较,如果比较的结果是通用执行计划不比前5次的执行计划差,以后就会把这个通用的执行计划固定下来,这之后即使传入的值发生变化后,执行计划也不再变化。这就相当于Oracle打开了绑定变量窥视的功能。
- 当然,当第6次开始执行时,如果通用的执行计划(generic plan)比前5次的某一个执行计划差,则以后则每次都重新生成执行计划,即以后永远都是硬解析了。
从上面原理可以看出,PostgreSQL数据库能否不走硬解析,与前5次执行时传入的实际值有很大的关系,可以想象如果前5次执行时都是一个固定的值,第6次执行时的通用执行计划与前5次又一样,这时执行计划就会固定,如果以后传进来的值可以生成更好的执行计划,也不会生成了,这时可能会导致比较大的问题,这与Oracle打开了绑定变量窥视产生了一样的问题。只是因为PostgreSQL因为有先执行5次,然后第6次比较的机制,让这个问题出现的概率低了很多,但实际上还是会出现的。
当然,如果每次都是重新生成执行计划,对于高并发,会降低一些性能。实际上,对于一些重要的系统,每次重新生成执行计划,会更好一些,因为这种方式防止了系统的性能大起大落。
目前,网上很少有文章介绍这个原理,即使有也是把这个原理介绍的不清楚。
2. 实际测试
2.1 造测试表和数据
下面我们实际测试,来更深的理解这个原理。
create table test01(id serial, t text);
insert into test01(t) select 'tang' from generate_series(1, 1000000);
insert into test01(t) select 'osdba' from generate_series(1, 2);
CREATE INDEX idx_test01_t ON test01(t);
analyze test01;
上面的例子中我们创建了不均匀的数据,即为“tang”的数据是100万,而为“osdba”的数据是2条。
如果我们按常量来查,执行计划会走正确:
osdba=# explain SELECT count(*) FROM test01 WHERE t = 'tang';
QUERY PLAN
------------------------------------------------------------------------------------------
Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
-> Gather (cost=12656.01..12656.22 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
-> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
Filter: (t = 'tang'::text)
(6 rows)
Time: 1.532 ms
osdba=# explain SELECT count(*) FROM test01 WHERE t = 'osdba';
QUERY PLAN
--------------------------------------------------------------------------------------
Aggregate (cost=4.45..4.46 rows=1 width=8)
-> Index Only Scan using idx_test01_t on test01 (cost=0.42..4.44 rows=1 width=0)
Index Cond: (t = 'osdba'::text)
(3 rows)
Time: 1.484 ms
上面可以看到,当按“tang”来查是,走的是全表扫描,而按“osdba”查询时走的是索引,说明执行计划都是正确的。
2.2 按绑定变量的第一次测试
下面我们按绑定变量的方式执行:
PREPARE myplan(text) AS SELECT count(*) FROM test01 WHERE t = $1;
下面具体看:
osdba-mac:~ osdba$ psql
psql (10.5)
Type "help" for help.
osdba=# PREPARE myplan(text) AS SELECT count(*) FROM test01 WHERE t = $1;
PREPARE
osdba=# explain EXECUTE myplan('tang');
QUERY PLAN
------------------------------------------------------------------------------------------
Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
-> Gather (cost=12656.01..12656.22 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
-> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
Filter: (t = 'tang'::text)
(6 rows)
osdba=# explain EXECUTE myplan('tang');
QUERY PLAN
------------------------------------------------------------------------------------------
Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
-> Gather (cost=12656.01..12656.22 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
-> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
Filter: (t = 'tang'::text)
(6 rows)
osdba=# explain EXECUTE myplan('tang');
QUERY PLAN
------------------------------------------------------------------------------------------
Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
-> Gather (cost=12656.01..12656.22 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
-> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
Filter: (t = 'tang'::text)
(6 rows)
osdba=# explain EXECUTE myplan('tang');
QUERY PLAN
------------------------------------------------------------------------------------------
Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
-> Gather (cost=12656.01..12656.22 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
-> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
Filter: (t = 'tang'::text)
(6 rows)
osdba=# explain EXECUTE myplan('tang');
QUERY PLAN
------------------------------------------------------------------------------------------
Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
-> Gather (cost=12656.01..12656.22 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
-> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
Filter: (t = 'tang'::text)
(6 rows)
osdba=# explain EXECUTE myplan('tang');
QUERY PLAN
------------------------------------------------------------------------------------------
Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
-> Gather (cost=12656.01..12656.22 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
-> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
Filter: (t = $1)
(6 rows)
注意上面执行中的第6次的执行计划和第5次的执行计划发生了变化,前5次都是“Filter: (t = ‘tang’::text)”,而第6次变成了“Filter: (t = $1)”,这说明执行计划变成了通用执行计划,这时我们把传进去的值改成“osdba”,发现也会是走全表扫描了,不会走索引了,这时的执行计划就错了:
osdba=# explain analyze EXECUTE myplan('osdba');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8) (actual time=114.069..114.069 rows=1 loops=1)
-> Gather (cost=12656.01..12656.22 rows=2 width=8) (actual time=113.957..114.865 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8) (actual time=106.088..106.088 rows=1 loops=3)
-> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0) (actual time=106.072..106.072 rows=1 loops=3)
Filter: (t = $1)
Rows Removed by Filter: 333333
Planning time: 0.035 ms
Execution time: 115.044 ms
(10 rows)
2.3 按绑定变量的第二次测试
前面的测试时,我们前5次执行时传进去的值都是“tang”,我们这一次让前5次中前四次传进去的值是“tang”,最后一次是“osdba”:
osdba-mac:~ osdba$ psql
psql (10.5)
Type "help" for help.
osdba=# PREPARE myplan(text) AS SELECT count(*) FROM test01 WHERE t = $1;
PREPARE
osdba=# explain EXECUTE myplan('tang');
QUERY PLAN
------------------------------------------------------------------------------------------
Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
-> Gather (cost=12656.01..12656.22 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
-> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
Filter: (t = 'tang'::text)
(6 rows)
osdba=# explain EXECUTE myplan('tang');
QUERY PLAN
------------------------------------------------------------------------------------------
Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
-> Gather (cost=12656.01..12656.22 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
-> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
Filter: (t = 'tang'::text)
(6 rows)
osdba=# explain EXECUTE myplan('tang');
QUERY PLAN
------------------------------------------------------------------------------------------
Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
-> Gather (cost=12656.01..12656.22 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
-> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
Filter: (t = 'tang'::text)
(6 rows)
osdba=# explain EXECUTE myplan('tang');
QUERY PLAN
------------------------------------------------------------------------------------------
Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
-> Gather (cost=12656.01..12656.22 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
-> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
Filter: (t = 'tang'::text)
(6 rows)
osdba=# explain EXECUTE myplan('osdba');
QUERY PLAN
--------------------------------------------------------------------------------------
Aggregate (cost=4.45..4.46 rows=1 width=8)
-> Index Only Scan using idx_test01_t on test01 (cost=0.42..4.44 rows=1 width=0)
Index Cond: (t = 'osdba'::text)
(3 rows)
后面我们无论再怎么执行固定的值,发现每次都是生成新的执行计划了:
osdba=# explain EXECUTE myplan('tang');
QUERY PLAN
------------------------------------------------------------------------------------------
Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
-> Gather (cost=12656.01..12656.22 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
-> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
Filter: (t = 'tang'::text)
(6 rows)
osdba=# explain EXECUTE myplan('tang');
QUERY PLAN
------------------------------------------------------------------------------------------
Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
-> Gather (cost=12656.01..12656.22 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
-> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
Filter: (t = 'tang'::text)
(6 rows)
...
...
...
osdba=# explain EXECUTE myplan('osdba');
QUERY PLAN
--------------------------------------------------------------------------------------
Aggregate (cost=4.45..4.46 rows=1 width=8)
-> Index Only Scan using idx_test01_t on test01 (cost=0.42..4.44 rows=1 width=0)
Index Cond: (t = 'osdba'::text)
(3 rows)
...
...
osdba=# explain EXECUTE myplan('tang');
QUERY PLAN
------------------------------------------------------------------------------------------
Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
-> Gather (cost=12656.01..12656.22 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
-> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
Filter: (t = 'tang'::text)
(6 rows)
上面的演示是前5次中前四次传进去的值是“tang”,最后一次是“osdba”,实际上只要前五次中,只要任意有1次或多次传进去的是“osdba”,不一定要求最后一次是“osdba”时,都不会走通用的执行计划,这个结果大家可以测试。
这就验证了我们前面的理论。
3. PostgreSQL 12的plan_cache_mode配置参数
在PostgreSQL11及一下的版本中,因为绑定变量窥视,虽然比Oracle出现的概率低,但还是有一定的概率导致执行计划走错。那么在PostgreSQL中是否也有类似Oracle的隐含参数把绑定变量窥视关掉的功能?答案是PostgreSQL12提供了这个功能。
在PostgreSQL 12提供了plan_cache_mode配置参数,可以取以下三个值:
- auto: 这时默认值,即默认情况下与PostgreSQL11及以下版本相同的行为。
- force_custom_plan: 相当于关闭绑定变量窥视,永远进行硬解析。
- force_generic_plan: 走通用的固定执行计划(generic plan)
所以对于一些非常重要的系统,可以把“plan_cache_mode”配置成“force_custom_plan”,避免执行计划的错误,如下所示:
osdba-mac:pgdata12 osdba$ psql
psql (12.1)
Type "help" for help.
osdba=# PREPARE myplan(text) AS SELECT count(*) FROM test01 WHERE t = $1;
PREPARE
osdba=# set plan_cache_mode to force_custom_plan;
SET
osdba=# explain EXECUTE myplan('osdba');
QUERY PLAN
--------------------------------------------------------------------------------------
Aggregate (cost=4.45..4.46 rows=1 width=8)
-> Index Only Scan using idx_test01_t on test01 (cost=0.42..4.44 rows=1 width=0)
Index Cond: (t = 'osdba'::text)
(3 rows)
osdba=# explain EXECUTE myplan('tang');
QUERY PLAN
------------------------------------------------------------------------------------------
Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
-> Gather (cost=12656.01..12656.22 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
-> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
Filter: (t = 'tang'::text)
(6 rows)