1. 前言
中启乘数科技是一家专业的PostgreSQL和Greenplum数据库服务提供商,专注于数据库极致的性能,PostgreSQL 14 beta1已经发布了,PostgreSQL 14在很多方面都进行了改进,其中之一就是二进制模式的copy命令的性能提升了20%以上。
2. 测试过程
我们准备在一台物理机器上准备两套环境,一套是PostgreSQL 13.3,另一套是PostgreSQL 14beta1。
物理机器的硬件为2路服务器,CPU为:Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz,内存为256GB。
造测试表:
create table foo5 (a text, b text, c text, d text, e text);
造测试数据:
insert into foo5 select repeat('a', (random()*100)::int), 'bbb', 'cc','d', 'eee' from generate_series(1, 10000000);
copy foo5 to '/tmp/foo5.bin' binary;
清空测试表:
truncate table foo5;
把数据导入过来,看导入的时间:
copy foo5 from '/tmp/foo5.bin' binary;
在PostgreSQL 13.3上
[pg13@pg01 ~]$ psql
psql (13.3)
Type "help" for help.
postgres=# \timing
Timing is on.
postgres=# create table foo5 (a text, b text, c text, d text, e text);
CREATE TABLE
Time: 61.039 ms
postgres=# insert into foo5 select repeat('a', (random()*100)::int), 'bbb', 'cc',
postgres-# 'd', 'eee' from generate_series(1, 10000000);
INSERT 0 10000000
Time: 12415.693 ms (00:12.416)
postgres=# copy foo5 to '/tmp/foo5.bin' binary;
COPY 10000000
Time: 5239.320 ms (00:05.239)
postgres=# truncate foo5;
TRUNCATE TABLE
Time: 145.215 ms
postgres=# copy foo5 from '/tmp/foo5.bin' binary;
COPY 10000000
Time: 10817.424 ms (00:10.817)
postgres=# truncate table foo5;
TRUNCATE TABLE
Time: 154.791 ms
postgres=# copy foo5 from '/tmp/foo5.bin' binary;
COPY 10000000
Time: 10807.711 ms (00:10.808)
可以看出导入花费了10秒左右。
在PostgreSQL 14上
[pg14@pg01 ~]$ psql
psql (14beta1)
Type "help" for help.
postgres=# create table foo5 (a text, b text, c text, d text, e text);
CREATE TABLE
postgres=# insert into foo5 select repeat('a', (random()*100)::int), 'bbb', 'cc',
postgres-# 'd', 'eee' from generate_series(1, 10000000);
INSERT 0 10000000
postgres=# \timing
Timing is on.
postgres=# copy foo5 to '/tmp/foo5.bin' binary;
COPY 10000000
Time: 5179.257 ms (00:05.179)
postgres=# truncate foo5;
TRUNCATE TABLE
Time: 110.360 ms
postgres=# copy foo5 from '/tmp/foo5.bin' binary;
COPY 10000000
Time: 8160.788 ms (00:08.161)
可以看出导入的时间是8秒,节省了2秒,时间基本缩短了20%左右。
3. COPY命令使用binary的好处
主要是导出binary格式的数据,速度比较快:
[pg13@pg01 ~]$ psql
psql (13.3)
Type "help" for help.
postgres=# create table foo1(id int, t text);
CREATE TABLE
postgres=# \timing
Timing is on.
postgres=# insert into foo1 select seq, repeat('a', 5000) from generate_series(1, 2000000) as seq;
INSERT 0 2000000
Time: 44657.110 ms (00:44.657)
postgres=# copy foo1 to '/tmp/foo1.txt';
COPY 2000000
Time: 24079.960 ms (00:24.080)
postgres=# copy foo1 to '/tmp/foo1.bin' binary;
COPY 2000000
Time: 8369.372 ms (00:08.369)
可以看到,导出BINARY格式的数据的速度比导出TXT类型的速度快的多,是近3倍。
上面是在PostgreSQL 13.3中测试的,在PostgreSQL 14中测试的结果也类似,如下所示:
[pg14@pg01 ~]$ psql
psql (14beta1)
Type "help" for help.
postgres=# create table foo1(id int, t text);
CREATE TABLE
postgres=# \timing
Timing is on.
postgres=# insert into foo1 select seq, repeat('a', 5000) from generate_series(1, 2000000) as seq;
INSERT 0 2000000
Time: 44571.181 ms (00:44.571)
postgres=# \timing
Timing is on.
postgres=# copy foo1 to '/tmp/foo1.txt';
ERROR: could not open file "/tmp/foo1.txt" for writing: Permission denied
HINT: COPY TO instructs the PostgreSQL server process to write a file. You may want a client-side facility such as psql's \copy.
Time: 0.501 ms
postgres=# copy foo1 to '/tmp/foo1.txt';
COPY 2000000
Time: 23975.182 ms (00:23.975)
postgres=# copy foo1 to '/tmp/foo1.bin' binary;
COPY 2000000
Time: 8277.149 ms (00:08.277)
当然从BINARY格式的数据也更快:
[pg14@pg01 ~]$ psql
psql (14beta1)
Type "help" for help.
postgres=# \timing
postgres=# copy foo1 from '/tmp/foo1.bin' binary;
COPY 2000000
Time: 46921.412 ms (00:46.921)
postgres=# truncate table foo1;
TRUNCATE TABLE
Time: 86.614 ms
postgres=# copy foo1 from '/tmp/foo1.txt';
COPY 2000000
Time: 78118.955 ms (01:18.119)
从上面可以看到,导入TXT类型的文件,花了78秒,而导入BINARY类型的文件,花了47秒。
4. 结论
到PostgreSQL 14之后,使用COPY命令导入二进制格式COPY数据时,性能有20%以上的性能提升。
当用COPY命令中使用binary
选项把数据导入和导出时,通常会快很多,因为不需要做一些转换。今后大家可以多试试带binary
选项的COPY命令。
PostgreSQL 14在使用BINARY选项的COPY命令导入数据时,比之前的版本快的原因,是使用了buffer,减少了文件系统的fread的调用,具体可以见: