1. 前言
中启乘数科技是一家专业的PostgreSQL和Greenplum数据库服务提供商,专注于数据库极致的性能,PostgreSQL 14 beta1已经发布了,PostgreSQL 14在很多方面都进行了改进,其中之一就是二进制模式的copy命令的性能提升了20%以上。
2. 测试过程
我们准备在一台物理机器上准备两套环境,一套是PostgreSQL 13.3,另一套是PostgreSQL 14beta1。
物理机器的硬件为2路服务器,CPU为:Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz,内存为256GB。
造测试表:
create table foo5 (a text, b text, c text, d text, e text);
造测试数据:
insert into foo5 select repeat('a', (random()*100)::int), 'bbb', 'cc','d', 'eee' from generate_series(1, 10000000);copy foo5 to '/tmp/foo5.bin' binary;
清空测试表:
truncate table foo5;
把数据导入过来,看导入的时间:
copy foo5 from '/tmp/foo5.bin' binary;
在PostgreSQL 13.3上
[pg13@pg01 ~]$ psqlpsql (13.3)Type "help" for help.postgres=# \timingTiming is on.postgres=# create table foo5 (a text, b text, c text, d text, e text);CREATE TABLETime: 61.039 mspostgres=# insert into foo5 select repeat('a', (random()*100)::int), 'bbb', 'cc',postgres-# 'd', 'eee' from generate_series(1, 10000000);INSERT 0 10000000Time: 12415.693 ms (00:12.416)postgres=# copy foo5 to '/tmp/foo5.bin' binary;COPY 10000000Time: 5239.320 ms (00:05.239)postgres=# truncate foo5;TRUNCATE TABLETime: 145.215 mspostgres=# copy foo5 from '/tmp/foo5.bin' binary;COPY 10000000Time: 10817.424 ms (00:10.817)postgres=# truncate table foo5;TRUNCATE TABLETime: 154.791 mspostgres=# copy foo5 from '/tmp/foo5.bin' binary;COPY 10000000Time: 10807.711 ms (00:10.808)
可以看出导入花费了10秒左右。
在PostgreSQL 14上
[pg14@pg01 ~]$ psqlpsql (14beta1)Type "help" for help.postgres=# create table foo5 (a text, b text, c text, d text, e text);CREATE TABLEpostgres=# insert into foo5 select repeat('a', (random()*100)::int), 'bbb', 'cc',postgres-# 'd', 'eee' from generate_series(1, 10000000);INSERT 0 10000000postgres=# \timingTiming is on.postgres=# copy foo5 to '/tmp/foo5.bin' binary;COPY 10000000Time: 5179.257 ms (00:05.179)postgres=# truncate foo5;TRUNCATE TABLETime: 110.360 mspostgres=# copy foo5 from '/tmp/foo5.bin' binary;COPY 10000000Time: 8160.788 ms (00:08.161)
可以看出导入的时间是8秒,节省了2秒,时间基本缩短了20%左右。
3. COPY命令使用binary的好处
主要是导出binary格式的数据,速度比较快:
[pg13@pg01 ~]$ psqlpsql (13.3)Type "help" for help.postgres=# create table foo1(id int, t text);CREATE TABLEpostgres=# \timingTiming is on.postgres=# insert into foo1 select seq, repeat('a', 5000) from generate_series(1, 2000000) as seq;INSERT 0 2000000Time: 44657.110 ms (00:44.657)postgres=# copy foo1 to '/tmp/foo1.txt';COPY 2000000Time: 24079.960 ms (00:24.080)postgres=# copy foo1 to '/tmp/foo1.bin' binary;COPY 2000000Time: 8369.372 ms (00:08.369)
可以看到,导出BINARY格式的数据的速度比导出TXT类型的速度快的多,是近3倍。
上面是在PostgreSQL 13.3中测试的,在PostgreSQL 14中测试的结果也类似,如下所示:
[pg14@pg01 ~]$ psqlpsql (14beta1)Type "help" for help.postgres=# create table foo1(id int, t text);CREATE TABLEpostgres=# \timingTiming is on.postgres=# insert into foo1 select seq, repeat('a', 5000) from generate_series(1, 2000000) as seq;INSERT 0 2000000Time: 44571.181 ms (00:44.571)postgres=# \timingTiming is on.postgres=# copy foo1 to '/tmp/foo1.txt';ERROR: could not open file "/tmp/foo1.txt" for writing: Permission deniedHINT: COPY TO instructs the PostgreSQL server process to write a file. You may want a client-side facility such as psql's \copy.Time: 0.501 mspostgres=# copy foo1 to '/tmp/foo1.txt';COPY 2000000Time: 23975.182 ms (00:23.975)postgres=# copy foo1 to '/tmp/foo1.bin' binary;COPY 2000000Time: 8277.149 ms (00:08.277)
当然从BINARY格式的数据也更快:
[pg14@pg01 ~]$ psqlpsql (14beta1)Type "help" for help.postgres=# \timingpostgres=# copy foo1 from '/tmp/foo1.bin' binary;COPY 2000000Time: 46921.412 ms (00:46.921)postgres=# truncate table foo1;TRUNCATE TABLETime: 86.614 mspostgres=# copy foo1 from '/tmp/foo1.txt';COPY 2000000Time: 78118.955 ms (01:18.119)
从上面可以看到,导入TXT类型的文件,花了78秒,而导入BINARY类型的文件,花了47秒。
4. 结论
到PostgreSQL 14之后,使用COPY命令导入二进制格式COPY数据时,性能有20%以上的性能提升。
当用COPY命令中使用binary选项把数据导入和导出时,通常会快很多,因为不需要做一些转换。今后大家可以多试试带binary选项的COPY命令。
PostgreSQL 14在使用BINARY选项的COPY命令导入数据时,比之前的版本快的原因,是使用了buffer,减少了文件系统的fread的调用,具体可以见: