PostgreSQL黑科技BRIN索引

前言

作为本际云服务器推荐网的小编小本本，最近项目中涉及到了Oracle到PostgreSQL的迁移，其中涉及到位图索引的改造。这里我们介绍一下PostgreSQL的黑科技BRIN索引。

PostgreSQL黑科技BRIN索引

索引原理

BRIN索引是块级索引，与B-TREE不同，BRIN记录并不是以行号为单位记录索引明细，而是记录每个数据块的统计信息。因此，BRIN索引的空间占用特别小，对数据写入、更新、删除的影响也很小。BRIN索引的扫描原理很简单，扫描BRIN的元数据，根据元数据和用户输入的条件进行比较，过滤不符合条件的HEAPPAGE，只扫描需要扫描的HEAPPAGE。

如果数据排列比较随机，那么索引效果就会非常差，达不到索引快速扫描的效果。

索引测试

我们创建两张表，一张顺序插入，一张乱序插入：

--顺序插入tab_brin1：
create table tab_brin1(
	id int,
	name varchar(40),
	c_time timestamp);
insert into tab_brin1 select *,
	md5(random()::text),
	clock_timestamp()
	from generate_series(1,10000000);

--乱序插入tab_brin2：
create table tab_brin2(
	id int,
	name varchar(40),
	c_time timestamp);
insert into tab_brin2 select
	(random()*(10^6))::integer,
	md5(random()::text),
	timestamp '2019-01-10 20:00:00' + random() * (timestamp '2021-01-10 10:00:00' - timestamp '2019-01-20 20:00:00')
	from generate_series(1,10000000);

两张表都创建BRIN索引和BTREE索引，我们会发现表为700M，BTREE索引需要214M，而BRIN索引只有40K。

create index idx1_tab_brin1 on tab_brin1 using brin(c_time);
create index idx1_tab_brin2 on tab_brin2 using brin(c_time);
create index idx2_tab_brin1 on tab_brin1 using btree(c_time);
create index idx2_tab_brin2 on tab_brin1 using btree(c_time);

select pg_size_pretty(pg_relation_size(tab_brin1));                      --> 730MB
select pg_size_pretty(pg_relation_size(idx1_tab_brin1));                --> 40k
select pg_size_pretty(pg_relation_size(idx2_tab_brin1));                --> 214MB

BRIN索引的使用

我们看看两表的离散度，可以看出tab_brin1表的逻辑顺序和物理顺序一致性更好些。

select correlation from pg_stats where tablename=tab_brin1;         --> 0.00482822771
select correlation from pg_stats where tablename=tab_brin2;         --> 0.0010042704

对比下使用两表BRIN索引时的效率，这里我们需要删除前面创建的BTREE索引。

--tab_brin1的执行计划
explain(analyze,verbose,timing,costs,buffers)
select *
from tab_brin1
where c_time between '2019-01-10 20:00:00' and '2020-01-10 20:00:00';

--tab_brin2的执行计划
explain(analyze,verbose,timing,costs,buffers)
select *
from tab_brin2
where c_time between '2019-01-10 20:00:00' and '2020-01-10 20:00:00';

经过分析，物理顺序和逻辑顺序越一致，该列更适合建立BRIN索引。

Pages_per_range参数调优

BRIN索引有一个参数pages_per_range可以用来近一步提升Brin索引的性能。pages_per_range是粒度，默认为128(表示每128个数据块统计一次边界)，它影响BRIN索引的精确度和BRIN索引的大小。

精度为1时，耗时46.6ms

create index idx1_tab_brin1 on tab_brin1 using brin(c_time) with (pages_per_range=1);

explain(analyze,verbose,timing,costs,buffers)
select *
from tab_brin1
where c_time between '2019-01-10 20:00:00' and '2020-01-10 20:00:00';

精度为50时，耗时1.18ms

create index idx1_tab_brin1 on tab_brin1 using brin(c_time) with (pages_per_range=50);

explain(analyze,verbose,timing,costs,buffers)
select *
from tab_brin1
where c_time between '2019-01-10 20:00:00' and '2020-01-10 20:00:00';

由于每筛选一次字段PostgreSQL都要扫描全部的BRIN索引，所花费的时间也会变长，因此需要根据表的大小与应用场景去调整其值的大小。

BRIN索引主要适用于类似时序数据的场景，因为具有天然的顺序，而且是添加写的场景。相比于BTREE索引，它的体积小得多，非常适用于大数据量的场景。

原创文章，作者：小编小本本，如若转载，请注明出处：https://www.benjiyun.com/yunzhujiyunwei/vps-yunwei/5924.html