根据OP在评论中写的内容,查询必须确实如下:
SELECT MAX(t.head_id) AS id,
t.head_name AS name,
SUM(t.metric) AS metric
FROM (
SELECT SUM(CASE WHEN key = 'a' THEN 1 END) OVER (PARTITION BY name ORDER BY id) AS group_id,
CASE WHEN key = 'a' THEN id END AS head_id,
name AS head_name,
metric
FROM dataset
) t
GROUP BY t.head_name, t.group_id
但是,如果可以按名称和id添加索引,则确实可以提高查询的性能。这是因为在聚合之前不需要排序操作。
用一个有一百万行的表进行测试,这是无索引的解释分析的输出:
HashAggregate (cost=177154.34..177158.34 rows=400 width=25) (actual time=3374.878..3489.755 rows=400000 loops=1)
Group Key: dataset.name, sum(CASE WHEN (dataset.key = 'a'::text) THEN 1 ELSE NULL::integer END) OVER (?)
-> WindowAgg (cost=132154.34..157154.34 rows=1000000 width=25) (actual time=1920.338..3000.218 rows=1000000 loops=1)
-> Sort (cost=132154.34..134654.34 rows=1000000 width=15) (actual time=1920.323..2232.936 rows=1000000 loops=1)
Sort Key: dataset.name, dataset.id
Sort Method: external merge Disk: 28192kB
-> Seq Scan on dataset (cost=0.00..15406.00 rows=1000000 width=15) (actual time=0.020..172.746 rows=1000000 loops=1)
Planning Time: 0.870 ms
Execution Time: 3516.726 ms
通过创建索引,查询计划将更改为以下内容:
索引
:
CREATE INDEX dataset__name_id__idx ON dataset(name, id);
查询计划
:
HashAggregate (cost=90169.90..90173.90 rows=400 width=25) (actual time=1464.759..1567.778 rows=400000 loops=1)
Group Key: dataset.name, sum(CASE WHEN (dataset.key = 'a'::text) THEN 1 ELSE NULL::integer END) OVER (?)
-> WindowAgg (cost=0.42..70169.90 rows=1000000 width=25) (actual time=0.033..1077.362 rows=1000000 loops=1)
-> Index Scan using dataset__name_id__idx on dataset (cost=0.42..47669.90 rows=1000000 width=15) (actual time=0.022..225.445 rows=1000000 loops=1)
Planning Time: 0.131 ms
Execution Time: 1590.040 ms
老答案
根据您的javascript代码,您不想按以下方式对窗口进行分区
name
,也不分组
名称
在外部查询中。如果没有这个,您实际上会得到一个更好的查询,允许您只使用主索引,假设
id
列已被索引。
SELECT t.head_id AS id,
MAX(t.head_name) AS name,
SUM(t.metric) AS metric
FROM (
SELECT MAX(CASE WHEN key = 'a' THEN id END) OVER (ORDER BY id) AS head_id,
CASE WHEN key = 'a' THEN name END AS head_name,
metric
FROM dataset
) t
GROUP BY t.head_id
这是一个查询计划
dataset
有100万行:
HashAggregate (cost=68889.43..68891.43 rows=200 width=44) (actual time=1277.469..1393.709 rows=400000 loops=1)
Group Key: max(CASE WHEN (dataset.key = 'a'::text) THEN dataset.id ELSE NULL::integer END) OVER (?)
-> WindowAgg (cost=0.42..51389.43 rows=1000000 width=44) (actual time=0.025..927.595 rows=1000000 loops=1)
-> Index Scan using dataset_pkey on dataset (cost=0.42..31389.42 rows=1000000 width=15) (actual time=0.017..209.657 rows=1000000 loops=1)
Planning Time: 0.127 ms
Execution Time: 1411.975 ms