你能帮我理解一下这些陈述之间性能下降的原因吗?
对我来说,这似乎是D&他首先将该地址加入到所有订户,并在最后应用Offset&限度他到底为什么要那样做?
我是否遗漏了Subselects和Offset如何协同工作?他不应该先找到正确的偏移量,然后开始执行子选择吗?
用户id
和
地址_id
是主键
选择A:15 ms(OK)
:选择前200个订阅者
SELECT s.user_id
FROM subscribers s
ORDER BY s.user_id
OFFSET 0 LIMIT 200
选择B:45 ms(OK)
:选择最后200个订阅者
SELECT s.user_id
FROM subscribers s
ORDER BY s.user_id
OFFSET 100000 LIMIT 200
选择C:15 ms(OK)
:选择前200个订户以及第一个可用地址
SELECT s.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
FROM subscribers s
ORDER BY s.user_id
OFFSET 0 LIMIT 200
选择D:500 ms(不正常)
:选择最后200个订户以及第一个可用地址
SELECT s.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
FROM subscribers s
ORDER BY s.user_id
OFFSET 100000 LIMIT 200
选择E:1000 ms(甚至更糟)
:选择最后200个订户以及前2个可用地址
SELECT s.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id_1,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 1 LIMIT 2) AS a_id_2
FROM subscribers s
ORDER BY s.user_id
OFFSET 100000 LIMIT 200
选择F:15 ms(良好)
:选择最后200个订户以及前2个没有偏移的可用地址,但WHERE s.user_id>改为100385
SELECT s.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id_1,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 1 LIMIT 2) AS a_id_2
FROM subscribers s
WHERE s.user_id > 100385 --same as OFFSET 100000 in my data
ORDER BY s.user_id
LIMIT 200
E的执行计划:
'Limit (cost=1677635.30..1677635.80 rows=200 width=4) (actual time=2251.503..2251.816 rows=200 loops=1)'
' Output: s.user_id, ((SubPlan 1)), ((SubPlan 2))'
' Buffers: shared hit=607074'
' -> Sort (cost=1677385.30..1677636.08 rows=100312 width=4) (actual time=2146.867..2200.704 rows=100200 loops=1)'
' Output: s.user_id, ((SubPlan 1)), ((SubPlan 2))'
' Sort Key: s.user_id'
' Sort Method: quicksort Memory: 7775kB'
' Buffers: shared hit=607074'
' -> Seq Scan on public.pcv_subscriber s (cost=0.00..1669052.31 rows=100312 width=4) (actual time=0.040..2046.926 rows=100312 loops=1)'
' Output: s.user_id, (SubPlan 1), (SubPlan 2)'
' Buffers: shared hit=607074'
' SubPlan 1'
' -> Limit (cost=8.29..8.29 rows=1 width=4) (actual time=0.008..0.008 rows=1 loops=100312)'
' Output: ua.user_address_id'
' Buffers: shared hit=301458'
' -> Sort (cost=8.29..8.29 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=100312)'
' Output: ua.user_address_id'
' Sort Key: ua.user_address_id'
' Sort Method: quicksort Memory: 25kB'
' Buffers: shared hit=301458'
' -> Index Scan using ix_pcv_user_address_user_id on public.pcv_user_address ua (cost=0.00..8.28 rows=1 width=4) (actual time=0.003..0.004 rows=1 loops=100312)'
' Output: ua.user_address_id'
' Index Cond: (ua.user_id = $0)'
' Buffers: shared hit=301458'
' SubPlan 2'
' -> Limit (cost=8.29..8.29 rows=1 width=4) (actual time=0.009..0.009 rows=0 loops=100312)'
' Output: ua.user_address_id'
' Buffers: shared hit=301458'
' -> Sort (cost=8.29..8.29 rows=1 width=4) (actual time=0.006..0.007 rows=1 loops=100312)'
' Output: ua.user_address_id'
' Sort Key: ua.user_address_id'
' Sort Method: quicksort Memory: 25kB'
' Buffers: shared hit=301458'
' -> Index Scan using ix_pcv_user_address_user_id on public.pcv_user_address ua (cost=0.00..8.28 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=100312)'
' Output: ua.user_address_id'
' Index Cond: (ua.user_id = $0)'
' Buffers: shared hit=301458'
'Total runtime: 2251.968 ms'
免责声明:
这是一个更大、更复杂的语句的精简示例,该语句使GUI表能够通过多个表对具有大量额外累积数据的订阅者进行排序/分页/过滤。所以我知道这个例子可以用更好的方式来完成。因此,相反,请帮助我理解为什么这个解决方案如此缓慢,或者最好建议最小的更改。
更新1:
这是使用Postgres 9.0.3制作的
更新2:
目前,我能想出的解决问题的最佳方案似乎是这样一句愚蠢的话:
选择G:73ms(正常)
SELECT s.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id_1,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 1 LIMIT 2) AS a_id_2
FROM subscribers s
WHERE s.user_id >= (SELECT user_id from subscribers ORDER BY user_id OFFSET 100000 LIMIT 1)
ORDER BY s.user_id
LIMIT 200
更新3:
迄今为止,大卫的最佳选择。(性能与G相同,但更直观)
选择H:73ms(正常)
SELECT s2.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s2.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
FROM (SELECT s.user_id
FROM subscribers s
ORDER BY s.user_id
OFFSET 100000 LIMIT 200) s2
H的执行计划:
这就是我最初对E的想象。