代码之家  ›  专栏  ›  技术社区  ›  Torge

Postgres:为什么在带有Offset/Limit的子选择上性能如此糟糕

  •  4
  • Torge  · 技术社区  · 9 年前

    你能帮我理解一下这些陈述之间性能下降的原因吗?

    对我来说,这似乎是D&他首先将该地址加入到所有订户,并在最后应用Offset&限度他到底为什么要那样做?

    我是否遗漏了Subselects和Offset如何协同工作?他不应该先找到正确的偏移量,然后开始执行子选择吗?

    用户id 地址_id 是主键

    选择A:15 ms(OK) :选择前200个订阅者

    SELECT s.user_id
    FROM subscribers s
    ORDER BY s.user_id
    OFFSET 0 LIMIT 200
    

    选择B:45 ms(OK) :选择最后200个订阅者

    SELECT s.user_id
    FROM subscribers s
    ORDER BY s.user_id
    OFFSET 100000 LIMIT 200
    

    选择C:15 ms(OK) :选择前200个订户以及第一个可用地址

    SELECT s.user_id,
    (SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
    FROM subscribers s
    ORDER BY s.user_id
    OFFSET 0 LIMIT 200
    

    选择D:500 ms(不正常) :选择最后200个订户以及第一个可用地址

    SELECT s.user_id,
    (SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
    FROM subscribers s
    ORDER BY s.user_id
    OFFSET 100000 LIMIT 200
    

    选择E:1000 ms(甚至更糟) :选择最后200个订户以及前2个可用地址

    SELECT s.user_id,
    (SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id_1,
    (SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 1 LIMIT 2) AS a_id_2
    FROM subscribers s
    ORDER BY s.user_id
    OFFSET 100000 LIMIT 200
    

    选择F:15 ms(良好) :选择最后200个订户以及前2个没有偏移的可用地址,但WHERE s.user_id>改为100385

    SELECT s.user_id,
    (SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id_1,
    (SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 1 LIMIT 2) AS a_id_2
    FROM subscribers s
    WHERE s.user_id > 100385 --same as OFFSET 100000 in my data
    ORDER BY s.user_id
    LIMIT 200
    

    E的执行计划:

    Visual Plan

    'Limit  (cost=1677635.30..1677635.80 rows=200 width=4) (actual time=2251.503..2251.816 rows=200 loops=1)'
    '  Output: s.user_id, ((SubPlan 1)), ((SubPlan 2))'
    '  Buffers: shared hit=607074'
    '  ->  Sort  (cost=1677385.30..1677636.08 rows=100312 width=4) (actual time=2146.867..2200.704 rows=100200 loops=1)'
    '        Output: s.user_id, ((SubPlan 1)), ((SubPlan 2))'
    '        Sort Key: s.user_id'
    '        Sort Method:  quicksort  Memory: 7775kB'
    '        Buffers: shared hit=607074'
    '        ->  Seq Scan on public.pcv_subscriber s  (cost=0.00..1669052.31 rows=100312 width=4) (actual time=0.040..2046.926 rows=100312 loops=1)'
    '              Output: s.user_id, (SubPlan 1), (SubPlan 2)'
    '              Buffers: shared hit=607074'
    '              SubPlan 1'
    '                ->  Limit  (cost=8.29..8.29 rows=1 width=4) (actual time=0.008..0.008 rows=1 loops=100312)'
    '                      Output: ua.user_address_id'
    '                      Buffers: shared hit=301458'
    '                      ->  Sort  (cost=8.29..8.29 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=100312)'
    '                            Output: ua.user_address_id'
    '                            Sort Key: ua.user_address_id'
    '                            Sort Method:  quicksort  Memory: 25kB'
    '                            Buffers: shared hit=301458'
    '                            ->  Index Scan using ix_pcv_user_address_user_id on public.pcv_user_address ua  (cost=0.00..8.28 rows=1 width=4) (actual time=0.003..0.004 rows=1 loops=100312)'
    '                                  Output: ua.user_address_id'
    '                                  Index Cond: (ua.user_id = $0)'
    '                                  Buffers: shared hit=301458'
    '              SubPlan 2'
    '                ->  Limit  (cost=8.29..8.29 rows=1 width=4) (actual time=0.009..0.009 rows=0 loops=100312)'
    '                      Output: ua.user_address_id'
    '                      Buffers: shared hit=301458'
    '                      ->  Sort  (cost=8.29..8.29 rows=1 width=4) (actual time=0.006..0.007 rows=1 loops=100312)'
    '                            Output: ua.user_address_id'
    '                            Sort Key: ua.user_address_id'
    '                            Sort Method:  quicksort  Memory: 25kB'
    '                            Buffers: shared hit=301458'
    '                            ->  Index Scan using ix_pcv_user_address_user_id on public.pcv_user_address ua  (cost=0.00..8.28 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=100312)'
    '                                  Output: ua.user_address_id'
    '                                  Index Cond: (ua.user_id = $0)'
    '                                  Buffers: shared hit=301458'
    'Total runtime: 2251.968 ms'
    

    免责声明: 这是一个更大、更复杂的语句的精简示例,该语句使GUI表能够通过多个表对具有大量额外累积数据的订阅者进行排序/分页/过滤。所以我知道这个例子可以用更好的方式来完成。因此,相反,请帮助我理解为什么这个解决方案如此缓慢,或者最好建议最小的更改。

    更新1:

    这是使用Postgres 9.0.3制作的

    更新2:

    目前,我能想出的解决问题的最佳方案似乎是这样一句愚蠢的话:

    选择G:73ms(正常)

    SELECT s.user_id,
    (SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id_1,
    (SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 1 LIMIT 2) AS a_id_2
    FROM subscribers s
    WHERE s.user_id >= (SELECT user_id from subscribers ORDER BY user_id OFFSET 100000 LIMIT 1)
    ORDER BY s.user_id
    LIMIT 200
    

    更新3:

    迄今为止,大卫的最佳选择。(性能与G相同,但更直观)

    选择H:73ms(正常)

    SELECT s2.user_id,
    (SELECT address_id FROM address a WHERE a.user_id = s2.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
    FROM (SELECT s.user_id
          FROM  subscribers s
          ORDER BY s.user_id
          OFFSET 100000 LIMIT 200) s2
    

    H的执行计划:

    这就是我最初对E的想象。 enter image description here

    2 回复  |  直到 9 年前
        1
  •  2
  •   David Aldridge    9 年前

    我认为SELECT子句中表达的连接即使对于最终数据集中没有包含的100000行也在执行。

    这个怎么样:

    SELECT s2.user_id,
    (SELECT address_id FROM address a WHERE a.user_id = s2.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
    FROM (select *
          from   subscribers s
          ORDER BY s.user_id
          OFFSET 100000 LIMIT 200) s2
    

    否则,请尝试使用通用表表达式:

    With s2 as (
      select *
      from   subscribers s
      ORDER BY s.user_id
      OFFSET 100000 LIMIT 200)
    SELECT s2.user_id,
    (SELECT address_id FROM address a WHERE a.user_id = s2.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
    FROM s2
    
        2
  •  1
  •   wildplasser    9 年前

    对于秩={1,2}的情况,这似乎是合理的。(CTE很糟糕,仅供参考)

    -- EXPLAIN ANALYZE
    SELECT s.user_id
            , MAX (CASE WHEN a0.rn = 1 THEN a0.address_id ELSE NULL END) AS ad1
            , MAX (CASE WHEN a0.rn = 2 THEN a0.address_id ELSE NULL END) AS ad2
    FROM subscribers s
    JOIN (  SELECT user_id, address_id
            , row_number() OVER(PARTITION BY user_id ORDER BY address_id) AS rn
            FROM address
            )a0 ON a0.user_id = s.user_id AND a0.rn <= 2
    GROUP BY s.user_id
    ORDER BY s.user_id
    OFFSET 10000 LIMIT 200
            ;
    

    UPDATE:下面的查询似乎性能稍好:

        -- ----------------------------------
    -- EXPLAIN ANALYZE
    SELECT s.user_id
            , MAX (CASE WHEN a0.rn = 1 THEN a0.address_id ELSE NULL END) AS ad1
            , MAX (CASE WHEN a0.rn = 2 THEN a0.address_id ELSE NULL END) AS ad2
    FROM ( SELECT user_id
            FROM subscribers
            ORDER BY user_id
            OFFSET 10000
            LIMIT 200
            ) s 
    JOIN (     SELECT user_id, address_id
            , row_number() OVER(PARTITION BY user_id ORDER BY address_id) AS rn
            FROM address
            ) a0 ON a0.user_id = s.user_id AND a0.rn <= 2
    GROUP BY s.user_id
    ORDER BY s.user_id
            ;
    

    注意:在这两个JOIN中 LEFT JOIN s、 以允许第一和第二地址丢失。


    UPDATE:将子查询(如@DavidAldridfge的答案)与原始(两个标量子查询)组合

    将subscriber表与自身自连接允许将索引用于标量子查询,而无需丢弃前100K个结果行。

    -- EXPLAIN ANALYZE
    SELECT s.user_id
    , (SELECT address_id
            FROM address a
            WHERE a.user_id = s.user_id
            ORDER BY address_id OFFSET 0 LIMIT 1
            ) AS a_id1
    , (SELECT address_id
            FROM address a
            WHERE a.user_id = s.user_id
            ORDER BY address_id OFFSET 1 LIMIT 1
            ) AS a_id2
    FROM subscribers s
    JOIN (
            SELECT user_id
            FROM subscribers
            ORDER BY user_id
            OFFSET 10000 LIMIT 200
            ) x ON x.user_id = s.user_id
    ORDER BY s.user_id
            ;