代码之家  ›  专栏  ›  技术社区  ›  p_mcp

JSONB索引比本机索引慢吗?

  •  1
  • p_mcp  · 技术社区  · 6 年前

    我有一张大桌子(30米排),有大约10个座位 jsonb B树索引。

    当我添加更多的条件,特别是一个稀疏 jsonb公司

    我想知道 jsonb公司 索引比本机索引慢?我是否希望通过切换到本机列而不是JSON来提高性能?

    表定义:

    id  integer 
    type    text    
    data    jsonb   
    company_index   ARRAY   
    exchange_index  ARRAY   
    eligible boolean
    

    查询示例:

    SELECT id, data, type 
    FROM collection.bundles    
    WHERE ( (ARRAY['.X'] && bundles.exchange_index)  AND   
    type IN ('discussion') AND  
    ( ((data->>'sentiment_score')::bigint > 0 AND 
    (data->'display_tweet'->'stocktwit'->'id') IS NOT NULL) )  AND  
    (  eligible = true  )  AND  
    ((data->'display_tweet'->'stocktwit')->>'id')::bigint IS NULL )  
    ORDER BY id DESC   
    LIMIT 50
    

    输出:

    Limit  (cost=0.56..16197.56 rows=50 width=212) (actual time=31900.874..31900.874 rows=0 loops=1)
      Buffers: shared hit=13713180 read=1267819 dirtied=34 written=713
      I/O Timings: read=7644.206 write=7.294
      ->  Index Scan using bundles2_id_desc_idx on bundles  (cost=0.56..2401044.17 rows=7412 width=212) (actual time=31900.871..31900.871 rows=0 loops=1)
            Filter: (eligible AND ('{.X}'::text[] && exchange_index) AND (type = 'discussion'::text) AND ((((data -> 'display_tweet'::text) -> 'stocktwit'::text) -> 'id'::text) IS NOT NULL) AND (((data ->> 'sentiment_score'::text))::bigint > 0) AND (((((data -> 'display_tweet'::text) -> 'stocktwit'::text) ->> 'id'::text))::bigint IS NULL))
            Rows Removed by Filter: 16093269
            Buffers: shared hit=13713180 read=1267819 dirtied=34 written=713
            I/O Timings: read=7644.206 write=7.294
    Planning time: 0.366 ms
    Execution time: 31900.909 ms
    

    注: jsonb公司 jsonb公司 此查询中使用的条件。 exchange_index company_index 有杜松子酒索引。

    更新 在Laurenz更改查询之后:

    Limit  (cost=150634.15..150634.27 rows=50 width=211) (actual time=15925.828..15925.828 rows=0 loops=1)
      Buffers: shared hit=1137490 read=680349 written=2
      I/O Timings: read=2896.702 write=0.038
      ->  Sort  (cost=150634.15..150652.53 rows=7352 width=211) (actual time=15925.827..15925.827 rows=0 loops=1)
            Sort Key: bundles.id DESC
            Sort Method: quicksort  Memory: 25kB
            Buffers: shared hit=1137490 read=680349 written=2
            I/O Timings: read=2896.702 write=0.038
            ->  Bitmap Heap Scan on bundles  (cost=56666.15..150316.40 rows=7352 width=211) (actual time=15925.816..15925.816 rows=0 loops=1)
                  Recheck Cond: (('{.X}'::text[] && exchange_index) AND (type = 'discussion'::text))
                  Filter: (eligible AND ((((data -> 'display_tweet'::text) -> 'stocktwit'::text) -> 'id'::text) IS NOT NULL) AND (((data ->> 'sentiment_score'::text))::bigint > 0) AND (((((data -> 'display_tweet'::text) -> 'stocktwit'::text) ->> 'id'::text))::bigint IS NULL))
                  Rows Removed by Filter: 273230
                  Heap Blocks: exact=175975
                  Buffers: shared hit=1137490 read=680349 written=2
                  I/O Timings: read=2896.702 write=0.038
                  ->  BitmapAnd  (cost=56666.15..56666.15 rows=23817 width=0) (actual time=1895.890..1895.890 rows=0 loops=1)
                        Buffers: shared hit=37488 read=85559
                        I/O Timings: read=325.535
                        ->  Bitmap Index Scan on bundles2_exchange_index_ops_idx  (cost=0.00..6515.57 rows=863703 width=0) (actual time=218.690..218.690 rows=892669 loops=1)
                              Index Cond: ('{.X}'::text[] && exchange_index)
                              Buffers: shared hit=7 read=313
                              I/O Timings: read=1.458
                        ->  Bitmap Index Scan on bundles_eligible_idx  (cost=0.00..23561.74 rows=2476877 width=0) (actual time=436.719..436.719 rows=2569331 loops=1)
                              Index Cond: (eligible = true)
                              Buffers: shared hit=37473
                        ->  Bitmap Index Scan on bundles2_type_idx  (cost=0.00..26582.83 rows=2706276 width=0) (actual time=1052.267..1052.267 rows=2794517 loops=1)
                              Index Cond: (type = 'discussion'::text)
                              Buffers: shared hit=8 read=85246
                              I/O Timings: read=324.077
    Planning time: 0.433 ms
    Execution time: 15928.959 ms
    
    1 回复  |  直到 6 年前
        1
  •  3
  •   Laurenz Albe    6 年前

    所有你喜欢的索引都没有被使用,所以问题不在于它们是否快速。

    这里有几件事在起作用:

    1. 看到 dirtied 以及 written 在索引扫描期间,我怀疑表中有很多“死元组”。当索引扫描访问它们并注意到它们已死亡时,它会“杀死”这些索引项,以便后续的索引扫描不必重复该工作。

      你可以通过跑步来减少这个问题 VACUUM 放在桌子上,或确保自动吸尘器经常处理桌子。

    2. 然而,你的主要问题是 LIMIT

      不幸的是,它必须扫描16093319行,直到找到50个点击。“高处”的一排 id “表尾不符合条件。PostgreSQL不知道这种相关性。

      解决方案是阻止PostgreSQL走这条路。最简单的方法是删除 但鉴于它的名字,这可能是不可行的。

      另一种方法是防止PostgreSQL“看到” 限制 条款计划扫描时:

      SELECT id, data, type
      FROM (SELECT id, data, type
            FROM collection.bundles
            WHERE /* all your complicated conditions */
            OFFSET 0) subquery
      ORDER BY id DESC
      LIMIT 50;
      

    备注: 你没有显示你的索引定义,但是听起来你有很多,可能太多了。索引是昂贵的,所以请确保只定义那些给您带来明显好处的索引。