代码之家  ›  专栏  ›  技术社区  ›  Community wiki

优化从同一表中提取的多列的查询

  •  3
  • Community wiki  · 技术社区  · 1 年前

    这是对 another question here on SO

    我有这两个数据库表(省略了更多的表):

    acquisitions (acq)
        id {PK}
        id_cu {FK}
        datetime
        { Unique Constraint: id_cu - datetime }
    
    data
        id {PK}
        id_acq {FK acquisitions}
        id_meas
        id_elab
        value
    

    一切可能 id datetime 全部的 索引。

    当然我会的 更改数据库结构我需要以这种方式提取数据:

    • 按日期时间分组的行
    • 每列对应 data.value 对于所选 acq.id_cu - data.id_meas - data.id_elab 结合体(见帖子底部的注释)
    • 如果某列的数据丢失,但在日期时间内其他列的数据存在,则允许空单元格

    我当前的查询是这样构建的(请参阅 SO question ):

    SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (
    
    SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2, NULL AS v3 
    FROM acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1
    
    UNION
    
    SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2, NULL AS v3 
    FROM acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
    
    UNION
    
    SELECT acq.datetime AS datetime, NULL AS v1, NULL AS v2, data.value AS v3 
    FROM acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8
    
    ) AS T
    WHERE datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
    GROUP BY datetime
    

    这里只检索3列,但正如我所说,列通常超过50列。

    它工作完美,但我想知道它是否可以在速度上进行优化。

    这是MySQL EXPLAIN EXTENDED 对于上述查询:

    +----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
    | id | select_type  | table        | type | possible_keys                                  | key                   | key_len | ref                    | rows  | filtered | Extra                                        |
    +----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
    |  1 | PRIMARY      | <derived2>   | ALL  | NULL                                           | NULL                  | NULL    | NULL                   | 82466 |   100.00 | Using where; Using temporary; Using filesort |
    |  2 | DERIVED      | acquisitions | ref  | PRIMARY,id_cu,ix_acquisitions_id_cu            | id_cu                 | 4       |                        | 18011 |   100.00 |                                              |
    |  2 | DERIVED      | data         | ref  | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq        | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                                  |
    |  3 | UNION        | acquisitions | ref  | PRIMARY,id_cu,ix_acquisitions_id_cu            | ix_acquisitions_id_cu | 4       |                        | 20864 |   100.00 |                                              |
    |  3 | UNION        | data         | ref  | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq        | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                                  |
    |  4 | UNION        | acquisitions | ref  | PRIMARY,id_cu,ix_acquisitions_id_cu            | id_cu                 | 4       |                        | 31848 |   100.00 |                                              |
    |  4 | UNION        | data         | ref  | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq        | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                                  |
    | NULL | UNION RESULT | <union2,3,4> | ALL  | NULL                                           | NULL                  | NULL    | NULL                   |  NULL |     NULL |                                              |
    +----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
    8 rows in set, 1 warning (8.24 sec)
    

    当前与( 编辑 :今天检查)39万次收购和920万个数据价值(而且还在增长) 10分钟 以提取59列的表。我知道以前的软件提取数据需要1个小时。

    感谢您耐心阅读,直到这里:)


    使现代化

    丹尼斯回答后,我尝试了他的改变。和2.,这是新查询的结果:

    SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (
    
    SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2, NULL AS v3 
    FROM acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1
    AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
    
    UNION ALL
    
    SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2, NULL AS v3 
    FROM acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
    AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
    
    UNION ALL
    
    SELECT acq.datetime AS datetime, NULL AS v1, NULL AS v2, data.value AS v3 
    FROM acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8
    AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
    
    ) AS T GROUP BY datetime
    

    这里是新的 解释扩展 :

    +----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
    | id | select_type  | table        | type  | possible_keys                                                | key            | key_len | ref                    | rows  | filtered | Extra                           |
    +----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
    |  1 | PRIMARY      | <derived2>   | ALL   | NULL                                                         | NULL           | NULL    | NULL                   | 51997 |   100.00 | Using temporary; Using filesort |
    |  2 | DERIVED      | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu          | 12      | NULL                   | 14827 |   100.00 | Using where                     |
    |  2 | DERIVED      | data         | ref   | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab               | ix_data_id_acq | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                     |
    |  3 | UNION        | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu          | 12      | NULL                   | 18663 |   100.00 | Using where                     |
    |  3 | UNION        | data         | ref   | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab               | ix_data_id_acq | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                     |
    |  4 | UNION        | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu          | 12      | NULL                   | 13260 |   100.00 | Using where                     |
    |  4 | UNION        | data         | ref   | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab               | ix_data_id_acq | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                     |
    | NULL | UNION RESULT | <union2,3,4> | ALL   | NULL                                                         | NULL           | NULL    | NULL                   |  NULL |     NULL |                                 |
    +----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
    8 rows in set, 1 warning (3.01 sec)
    

    毫无疑问地在表演上取得了好成绩


    更新(2)

    这是一个附加点 3.

    EXPLAIN EXTENDED SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (
    
    SELECT acquisitions.datetime AS datetime, MAX(data.value) AS v1, NULL AS v2, NULL AS v3 
    FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
    WHERE acquisitions.id_cu = 1 AND data.id_meas = 1 AND data.id_elab = 2
    AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
    GROUP BY datetime
    
    UNION ALL
    
    SELECT acquisitions.datetime AS datetime, NULL AS v1, MAX(data.value) AS v2, NULL AS v3 
    FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
    WHERE acquisitions.id_cu = 4 AND data.id_meas = 1 AND data.id_elab = 2
    AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
    GROUP BY datetime
    
    UNION ALL
    
    SELECT acquisitions.datetime AS datetime, NULL AS v1, NULL AS v2, MAX(data.value) AS v3 
    FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
    WHERE acquisitions.id_cu = 8 AND data.id_meas = 1 AND data.id_elab = 2
    AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
    GROUP BY datetime
    
    ) AS T GROUP BY datetime;
    

    这是 解释扩展

    +----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
    | id | select_type  | table        | type  | possible_keys                                                | key            | key_len | ref                    | rows  | filtered | Extra                           |
    +----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
    |  1 | PRIMARY      | <derived2>   | ALL   | NULL                                                         | NULL           | NULL    | NULL                   | 51997 |   100.00 | Using temporary; Using filesort |
    |  2 | DERIVED      | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu          | 12      | NULL                   | 14827 |   100.00 | Using where                     |
    |  2 | DERIVED      | data         | ref   | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab               | ix_data_id_acq | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                     |
    |  3 | UNION        | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu          | 12      | NULL                   | 18663 |   100.00 | Using where                     |
    |  3 | UNION        | data         | ref   | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab               | ix_data_id_acq | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                     |
    |  4 | UNION        | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu          | 12      | NULL                   | 13260 |   100.00 | Using where                     |
    |  4 | UNION        | data         | ref   | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab               | ix_data_id_acq | 4       | sensor.acquisitions.id |     9 |   100.00 | Using where                     |
    | NULL | UNION RESULT | <union2,3,4> | ALL   | NULL                                                         | NULL           | NULL    | NULL                   |  NULL |     NULL |                                 |
    +----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
    8 rows in set, 1 warning (3.06 sec)
    

    只是稍微慢一点,这应该受益于大量的库仑吗?我会试试。。。


    更新(3)

    我试过有没有 MAX(data.value)... GROUP BY datetime 而且,在60列的查询中,我得到了更好的结果 具有 。每次尝试的结果各不相同,这就是其中之一。

    • 原始查询9m12.144s
    • 和Denis 1. 2. 4m6.597秒
    • 和Denis 1. , 2. 3. 4m0.210秒

    所需时间减少了约57%。


    更新(4)

    我尝试了Andary解决方案,但它比Denis优化慢得多。

    在3 组合 /列:

    • 未优化:1m3s
    • Denis优化:1.7s
    • 安德里 CASE :9.3秒

    我还在12 组合 /列:

    • 未优化:未测试
    • Denis优化:3.6s
    • 安德里 案例 :13.7秒

    此外,Andary的解决方案还引入了收购日期,其中没有任何选定组合的数据,但其他组合的数据已经存在。

    Imngine控件第一单元每30分钟在:00和:30获取一次数据,而控件第二单元在:15和:45:我将使用空的空字符将行数增加一倍。


    注:

    这一切都与传感器系统有关:有几个 控制单元 (每个一个 id_cu )有很多 传感器 每个

    单个传感器由 id_cu / id_meas 配对并发送不同 详细说明 对于每个度量,请说MIN( id_elab=1 ),最大( id_elab=2 ),平均( id_elab=3 ),即时( id_elab=... )等等,每个一个 id_elab

    用户可以自由选择他想要的任何细节,例如:

    • 结果列的控制单元#1的传感器#3的平均值(3),因此 id_cu=1 / id_meas=3 / id_elab=3
    • 结果列的控制单元#1的传感器#5的平均值(3),因此 id_cu=1 / id_meas=5 / id_elab=3
    • 另一列的控制单元#4的传感器#2的MIN值(1),因此 id_cu=4 / id_meas=2 / id_elab=1
    • (将任何有效 id_cu, id_meas, id_elab 组合)
    • 。。。

    等等,多达数十种选择。。。

    以下是部分DDL(不包括不相关的表):

    CREATE TABLE acquisitions (
        id INTEGER NOT NULL AUTO_INCREMENT, 
        id_cu INTEGER NOT NULL, 
        datetime DATETIME NOT NULL, 
        PRIMARY KEY (id), 
        UNIQUE (id_cu, datetime), 
        FOREIGN KEY(id_cu) REFERENCES ctrl_units (id) ON DELETE CASCADE
    )
    
    CREATE TABLE data (
        id INTEGER NOT NULL AUTO_INCREMENT, 
        id_acq INTEGER NOT NULL, 
        id_meas INTEGER NOT NULL, 
        id_elab INTEGER NOT NULL, 
        value FLOAT, 
        PRIMARY KEY (id), 
        FOREIGN KEY(id_acq) REFERENCES acquisitions (id) ON DELETE CASCADE
    )
    
    CREATE TABLE ctrl_units (
        id INTEGER NOT NULL, 
        name VARCHAR(40) NOT NULL, 
        PRIMARY KEY (id)
    )
    
    CREATE TABLE sensors (
        id_cu INTEGER NOT NULL, 
        id_meas INTEGER NOT NULL, 
        id_elab INTEGER NOT NULL, 
        name VARCHAR(40) NOT NULL, 
        `desc` VARCHAR(80), 
        PRIMARY KEY (id_cu, id_meas), 
        FOREIGN KEY(id_cu) REFERENCES ctrl_units (id) ON DELETE CASCADE
    )
    
    3 回复  |  直到 7 年前
        1
  •  3
  •   Denis de Bernardy    13 年前

    主要有三个问题:

    1. 使用union all,而不是union。您正在对最小/最大值进行分组和获取,因此引入排序步骤来删除重复的行是没有意义的。

    2. where条款可以放在每个工会子陈述中:

      select ...
      from (
      select ... from ...  where ...
      union all
      select ... from ...  where ...
      union all
      ...
      )
      group by ...
      

      按照您编写它的方式,它从获取所有行开始,将它们全部追加,最后过滤您需要的行。在并集子语句中注入where子句将使它只获取所需的行,最后将它们全部追加。

    3. 沿着相同的路线,预骨料:

      select ..., max(foo) as foo
      from (
      select ..., max(foo) as foo from ...  where ... group by ...
      union all
      select ..., max(foo) as foo from ...  where ... group by ...
      union all
      ...
      )
      group by ...
      

      优化器将更好地利用现有索引,最终只追加几行,而不是数百万行。

        2
  •  1
  •   Andriy M    13 年前
    SELECT
      acq.datetime,
      MAX(CASE WHEN acq.id_cu = 2 AND data.id_meas = 2 AND data.id_elab = 1 THEN data.value END) AS v1,
      MAX(CASE WHEN acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6 THEN data.value END) AS v2,
      MAX(CASE WHEN acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8 THEN data.value END) AS v3
    FROM acq
      INNER JOIN data acq.id = data.id_acq
    WHERE datetime >= 2011-03-01 00:00:00 AND datetime <= 2011-04-30 23:59:59
    GROUP BY acq.datetime
    

    这看起来可能与原始查询大致相同,但主要区别在于,从逻辑上讲,它只扫描表一次,而不是使用UNION扫描三次或多次。

        3
  •  0
  •   Unreason    13 年前

    基本上,我认为用SELECT和CASE处理条件会得到更好的结果。无论如何,你可能想要进行基准测试和比较。。。

    SELECT acq.datetime AS datetime, 
           MAX(
               CASE acq.id_cu
               WHEN 1 THEN data.value
               END 
           ) as v1,
           MAX(
               CASE acq.id_cu
               WHEN 4 THEN data.value
               END 
           ) as v2,
           MAX(
               CASE acq.id_cu
               WHEN 8 THEN data.value
               END 
           ) as v3
    FROM 
           acq INNER JOIN data ON acq.id = data.id_acq
    WHERE 
           data.id_meas = 1 AND data.id_elab = 2 AND
           datetime BETWEEN "2011-03-01 00:00:00" AND "2011-04-30 23:59:59"
    

    这应该是一个干净的范围扫描。 此外,使用复合索引还可以做更多的工作。

    最后,例如,使用GROUP BY有什么问题吗

    SELECT data.id_means, acq.datetime AS datetime, MAX(data.value)
    FROM 
           acq INNER JOIN data ON acq.id = data.id_acq
    WHERE 
           data.id_elab = 2 AND
           datetime BETWEEN "2011-03-01 00:00:00" AND "2011-04-30 23:59:59" AND
           data.id_means IN (1,4,8)
    GROUP BY
           data.id_means
    

    这是最简单的形式(也是最灵活的)-即使行还没有被转换为列(对于 data.id_meas )。但是,这将使您了解预期的性能以及哪些索引对查询最有用。

    编辑: 要获得*acq.id_cu-data.id_meas-data.id_elab组合*的最大数据值,您应该能够简单地使用

    SELECT 
           acq.id_cu, data.id_meas, data.id_elab, acq.datetime AS datetime, MAX(data.value)
    FROM 
           acq INNER JOIN data ON acq.id = data.id_acq
    WHERE 
           data.id_elab = 2 AND
           datetime BETWEEN "2011-03-01 00:00:00" AND "2011-04-30 23:59:59" AND
           data.id_means IN (1,4,8)
    GROUP BY
           acq.id_cu, data.id_meas, data.id_elab, acq.datetime
    

    将为以下各项的所有组合提供max(data.value) acq.id_cu, data.id_meas, data.id_elab, acq.datetime ( 之后 使用来自何处的值对其进行过滤-调整影响结果的位置)。 这不会显示没有行的组合的NULL,但如果这是正确的方向,则有一个解决方法。 GROUP BY也决定排序,因此更改GROUP BY中列的顺序。

    如果我的答案仍然没有抓住要点,那么一些示例数据/测试用例将非常有用。

    你的例子中令人困惑的部分是当你说

    每列对应的数据.value 对于选定的acq.id_cu-data.id_meas -data.id_elab组合。

    但是,当您在示例查询中选择数据时,您可以直接将它们选择为只有日期时间分组的列,因此,如果它真的是值的组合,则不可能确定哪一行对应于哪一组合(某个日期可能有多行)。如果它不是您想要筛选/分组的所有值的组合,而是确定max值的分组条件直接取决于日期时间。