代码之家  ›  专栏  ›  技术社区  ›  Jacco

MySQL中获取累积和的最佳查询

  •  8
  • Jacco  · 技术社区  · 14 年前

    在MySQL中获取累积和的“正确”查询是什么?

    我有一个保存文件信息的表,一列列表包含文件的大小(以字节为单位)(实际文件保存在磁盘上的某个地方)

    +------------+---------+--------+----------------+
    | fileInfoId | groupId | size   | cumulativeSize |
    +------------+---------+--------+----------------+
    |          1 |       1 | 522120 |         522120 |
    |          2 |       2 | 316042 |         316042 |
    |          4 |       2 | 711084 |        1027126 |
    |          5 |       2 | 697002 |        1724128 |
    |          6 |       2 | 663425 |        2387553 |
    |          7 |       2 | 739553 |        3127106 |
    |          8 |       2 | 700938 |        3828044 |
    |          9 |       2 | 695614 |        4523658 |
    |         10 |       2 | 744204 |        5267862 |
    |         11 |       2 | 609022 |        5876884 |
    |        ... |     ... |    ... |            ... |
    +------------+---------+--------+----------------+
    20000 rows in set (19.2161 sec.)
    

    SELECT
      a.fileInfoId
    , a.groupId
    , a.size
    , SUM(b.size) AS cumulativeSize
    FROM fileInfo AS a
    LEFT JOIN fileInfo AS b USING(groupId)
    WHERE a.fileInfoId >= b.fileInfoId
    GROUP BY a.fileInfoId
    ORDER BY a.groupId, a.fileInfoId
    

    然而,我的解决方案非常缓慢(大约19秒(无缓存)。

    Explain给出了以下执行细节

    +----+--------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
    | id | select_type  | table | type  | possible_keys     | key       | key_len | ref            | rows  | Extra       |
    +----+--------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
    |  1 | SIMPLE       |     a | index | PRIMARY,foreignId | PRIMARY   |       4 | NULL           | 14905 |             |
    |  1 | SIMPLE       |     b | ref   | PRIMARY,foreignId | foreignId |       4 | db.a.foreignId |    36 | Using where |
    +----+--------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
    



    我的问题是:



    更新
    我已经更新了这个问题,以提供表结构和一个用20000条记录和测试数据填充表的过程。

    CREATE TABLE `fileInfo` (
      `fileInfoId` int(10) unsigned NOT NULL AUTO_INCREMENT
    , `groupId` int(10) unsigned NOT NULL
    , `name` varchar(128) NOT NULL
    , `size` int(10) unsigned NOT NULL
    , PRIMARY KEY (`fileInfoId`)
    , KEY `groupId` (`groupId`)
    ) ENGINE=InnoDB;
    
    delimiter $$
    DROP PROCEDURE IF EXISTS autofill$$
    CREATE PROCEDURE autofill()
    BEGIN
        DECLARE i INT DEFAULT 0;
        DECLARE gid INT DEFAULT 0;
        DECLARE nam char(20);
        DECLARE siz INT DEFAULT 0;
        WHILE i < 20000 DO
            SET gid = FLOOR(RAND() * 250);
            SET nam = CONV(FLOOR(RAND() * 10000000000000), 20, 36);
            SET siz = FLOOR((RAND() * 1024 * 1024));
            INSERT INTO `fileInfo` (`groupId`, `name`, `size`) VALUES(gid, nam, siz);
            SET i = i + 1;
        END WHILE;
    END;$$
    delimiter ;
    
    CALL autofill();
    

    关于可能重复的问题
    The question 链接者 Forgotten Semicolon 不是同一个问题。我的问题多了一栏。由于这个额外的groupId列,这里接受的答案对我的问题不起作用(也许它能适应工作,但我不知道怎么做,因此我的问题)

    2 回复  |  直到 7 年前
        1
  •  20
  •   Jacco    14 年前

    你可以使用一个变量-它比任何连接都快得多:

    SELECT
        id,
        size,
        @total := @total + size AS cumulativeSize,
    FROM table, (SELECT @total:=0) AS t;
    

    以下是运行Debian 5.0的128MB RAM奔腾III的快速测试案例:

    创建表:

    DROP TABLE IF EXISTS `table1`;
    
    CREATE TABLE `table1` (
        `id` int(11) NOT NULL auto_increment,
        `size` int(11) NOT NULL,
        PRIMARY KEY  (`id`)
    ) ENGINE=InnoDB;
    

    DELIMITER //
    DROP PROCEDURE IF EXISTS autofill//
    CREATE PROCEDURE autofill()
    BEGIN
        DECLARE i INT DEFAULT 0;
        WHILE i < 20000 DO
            INSERT INTO table1 (size) VALUES (FLOOR((RAND() * 1000)));
            SET i = i + 1;
        END WHILE;
    END;
    //
    DELIMITER ;
    
    CALL autofill();
    

    检查行数:

    SELECT COUNT(*) FROM table1;
    
    +----------+
    | COUNT(*) |
    +----------+
    |    20000 |
    +----------+
    

    运行累计总计查询:

    SELECT
        id,
        size,
        @total := @total + size AS cumulativeSize
    FROM table1, (SELECT @total:=0) AS t;
    
    +-------+------+----------------+
    |    id | size | cumulativeSize |
    +-------+------+----------------+
    |     1 |  226 |            226 |
    |     2 |  869 |           1095 |
    |     3 |  668 |           1763 |
    |     4 |  733 |           2496 |
    ...
    | 19997 |  966 |       10004741 |
    | 19998 |  522 |       10005263 |
    | 19999 |  713 |       10005976 |
    | 20000 |    0 |       10005976 |
    +-------+------+----------------+
    20000 rows in set (0.07 sec)
    

    更新

    我不能要求所有的功劳,事实上,我几乎不能要求任何,因为它只是一个修改版本的 Emulate row number Common MySQL Queries .

    SELECT fileInfoId, groupId, name, size, cumulativeSize
    FROM (
        SELECT
            fileInfoId,
            groupId,
            name,
            size,
            @cs := IF(@prev_groupId = groupId, @cs+size, size) AS cumulativeSize,
            @prev_groupId := groupId AS prev_groupId
        FROM fileInfo, (SELECT @prev_groupId:=0, @cs:=0) AS vars
        ORDER BY groupId
    ) AS tmp;
    

    SELECT ... AS tmp 如果你不介意的话 prev_groupID 正在返回的列。我发现没有它它跑得稍微快一点。

    INSERT INTO `fileInfo` VALUES
    ( 1, 3, 'name0', '10'),
    ( 5, 3, 'name1', '10'),
    ( 7, 3, 'name2', '10'),
    ( 8, 1, 'name3', '10'),
    ( 9, 1, 'name4', '10'),
    (10, 2, 'name5', '10'),
    (12, 4, 'name6', '10'),
    (20, 4, 'name7', '10'),
    (21, 4, 'name8', '10'),
    (25, 5, 'name9', '10');
    
    SELECT fileInfoId, groupId, name, size, cumulativeSize
    FROM (
        SELECT
            fileInfoId,
            groupId,
            name,
            size,
            @cs := IF(@prev_groupId = groupId, @cs+size, size) AS cumulativeSize,
            @prev_groupId := groupId AS prev_groupId
        FROM fileInfo, (SELECT @prev_groupId := 0, @cs := 0) AS vars
        ORDER BY groupId
    ) AS tmp;
    
    +------------+---------+-------+------+----------------+
    | fileInfoId | groupId | name  | size | cumulativeSize |
    +------------+---------+-------+------+----------------+
    |          8 |       1 | name3 |   10 |             10 |
    |          9 |       1 | name4 |   10 |             20 |
    |         10 |       2 | name5 |   10 |             10 |
    |          1 |       3 | name0 |   10 |             10 |
    |          5 |       3 | name1 |   10 |             20 |
    |          7 |       3 | name2 |   10 |             30 |
    |         12 |       4 | name6 |   10 |             10 |
    |         20 |       4 | name7 |   10 |             20 |
    |         21 |       4 | name8 |   10 |             30 |
    |         25 |       5 | name9 |   10 |             10 |
    +------------+---------+-------+------+----------------+
    

    以下是20000行表中最后几行的示例:

    |      19481 |     248 | 8CSLJX22RCO | 1037469 |       51270389 |
    |      19486 |     248 | 1IYGJ1UVCQE |  937150 |       52207539 |
    |      19817 |     248 | 3FBU3EUSE1G |  616614 |       52824153 |
    |      19871 |     248 | 4N19QB7PYT  |  153031 |       52977184 |
    |        132 |     249 | 3NP9UGMTRTD |  828073 |         828073 |
    |        275 |     249 | 86RJM39K72K |  860323 |        1688396 |
    |        802 |     249 | 16Z9XADLBFI |  623030 |        2311426 |
    ...
    |      19661 |     249 | ADZXKQUI0O3 |  837213 |       39856277 |
    |      19870 |     249 | 9AVRTI3QK6I |  331342 |       40187619 |
    |      19972 |     249 | 1MTAEE3LLEM | 1027714 |       41215333 |
    +------------+---------+-------------+---------+----------------+
    20000 rows in set (0.31 sec)
    
        2
  •  1
  •   Marcus Adams    14 年前

    我认为MySQL只使用了表中的一个索引。在本例中,它选择了foreigned上的索引。

    添加包含primaryId和foreigned的覆盖复合索引。