Selecting the MAX across multiple tables without counting twice

  •  4
  • Jeff B.  · 6 years ago

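    I am working on a query that lets me order recipes by score.

    Table structure

    The structure is: a flyer contains one or more flyer_items, each of which can have one or more ingredient_to_flyer_item rows (this table links ingredients to flyer items). Another table, ingredient_to_recipe, links the same ingredients to one or more recipes. A link to a .sql file is included at the end.

    Example query

    I want to get the recipe_id and the SUM of the MAX price_weight of each ingredient in the recipe (linked through ingredient_to_recipe), but if a recipe contains several ingredients that belong to the same flyer item, that flyer item should be counted only once.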

    SELECT itr.recipe_id,
           SUM(itr.weight),
           SUM(max_price_weight),
           SUM(itr.weight + max_price_weight) AS score
    FROM
      ( SELECT MAX(itf.max_price_weight) AS max_price_weight,
               itf.flyer_item_id,
               itf.ingredient_id
       FROM
         (SELECT ifi.ingredient_id,
                 MAX(i.price_weight) AS max_price_weight,
                 ifi.flyer_item_id
          FROM flyer_items i
          JOIN ingredient_to_flyer_item ifi ON i.id = ifi.flyer_item_id
          WHERE i.flyer_id IN (1,
                               2)
          GROUP BY ifi.ingredient_id ) itf
       GROUP BY itf.flyer_item_id) itf2
    JOIN `ingredient_to_recipe` AS itr ON itf2.`ingredient_id` = itr.`ingredient_id`
    WHERE recipe_id = 5730
    GROUP BY itr.`recipe_id`
    ORDER BY score DESC
    LIMIT 0,10
    

    The query almost works: most results are correct, but for some rows some ingredients are ignored and are not counted in the score as they should be.

    Test cases

    | recipe_id | 'score' with current query | what 'score' should be | explanation                                                                 |
    |-----------|----------------------------|------------------------|-----------------------------------------------------------------------------|
    | 8376      | 51                         | 51                     | Good result                                                                 |
    | 3152      | 1                          | 18                     | Only 1 ingredient having a score of one is counted, should be 4 ingredients |
    | 4771      | 41                         | 45                     | One ingredient worth score 4 is ignored                                     |
    | 10230     | 40                         | 40                     | Good result                                                                 |
    | 8958      | 39                         | 39                     | Good result                                                                 |
    | 4656      | 28                         | 34                     | One ingredient worth 6 is ignored                                           |
    | 11338     | 1                          | 10                     | 2 ingredients, worth 4 and 5 are ignored                                    |
    

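    I am having a hard time finding a simple way to explain this. Let me know if anything else would help.

    Here is a link to a demo database for running the query, the examples, and the test cases: https://nofile.io/f/F4YSEu8DWmT/meta.zip

    Thank you very much.

    Update (as requested by Rick James):

    This is the furthest I could get. The results are always correct, in the subquery too, but I have completely dropped the grouping by flyer_item_id. So with this query I get good scores, but if several ingredients in a recipe belong to the same flyer item, they are counted multiple times (for example, recipe_id = 10557 gets a score of 59 instead of the correct 56, because two ingredients worth 3 belong to the same flyer item). The only thing I still need is to count a single MAX(price_weight) per flyer item per recipe (which is why I originally tried to group by flyer_item_id rather than by ingredient_id).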

    SELECT itr.recipe_id,
           SUM(itr.weight) as total_ingredient_weight,
           SUM(itf.price_weight) as total_price_weight,
           SUM(itr.weight+itf.price_weight) as score
    FROM
      (SELECT fi1.id, MAX(fi1.price_weight) as price_weight, ingredient_to_flyer_item.ingredient_id as ingredient_id, recipe_id
    FROM flyer_items fi1
    INNER JOIN (
        SELECT flyer_items.id as id, MAX(price_weight) as price_weight, ingredient_to_flyer_item.ingredient_id as ingredient_id
        FROM flyer_items
        JOIN ingredient_to_flyer_item ON flyer_items.id = ingredient_to_flyer_item.flyer_item_id
        GROUP BY id
    ) fi2 ON fi1.id = fi2.id AND fi1.price_weight = fi2.price_weight
    JOIN ingredient_to_flyer_item ON fi1.id = ingredient_to_flyer_item.flyer_item_id
    JOIN ingredient_to_recipe ON ingredient_to_flyer_item.ingredient_id = ingredient_to_recipe.ingredient_id
    GROUP BY ingredient_to_flyer_item.ingredient_id) AS itf
    INNER JOIN `ingredient_to_recipe` AS `itr` ON `itf`.`ingredient_id` = `itr`.`ingredient_id`
    GROUP BY `itr`.`recipe_id`
    ORDER BY `score` DESC
    LIMIT 10
    

    Here is the EXPLAIN, but I am not sure it is useful, since the last working piece is still missing:

    | id | select_type | table                    | partitions | type   | possible_keys                 | key           | key_len | ref                                                   | rows   | filtered | Extra                           |   |
    |----|-------------|--------------------------|------------|--------|-------------------------------|---------------|---------|-------------------------------------------------------|--------|----------|---------------------------------|---|
    | 1  | PRIMARY     | itr                      | NULL       | ALL    | recipe_id,ingredient_id       | NULL          | NULL    | NULL                                                  | 151800 | 100.00   | Using temporary; Using filesort |   |
    | 1  | PRIMARY     | <derived2>               | NULL       | ref    | <auto_key0>                   | <auto_key0>   | 4       | metadata3.itr.ingredient_id                           | 10     | 100.00   | NULL                            |   |
    | 2  | DERIVED     | ingredient_to_flyer_item | NULL       | ALL    | NULL                          | NULL          | NULL    | NULL                                                  | 249    | 100.00   | Using temporary; Using filesort |   |
    | 2  | DERIVED     | fi1                      | NULL       | eq_ref | id_2,id,price_weight          | id_2          | 4       | metadata3.ingredient_to_flyer_item.flyer_item_id      | 1      | 100.00   | NULL                            |   |
    | 2  | DERIVED     | <derived3>               | NULL       | ref    | <auto_key0>                   | <auto_key0>   | 9       | metadata3.ingredient_to_flyer_item.flyer_item_id,m... | 10     | 100.00   | NULL                            |   |
    | 2  | DERIVED     | ingredient_to_recipe     | NULL       | ref    | ingredient_id                 | ingredient_id | 4       | metadata3.ingredient_to_flyer_item.ingredient_id      | 40     | 100.00   | NULL                            |   |
    | 3  | DERIVED     | ingredient_to_flyer_item | NULL       | ALL    | NULL                          | NULL          | NULL    | NULL                                                  | 249    | 100.00   | Using temporary; Using filesort |   |
    | 3  | DERIVED     | flyer_items              | NULL       | eq_ref | id_2,id,flyer_id,price_weight | id_2          | 4       | metadata3.ingredient_to_flyer_item.flyer_item_id      | 1      | 100.00   | NULL                            |   |
    

    Update 2

    I managed to find a query that works, but now I need to speed it up: it takes more than 500 ms to run.

    SELECT sum(ff.price_weight) as price_weight, sum(ff.weight) as weight, sum(ff.price_weight+ff.weight) as score, ff.recipe_id FROM
    (
    SELECT DISTINCT
           itf.flyer_item_id as flyer_item_id,
           itf.recipe_id,
           itf.weight,
           aprice_weight AS price_weight
    FROM
      (SELECT itfin.flyer_item_id AS flyer_item_id,
              itfin.price_weight AS aprice_weight,
              itfin.ingredient_id,
              itr.recipe_id,
              itr.weight
       FROM
         (SELECT ifi2.flyer_item_id, ifi2.ingredient_id as ingredient_id, MAX(ifi2.price_weight) as price_weight
            FROM
              ingredient_to_flyer_item ifi1
            INNER JOIN (
                    SELECT id, MAX(price_weight) as price_weight, ingredient_to_flyer_item.ingredient_id as ingredient_id, ingredient_to_flyer_item.flyer_item_id
                    FROM ingredient_to_flyer_item
                    GROUP BY ingredient_id
                ) ifi2 ON ifi1.price_weight = ifi2.price_weight AND ifi1.ingredient_id = ifi2.ingredient_id
            WHERE flyer_id IN (1,2)
            GROUP BY ifi1.ingredient_id) AS itfin
          INNER JOIN `ingredient_to_recipe` AS `itr` ON `itfin`.`ingredient_id` = `itr`.`ingredient_id`
    
         ) AS itf
    ) ff
    GROUP BY recipe_id
    ORDER BY `score` DESC
    LIMIT 20
    

    Here is the EXPLAIN:

    | id | select_type | table                    | partitions | type  | possible_keys                                | key           | key_len | ref                 | rows | filtered | Extra                           |   |
    |----|-------------|--------------------------|------------|-------|----------------------------------------------|---------------|---------|---------------------|------|----------|---------------------------------|---|
    | 1  | PRIMARY     | <derived2>               | NULL       | ALL   | NULL                                         | NULL          | NULL    | NULL                | 1318 | 100.00   | Using temporary; Using filesort |   |
    | 2  | DERIVED     | <derived4>               | NULL       | ALL   | NULL                                         | NULL          | NULL    | NULL                | 37   | 100.00   | Using temporary                 |   |
    | 2  | DERIVED     | itr                      | NULL       | ref   | ingredient_id                                | ingredient_id | 4       | itfin.ingredient_id | 35   | 100.00   | NULL                            |   |
    | 4  | DERIVED     | <derived5>               | NULL       | ALL   | NULL                                         | NULL          | NULL    | NULL                | 249  | 100.00   | Using temporary; Using filesort |   |
    | 4  | DERIVED     | ifi1                     | NULL       | ref   | ingredient_id,itx_full,price_weight,flyer_id | ingredient_id | 4       | ifi2.ingredient_id  | 1    | 12.50    | Using where                     |   |
    | 5  | DERIVED     | ingredient_to_flyer_item | NULL       | index | ingredient_id,itx_full                       | ingredient_id | 4       | NULL                | 249  | 100.00   | NULL                            |   |
    
    3 Answers
        1
  •  1
  •   Rick James diyism    6 years ago

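    This sounds like "explode-implode". It happens when a query has both a JOIN and a GROUP BY.

    1. The JOIN gathers the appropriate combinations of rows from the joined tables; then
    2. The GROUP BY computes COUNTs, SUMs, etc. over those combinations, giving inflated values for the aggregates.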
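
    The effect can be reproduced on a toy schema. The sketch below is only an illustration: the tables t1, t2, t3 and their rows are made up, and it uses Python's built-in sqlite3 module rather than MySQL so that it runs anywhere. One parent row joined to two child tables yields 2 x 3 = 6 combined rows, so SUM and COUNT come out inflated while MAX does not.

```python
import sqlite3

# Hypothetical toy schema (not from the question): one parent, two child tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY);
    CREATE TABLE t2 (t1_id INTEGER, x INTEGER);
    CREATE TABLE t3 (t1_id INTEGER, y INTEGER);
    INSERT INTO t1 VALUES (1);
    INSERT INTO t2 VALUES (1, 10), (1, 20);        -- true SUM(x) is 30
    INSERT INTO t3 VALUES (1, 1), (1, 2), (1, 3);  -- true SUM(y) is 6
""")

# JOIN first ("explode"), then aggregate ("implode"):
# every t2 row is repeated once per matching t3 row, and vice versa.
inflated = conn.execute("""
    SELECT t1.id, SUM(t2.x), SUM(t3.y), MAX(t2.x), COUNT(*)
    FROM t1
    JOIN t2 ON t2.t1_id = t1.id
    JOIN t3 ON t3.t1_id = t1.id
    GROUP BY t1.id
""").fetchone()

print(inflated)  # (1, 90, 12, 20, 6): sums are 3x and 2x too large, MAX is fine
```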

    There are two common fixes. Both involve doing the aggregation separately from the JOIN.

    Case 1:

    SELECT  ...
            ( SELECT SUM(x) FROM t2 WHERE id = ... ) AS sum_x,
            ...
        FROM t1 ...
    

    This gets clumsy if you need multiple aggregates from t2, since each subquery yields only one aggregate.

    Case 2:

    SELECT ...
        FROM ( SELECT grp,
                      SUM(x) AS sum_x,
                      COUNT(*) AS ct
               FROM t2
               GROUP BY grp ) AS s
        JOIN t1 ON t1.grp = s.grp
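
    As a rough illustration of both cases, here is the same idea on a made-up schema (t1, t2, t3 and their rows are hypothetical, and Python's sqlite3 stands in for MySQL). Each aggregate is computed on its own table, either in a correlated subquery or in a derived table, so the sums are not multiplied by rows from the other table.

```python
import sqlite3

# Hypothetical toy data: the true sums are SUM(x) = 30 and SUM(y) = 6.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY);
    CREATE TABLE t2 (t1_id INTEGER, x INTEGER);
    CREATE TABLE t3 (t1_id INTEGER, y INTEGER);
    INSERT INTO t1 VALUES (1);
    INSERT INTO t2 VALUES (1, 10), (1, 20);
    INSERT INTO t3 VALUES (1, 1), (1, 2), (1, 3);
""")

# Case 1: one correlated subquery per aggregate, no JOIN at all.
case1 = conn.execute("""
    SELECT t1.id,
           (SELECT SUM(x) FROM t2 WHERE t2.t1_id = t1.id) AS sum_x,
           (SELECT SUM(y) FROM t3 WHERE t3.t1_id = t1.id) AS sum_y
    FROM t1
""").fetchone()

# Case 2: aggregate inside derived tables first, then JOIN the one-row-per-group results.
case2 = conn.execute("""
    SELECT t1.id, s2.sum_x, s3.sum_y
    FROM t1
    JOIN (SELECT t1_id, SUM(x) AS sum_x FROM t2 GROUP BY t1_id) AS s2
         ON s2.t1_id = t1.id
    JOIN (SELECT t1_id, SUM(y) AS sum_y FROM t3 GROUP BY t1_id) AS s3
         ON s3.t1_id = t1.id
""").fetchone()

print(case1, case2)  # (1, 30, 6) (1, 30, 6)
```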
    

    You have 2 JOINs and 3 GROUP BYs, so I recommend that you debug (and rewrite) the query from the inside out.

            SELECT  ifi.ingredient_id,
                    MAX(price_weight) as max_price_weight,
                    flyer_item_id
                from  flyer_items i
                join  ingredient_to_flyer_item ifi  ON i.id = ifi.flyer_item_id
                where  flyer_id in (1, 2)
                group by  ifi.ingredient_id 
    

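    But I can't help you, because you have not qualified price_weight with the table (or alias) it belongs to. (The same goes for some other columns.)

    (Actually, MAX and MIN won't get inflated values; AVG will get slightly wrong values; COUNT and SUM get "wrong" values.)

    Hence, I will leave the rest as an "exercise" for the reader.

    Indexes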

    itr:  (ingredient_id, recipe_id)  -- for the JOIN and WHERE and GROUP BY
    itr:  (recipe_id, ingredient_id, weight)  -- for 1st Update
    (There is no optimization available for the ORDER BY and LIMIT)
    flyer_items:  (flyer_id, price_weight) -- unless flyer_id is the PRIMARY KEY
    ifi:  (flyer_item_id, ingredient_id)
    ifi:  (ingredient_id, flyer_item_id)  -- for 1st Update
    

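    Please provide SHOW CREATE TABLE for the relevant tables.

    Please provide EXPLAIN SELECT ... .

    If ingredient_to_flyer_item is a many:many mapping table, please follow the tips here . Ditto for ingredient_to_recipe ?

    GROUP BY itf.flyer_item_id is probably invalid, since the select list includes the non-aggregated itf.ingredient_id , which is not in the GROUP BY. See ONLY_FULL_GROUP_BY.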

    Reformulate

    Once you have finished assessing the INDEXes, try the following. Caution: I do not know whether it will work correctly. Replace

    JOIN  `ingredient_to_recipe` AS itr  ON itf2.`ingredient_id` = itr.`ingredient_id`
    

    with

    JOIN ( SELECT recipe_id,
                  ingredient_id,
                  SUM(weight) AS sum_weight
               FROM ingredient_to_recipe ) AS itr
    

    And change the initial SELECT to replace the SUMs with those computed sums. (I suspect I did not handle ingredient_id correctly.)

    What version of MySQL/MariaDB are you running?

        2
  •  1
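  •   Nick SamSmith1986    6 years ago

    I have been meaning to look at this, but unfortunately did not have time until now. I think this query will give you the results you want.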

    SELECT recipe_id, SUM(weight) AS weight, SUM(max_price_weight) AS price_weight, SUM(weight + max_price_weight) AS score 
    FROM (SELECT recipe_id, ingredient_id, MAX(weight) AS weight, MAX(price_weight) AS max_price_weight
          FROM (SELECT itr.recipe_id, MIN(itr.ingredient_id) AS ingredient_id, MAX(itr.weight) AS weight, fi.id, MAX(fi.price_weight) AS price_weight
                FROM ingredient_to_recipe itr 
                JOIN ingredient_to_flyer_item itfi ON itfi.ingredient_id = itr.ingredient_id 
                JOIN flyer_items fi ON fi.id = itfi.flyer_item_id 
                GROUP BY itr.recipe_id, fi.id) ri
          GROUP BY recipe_id, ingredient_id) r
    GROUP BY recipe_id
    ORDER BY score DESC
    LIMIT 10
    

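    It first groups by flyer_item_id and then by MIN(ingredient_id) , to account for ingredients within a recipe that share the same flyer_item_id . It then sums the results to get the score you want. If I run the query with a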

    HAVING recipe_id IN (8376, 3152, 4771, 10230, 8958, 4656, 11338)
    

    clause, it gives the following results, which match the "what 'score' should be" column above:

    recipe_id   weight  price_weight    score   
    8376        10      41              51
    4771        5       40              45
    10230       10      30              40
    8958        15      24              39
    4656        15      19              34
    3152        0       18              18
    11338       0       10              10
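
    To see the de-duplication by flyer item on a small scale, the following sketch runs the query above against a made-up data set in which two ingredients of one recipe share a flyer item. The rows, and the use of Python's sqlite3 in place of MySQL, are assumptions for illustration only; the table and column names follow the queries in this thread. A per-ingredient grouping is included for comparison. Note that, as written, the query also keeps only the MAX weight among ingredients that share a flyer item.

```python
import sqlite3

# Hypothetical data: ingredients 100 and 101 both map to flyer item 10.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flyer_items (id INTEGER PRIMARY KEY, flyer_id INTEGER, price_weight INTEGER);
    CREATE TABLE ingredient_to_flyer_item (flyer_item_id INTEGER, ingredient_id INTEGER);
    CREATE TABLE ingredient_to_recipe (recipe_id INTEGER, ingredient_id INTEGER, weight INTEGER);
    INSERT INTO flyer_items VALUES (10, 1, 3), (11, 1, 5);
    INSERT INTO ingredient_to_flyer_item VALUES (10, 100), (10, 101), (11, 102);
    INSERT INTO ingredient_to_recipe VALUES (1, 100, 2), (1, 101, 2), (1, 102, 1);
""")

# Grouping per ingredient: flyer item 10 is counted once for each of its ingredients.
per_ingredient = conn.execute("""
    SELECT itr.recipe_id, SUM(itr.weight), SUM(itf.max_price_weight),
           SUM(itr.weight + itf.max_price_weight) AS score
    FROM (SELECT ifi.ingredient_id, MAX(fi.price_weight) AS max_price_weight
          FROM flyer_items fi
          JOIN ingredient_to_flyer_item ifi ON fi.id = ifi.flyer_item_id
          GROUP BY ifi.ingredient_id) itf
    JOIN ingredient_to_recipe itr ON itr.ingredient_id = itf.ingredient_id
    GROUP BY itr.recipe_id
""").fetchall()

# Grouping per (recipe, flyer item) first, as in the query above.
per_flyer_item = conn.execute("""
    SELECT recipe_id, SUM(weight) AS total_weight, SUM(max_price_weight) AS total_price_weight,
           SUM(weight + max_price_weight) AS score
    FROM (SELECT recipe_id, ingredient_id, MAX(weight) AS weight,
                 MAX(price_weight) AS max_price_weight
          FROM (SELECT itr.recipe_id, MIN(itr.ingredient_id) AS ingredient_id,
                       MAX(itr.weight) AS weight, fi.id, MAX(fi.price_weight) AS price_weight
                FROM ingredient_to_recipe itr
                JOIN ingredient_to_flyer_item itfi ON itfi.ingredient_id = itr.ingredient_id
                JOIN flyer_items fi ON fi.id = itfi.flyer_item_id
                GROUP BY itr.recipe_id, fi.id) ri
          GROUP BY recipe_id, ingredient_id) r
    GROUP BY recipe_id
    ORDER BY score DESC
""").fetchall()

print(per_ingredient)  # [(1, 5, 11, 16)]: price_weight 3 of flyer item 10 counted twice
print(per_flyer_item)  # [(1, 3, 8, 11)]: flyer item 10 counted once
```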
    

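    I am not sure how fast this query will run on your system; on my laptop it is comparable to your query (I expected it to be slower). I am fairly sure some optimizations are possible, but again, I have not had time to dig into it.

    I hope this helps you get closer to a workable solution.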

        3
  •  0
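  •   The Impaler    6 years ago

    I am not sure I fully understand the problem. It seems to me that you are grouping by the wrong column, flyer_items.id . You should group by the column ingredient_id instead. If you do that, it makes more sense (to me). Here is how I see it: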

    select
        itr.recipe_id,
        sum(itr.weight),
        sum(max_price_weight),
        sum(itr.weight + max_price_weight) as score
      from (
        select
            ifi.ingredient_id, 
            max(price_weight) as max_price_weight
          from flyer_items i
          join ingredient_to_flyer_item ifi on i.id = ifi.flyer_item_id
          where flyer_id in (1, 2)
          group by ifi.ingredient_id
        ) itf
      join `ingredient_to_recipe` as itr on itf.`ingredient_id` = itr.`ingredient_id`
      group by itr.`recipe_id`
      order by score desc
      limit 0,10;
    

    I hope that helps.