我正在做一个查询,允许我按分数点食谱。
表结构
结构是一个传单包含一个或多个
flyer_items
,其中可以包含一个或多个
ingredients_to_flyer_item
(此表将成分链接到传单项)。另一张桌子
ingredient_to_recipe
链接相同的成分,但一个或多个食谱。最后还包括指向.sql文件的链接。
示例查询
我想得到配方ID和配方中每个成分的最大价格权重之和(按配方链接),但如果一个配方包含属于同一个传单项目的多个成分,则应计算一次。
SELECT itr.recipe_id,
SUM(itr.weight),
SUM(max_price_weight),
SUM(itr.weight + max_price_weight) AS score
FROM
( SELECT MAX(itf.max_price_weight) AS max_price_weight,
itf.flyer_item_id,
itf.ingredient_id
FROM
(SELECT ifi.ingredient_id,
MAX(i.price_weight) AS max_price_weight,
ifi.flyer_item_id
FROM flyer_items i
JOIN ingredient_to_flyer_item ifi ON i.id = ifi.flyer_item_id
WHERE i.flyer_id IN (1,
2)
GROUP BY ifi.ingredient_id ) itf
GROUP BY itf.flyer_item_id) itf2
JOIN `ingredient_to_recipe` AS itr ON itf2.`ingredient_id` = itr.`ingredient_id`
WHERE recipe_id = 5730
GROUP BY itr.`recipe_id`
ORDER BY score DESC
LIMIT 0,10
查询几乎可以正常工作,因为大多数结果都是好的,但是对于某些行,一些成分被忽略,并且没有按应有的方式从分数中计算出来。
测试用例
| recipe_id | 'score' with current query | what 'score' should be | explanation |
|-----------|----------------------------|------------------------|-----------------------------------------------------------------------------|
| 8376 | 51 | 51 | Good result |
| 3152 | 1 | 18 | Only 1 ingredient having a score of one is counted, should be 4 ingredients |
| 4771 | 41 | 45 | One ingredient worth score 4 is ignored |
| 10230 | 40 | 40 | Good result |
| 8958 | 39 | 39 | Good result |
| 4656 | 28 | 34 | One ingredient worth 6 is ignored |
| 11338 | 1 | 10 | 2 ingredients, worth 4 and 5 are ignored |
我很难找到一个简单的方法来解释它。如果还有什么可以帮忙的,请告诉我。
下面是一个到演示数据库的链接,用于运行查询、测试示例和测试用例:
https://nofile.io/f/F4YSEu8DWmT/meta.zip
非常感谢你。
更新(按rick james的要求):
这是我能做的最远的。结果总是很好的,在子查询中也是,但是,我已经完全去掉了“flyer_item_id”这个组。所以通过这个查询,我得到了好的分数,但是如果配方中的许多成分是同一个传单项目,它们将被多次计算(比如配方ID=10557的分数是59,而不是好的56,因为2个成分值3是同一个传单项目)。我唯一需要更多的是计算每个配方中每个传单商品的最大重量(价格/重量),(我最初尝试按“传单商品ID”分组,而不是按配料ID分组。
SELECT itr.recipe_id,
SUM(itr.weight) as total_ingredient_weight,
SUM(itf.price_weight) as total_price_weight,
SUM(itr.weight+itf.price_weight) as score
FROM
(SELECT fi1.id, MAX(fi1.price_weight) as price_weight, ingredient_to_flyer_item.ingredient_id as ingredient_id, recipe_id
FROM flyer_items fi1
INNER JOIN (
SELECT flyer_items.id as id, MAX(price_weight) as price_weight, ingredient_to_flyer_item.ingredient_id as ingredient_id
FROM flyer_items
JOIN ingredient_to_flyer_item ON flyer_items.id = ingredient_to_flyer_item.flyer_item_id
GROUP BY id
) fi2 ON fi1.id = fi2.id AND fi1.price_weight = fi2.price_weight
JOIN ingredient_to_flyer_item ON fi1.id = ingredient_to_flyer_item.flyer_item_id
JOIN ingredient_to_recipe ON ingredient_to_flyer_item.ingredient_id = ingredient_to_recipe.ingredient_id
GROUP BY ingredient_to_flyer_item.ingredient_id) AS itf
INNER JOIN `ingredient_to_recipe` AS `itr` ON `itf`.`ingredient_id` = `itr`.`ingredient_id`
GROUP BY `itr`.`recipe_id`
ORDER BY `score` DESC
LIMIT 10
这是解释,但我不确定它是否有用,因为最后一个工作部分仍然缺失:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | |
|----|-------------|--------------------------|------------|--------|-------------------------------|---------------|---------|-------------------------------------------------------|--------|----------|---------------------------------|---|
| 1 | PRIMARY | itr | NULL | ALL | recipe_id,ingredient_id | NULL | NULL | NULL | 151800 | 100.00 | Using temporary; Using filesort | |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 4 | metadata3.itr.ingredient_id | 10 | 100.00 | NULL | |
| 2 | DERIVED | ingredient_to_flyer_item | NULL | ALL | NULL | NULL | NULL | NULL | 249 | 100.00 | Using temporary; Using filesort | |
| 2 | DERIVED | fi1 | NULL | eq_ref | id_2,id,price_weight | id_2 | 4 | metadata3.ingredient_to_flyer_item.flyer_item_id | 1 | 100.00 | NULL | |
| 2 | DERIVED | <derived3> | NULL | ref | <auto_key0> | <auto_key0> | 9 | metadata3.ingredient_to_flyer_item.flyer_item_id,m... | 10 | 100.00 | NULL | |
| 2 | DERIVED | ingredient_to_recipe | NULL | ref | ingredient_id | ingredient_id | 4 | metadata3.ingredient_to_flyer_item.ingredient_id | 40 | 100.00 | NULL | |
| 3 | DERIVED | ingredient_to_flyer_item | NULL | ALL | NULL | NULL | NULL | NULL | 249 | 100.00 | Using temporary; Using filesort | |
| 3 | DERIVED | flyer_items | NULL | eq_ref | id_2,id,flyer_id,price_weight | id_2 | 4 | metadata3.ingredient_to_flyer_item.flyer_item_id | 1 | 100.00 | NULL | |
更新2
我设法找到了一个有效的查询,但现在我必须加快速度,它需要500多毫秒才能运行。
SELECT sum(ff.price_weight) as price_weight, sum(ff.weight) as weight, sum(ff.price_weight+ff.weight) as score, ff.recipe_id FROM
(
SELECT DISTINCT
itf.flyer_item_id as flyer_item_id,
itf.recipe_id,
itf.weight,
aprice_weight AS price_weight
FROM
(SELECT itfin.flyer_item_id AS flyer_item_id,
itfin.price_weight AS aprice_weight,
itfin.ingredient_id,
itr.recipe_id,
itr.weight
FROM
(SELECT ifi2.flyer_item_id, ifi2.ingredient_id as ingredient_id, MAX(ifi2.price_weight) as price_weight
FROM
ingredient_to_flyer_item ifi1
INNER JOIN (
SELECT id, MAX(price_weight) as price_weight, ingredient_to_flyer_item.ingredient_id as ingredient_id, ingredient_to_flyer_item.flyer_item_id
FROM ingredient_to_flyer_item
GROUP BY ingredient_id
) ifi2 ON ifi1.price_weight = ifi2.price_weight AND ifi1.ingredient_id = ifi2.ingredient_id
WHERE flyer_id IN (1,2)
GROUP BY ifi1.ingredient_id) AS itfin
INNER JOIN `ingredient_to_recipe` AS `itr` ON `itfin`.`ingredient_id` = `itr`.`ingredient_id`
) AS itf
) ff
GROUP BY recipe_id
ORDER BY `score` DESC
LIMIT 20
以下是解释:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | |
|----|-------------|--------------------------|------------|-------|----------------------------------------------|---------------|---------|---------------------|------|----------|---------------------------------|---|
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 1318 | 100.00 | Using temporary; Using filesort | |
| 2 | DERIVED | <derived4> | NULL | ALL | NULL | NULL | NULL | NULL | 37 | 100.00 | Using temporary | |
| 2 | DERIVED | itr | NULL | ref | ingredient_id | ingredient_id | 4 | itfin.ingredient_id | 35 | 100.00 | NULL | |
| 4 | DERIVED | <derived5> | NULL | ALL | NULL | NULL | NULL | NULL | 249 | 100.00 | Using temporary; Using filesort | |
| 4 | DERIVED | ifi1 | NULL | ref | ingredient_id,itx_full,price_weight,flyer_id | ingredient_id | 4 | ifi2.ingredient_id | 1 | 12.50 | Using where | |
| 5 | DERIVED | ingredient_to_flyer_item | NULL | index | ingredient_id,itx_full | ingredient_id | 4 | NULL | 249 | 100.00 | NULL | |