代码之家  ›  专栏  ›  技术社区  ›  deceze

优化join查询从A中获取数据,条件为B,按B排序

  •  3
  • deceze  · 技术社区  · 14 年前

    我有一个项目之间的相似性矩阵与这些表格一起设置:

    items (id, ...) (Primary key `id`)
    similarities (item1_id, item2_id, similarity) (Index on `item1_id` and `item2_id`)
    

    这个 similarities 表包含具有相似性索引的ID对,即:

    item1_id  item2_id  similarity
    1         2         0.3143
    2         3         0.734
    

    item1_id item2_id .

    现在我想找到与其他项目相似的项目,按降序相似性排序。我使用以下查询:

    SELECT    `Item`.*
    FROM      `items` AS `Item`
    LEFT JOIN `similarities` AS `Similarity`
           ON (`Item`.`id` = `Similarity`.`item1_id`
                  AND `Similarity`.`item2_id` IN (1, 2, 3, ...))
              OR (`Item`.`id` = `Similarity`.`item2_id`
                  AND `Similarity`.`item1_id` IN (1, 2, ,3, ...))
    WHERE     `Similarity`.`item1_id` IN (1, 2, 3, ...)
              OR `Similarity`.`item2_id` IN (1, 2, 3, ...)
    GROUP BY  `Item`.`id`
    ORDER BY  `Similarity`.`similarity` desc
    

    但是速度非常慢,大约100000个项目和大约30000个相似性对需要4-5秒。看来这次联姻的代价是极其高昂的。这是问题所在 EXPLAIN 预计起飞时间:

    select_type  table       type         possible_keys      key                key_len  ref   rows    Extra
    SIMPLE       Similarity  index_merge  item1_id,item2_id  item1_id,item2_id  110,110  NULL  31      Using sort_union(item1_id,...
    SIMPLE       Item        ALL          PRIMARY            NULL               NULL     NULL  136600  Using where; Using join buffer
    

    我该怎么做才能加快速度呢?最坏的情况是,我会在两个单独的查询中完成它,但如果可能的话,我更喜欢一个连接查询。

    3 回复  |  直到 14 年前
        1
  •  1
  •   gregjor    14 年前

    我其实没试过,但也许它能给你指明正确的方向。我们的想法是 UNION 的(唯一)id,相似性对来自 similarities

    SELECT Item.*, s.other_item_id, s.similarity
    FROM items AS Item
    JOIN
        (
        SELECT item1_id AS id, item2_id AS other_item_id, similarity FROM similarities
        UNION
        SELECT item2_id AS id, item1_id AS other_item_id, similarity FROM similarities
        ) AS s ON s.id = items.id
    WHERE items.id IN (1, 2, 3, ...)
    ORDER BY s.similarity DESC;
    

    在原始查询中,不需要限制 相似之处 JOIN 条件和 WHERE 条款。

        2
  •  1
  •   LesterDove    14 年前

    请原谅这个语句中的psuedo代码ish SELECT部分-我认为您实际上需要为每个字段值设置一个大小写。。。

    SELECT    
    CASE WHEN `Item2`.`id` IS NULL THEN 
      `Item1`.`id`
    ELSE `Item2`.`id`
    END,
    
    SELECT    
    CASE WHEN `Item2`.`id` IS NULL THEN 
      `Item1`.`name`
    ELSE `Item2`.`name`
    END,
    
    SELECT    
    CASE WHEN `Item2`.`id` IS NULL THEN 
      `Item1`.`description`
    ELSE `Item2`.`description`
    END,
    
    [and so on]
    
    FROM      `items` AS `Item1`
    LEFT OUTER JOIN `similarities` AS `Similarity`
           ON (`Item1`.`id` = `Similarity`.`item1_id`
    RIGHT OUTER JOIN `items` AS `Item2`
           ON (`Item2`.`id` = `Similarity`.`item2_id`       
    WHERE     `Similarity`.`item1_id` IN (1, 2, 3, ...)
              OR `Similarity`.`item2_id` IN (1, 2, 3, ...)
    ORDER BY  `Similarity`.`similarity` desc
    
        3
  •  1
  •   deceze    14 年前

    SELECT `Item`.*
    FROM `items` AS `Item`
    JOIN (
        SELECT `item1_id` AS `id`, `similarity`
        FROM   `similarities`
        WHERE  `similarities`.`item2_id` IN (1, 2, 3, ...)
        UNION
        SELECT `item2_id` AS `id`, `similarity`
        FROM   `similarities`
        WHERE  `similarities`.`item1_id` IN (1, 2, 3, ...)
    ) AS `SimilarityUnion` ON `SimilarityUnion`.`id` = `Item`.`id`
    GROUP BY `SimilarityUnion`.`id`
    ORDER BY `SimilarityUnion`.`similarity` DESC