Optimizing a MySQL query with a large IN() clause or a JOIN on a derived table

in-clause derived-table optimization mysql

Johannes Gorset · 14 years ago

Suppose I need to query the associates of a corporation. I have a table, "transactions", which contains data on every transaction made.

CREATE TABLE `transactions` (
  `transactionID` int(11) unsigned NOT NULL,
  `orderID` int(11) unsigned NOT NULL,
  `customerID` int(11) unsigned NOT NULL,
  `employeeID` int(11) unsigned NOT NULL, 
  `corporationID` int(11) unsigned NOT NULL,
  PRIMARY KEY (`transactionID`),
  KEY `orderID` (`orderID`),
  KEY `customerID` (`customerID`),
  KEY `employeeID` (`employeeID`),
  KEY `corporationID` (`corporationID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

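Querying the associates from this table is fairly straightforward, but there is a twist: a transaction record is registered once per employee, so a single corporation may have several records for the same order.

For example, if employees A and B of corporation 1 are both involved in selling a vacuum cleaner to corporation 2, there will be two records in the "transactions" table: one per employee, both for corporation 1. This must not affect the results, though. A trade by corporation 1 must be treated as a single trade, no matter how many of its employees were involved.

Easy, I thought. I'll just join on a derived table, like so: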

SELECT corporationID FROM transactions JOIN (SELECT DISTINCT orderID FROM transactions WHERE corporationID = 1) AS foo USING (orderID)

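The query returns a list of the corporations that have been involved in trades with corporation 1. That is exactly what I need, but it is very slow because MySQL can't use the corporationID index to build the derived table. I understand that this is the case for all subqueries/derived tables in MySQL.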
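To make the shape of the result concrete, here is a minimal sketch that runs the query above on a handful of rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL purely so it can be run anywhere, so it says nothing about MySQL's index usage or speed. The rows are hypothetical, and they assume that every corporation taking part in an order gets its own record under the same orderID.

```python
import sqlite3

# SQLite is only a runnable stand-in for MySQL; the rows below are made up.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions (
        transactionID INTEGER PRIMARY KEY,
        orderID       INTEGER NOT NULL,
        customerID    INTEGER NOT NULL,
        employeeID    INTEGER NOT NULL,
        corporationID INTEGER NOT NULL
    )
""")
# Columns: transactionID, orderID, customerID, employeeID, corporationID.
# customerID and employeeID are placeholders; the query does not read them.
# Order 100: corporation 1 (two employees) and corporation 2.
# Order 200: corporations 3 and 4 only.
rows = [
    (1, 100, 2, 11, 1),
    (2, 100, 2, 12, 1),
    (3, 100, 2, 21, 2),
    (4, 200, 4, 31, 3),
    (5, 200, 4, 41, 4),
]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", rows)

query = """
    SELECT corporationID
    FROM transactions
    JOIN (SELECT DISTINCT orderID FROM transactions WHERE corporationID = 1) AS foo
    USING (orderID)
"""
result = sorted(r[0] for r in conn.execute(query))
print(result)
```

On these rows the query yields one corporationID per record of order 100, so corporation 1 itself shows up (once per employee) next to corporation 2. A DISTINCT and a filter on corporationID would be needed to reduce that to the list of partners.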

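I have also tried querying the set of orderIDs separately and passing them in a very large IN() clause (typically 100,000+ IDs), but it turns out that MySQL has trouble using indexes with very large IN() clauses as well, so the query time does not improve.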
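For reference, the two-step approach described here looks roughly like the sketch below. It again uses Python's sqlite3 as a runnable stand-in for MySQL with hypothetical rows; the DISTINCT in the second step is an assumption, and with only one ID it cannot reproduce the performance problem, only the structure of the queries.

```python
import sqlite3

# SQLite is only a runnable stand-in for MySQL; the rows below are made up.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions (
        transactionID INTEGER PRIMARY KEY,
        orderID       INTEGER NOT NULL,
        customerID    INTEGER NOT NULL,
        employeeID    INTEGER NOT NULL,
        corporationID INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, 2, 11, 1),
        (2, 100, 2, 12, 1),
        (3, 100, 2, 21, 2),
        (4, 200, 4, 31, 3),
        (5, 200, 4, 41, 4),
    ],
)

# Step 1: fetch every order that corporation 1 took part in.
order_ids = [
    r[0]
    for r in conn.execute(
        "SELECT DISTINCT orderID FROM transactions WHERE corporationID = 1"
    )
]

# Step 2: build one placeholder per ID and feed the list to IN().
# With 100,000+ IDs this statement becomes very large.
placeholders = ", ".join("?" for _ in order_ids)
sql = (
    "SELECT DISTINCT corporationID FROM transactions "
    f"WHERE orderID IN ({placeholders})"
)
partners = sorted(r[0] for r in conn.execute(sql, order_ids))
print(order_ids, partners)
```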

Are there any other options available, or have I exhausted them all?

2 Answers

1

Phil Wallach · 14 years ago

If I understand your requirement, you could try this.

select distinct t1.corporationID
from transactions t1
where exists (
    select 1
    from transactions t2
    where t2.corporationID =  1
    and t2.orderID = t1.orderID)
and t1.corporationID != 1;

Or:

select distinct t1.corporationID
from transactions t1
join transactions t2
on t2.orderID = t1.orderID
and t1.transactionID != t2.transactionID
where t2.corporationID = 1
and t1.corporationID != 1;
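As a quick sanity check, the sketch below runs both queries on a few hypothetical rows and compares the results. It uses Python's sqlite3 as a stand-in for MySQL, so it only illustrates what the queries return, not how MySQL executes them. The sample data assumes that each corporation involved in an order has its own record under that orderID.

```python
import sqlite3

# SQLite is only a runnable stand-in for MySQL; the rows below are made up.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions (
        transactionID INTEGER PRIMARY KEY,
        orderID       INTEGER NOT NULL,
        customerID    INTEGER NOT NULL,
        employeeID    INTEGER NOT NULL,
        corporationID INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, 2, 11, 1),  # corporation 1, first employee
        (2, 100, 2, 12, 1),  # corporation 1, second employee, same order
        (3, 100, 2, 21, 2),  # corporation 2 on the same order
        (4, 200, 4, 31, 3),  # unrelated order
        (5, 200, 4, 41, 4),
    ],
)

exists_query = """
    select distinct t1.corporationID
    from transactions t1
    where exists (
        select 1
        from transactions t2
        where t2.corporationID = 1
        and t2.orderID = t1.orderID)
    and t1.corporationID != 1
"""
join_query = """
    select distinct t1.corporationID
    from transactions t1
    join transactions t2
    on t2.orderID = t1.orderID
    and t1.transactionID != t2.transactionID
    where t2.corporationID = 1
    and t1.corporationID != 1
"""
via_exists = sorted(r[0] for r in conn.execute(exists_query))
via_join = sorted(r[0] for r in conn.execute(join_query))
print(via_exists, via_join)
```

Both forms return only corporation 2 here: corporation 1 is filtered out, and the two employee records of corporation 1 collapse into a single result row because of the DISTINCT.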

2

Andrew Kuklewicz · 14 years ago

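Your data makes no sense to me. I think you are using corporationID where you mean customerID at some point, because your query joins the transactions table to itself on orderID, restricted to corporationID = 1, to get the corporationIDs... which would then all be 1, right?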

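Can you spell out what customerID, employeeID and corporationID mean? How do I know that employees A and B are from corporation 1? In that case, is corporation 1 the corporationID, and corporation 2 the customer, and therefore stored in customerID?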

If that is the case, you just need a GROUP BY:

SELECT customerID
FROM transactions
WHERE corporationID = 1
GROUP BY customerID

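(Or select and group by orderID if you want one row per order instead of one row per customer.)

By using GROUP BY, you ignore the fact that there are multiple records that are duplicates of each other except for the employeeID.

Conversely, to return all corporations that have sold to corporation 2: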

SELECT corporationID
FROM transactions
WHERE customerID = 2
GROUP BY corporationID
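A minimal sketch of both GROUP BY queries on hypothetical rows, under the interpretation guessed at above: corporationID is the seller and customerID is the buying corporation. Python's sqlite3 stands in for MySQL only so the example can be run; the rows and IDs are made up.

```python
import sqlite3

# SQLite is only a runnable stand-in for MySQL; the rows below are made up.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions (
        transactionID INTEGER PRIMARY KEY,
        orderID       INTEGER NOT NULL,
        customerID    INTEGER NOT NULL,
        employeeID    INTEGER NOT NULL,
        corporationID INTEGER NOT NULL
    )
""")
# Assumed meaning: corporationID sells to customerID, one row per employee.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, 2, 11, 1),  # corporation 1 sells to 2, first employee
        (2, 100, 2, 12, 1),  # same order, second employee
        (3, 101, 3, 11, 1),  # corporation 1 sells to 3
        (4, 102, 2, 51, 5),  # corporation 5 sells to 2
    ],
)

# Customers that corporation 1 has sold to, one row per customer.
customers_of_1 = sorted(
    r[0]
    for r in conn.execute(
        "SELECT customerID FROM transactions "
        "WHERE corporationID = 1 GROUP BY customerID"
    )
)

# Corporations that have sold to customer 2, one row per corporation.
sellers_to_2 = sorted(
    r[0]
    for r in conn.execute(
        "SELECT corporationID FROM transactions "
        "WHERE customerID = 2 GROUP BY corporationID"
    )
)
print(customers_of_1, sellers_to_2)
```

On this data the first query gives customers 2 and 3, and the second gives corporations 1 and 5. The two employee records for order 100 do not produce a duplicate in either result.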