我发现在使用公共子表达式假脱机的执行计划中,对于大型表,所报告的逻辑读取会非常高。
经过一些尝试和错误,我找到了一个公式,似乎适用于下面的测试脚本和执行计划。
Worktable logical reads = 1 + NumberOfRows * 2 + NumberOfGroups * 4
但我不明白这个公式成立的原因。这比我原以为有必要研究这个计划的要多。有谁能一个一个一个的解释到底发生了什么事?
或者,如果失败了,有没有办法追踪每一次逻辑读取中读取的页面,这样我就可以自己计算出来?
SET STATISTICS IO OFF; SET NOCOUNT ON;
IF Object_id('tempdb..#Orders') IS NOT NULL
DROP TABLE #Orders;
CREATE TABLE #Orders
(
OrderID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
CustomerID NCHAR(5) NULL,
Freight MONEY NULL,
);
CREATE NONCLUSTERED INDEX ix
ON #Orders (CustomerID)
INCLUDE (Freight);
INSERT INTO #Orders
VALUES (N'ALFKI', 29.46),
(N'ALFKI', 61.02),
(N'ALFKI', 23.94),
(N'ANATR', 39.92),
(N'ANTON', 22.00);
SELECT PredictedWorktableLogicalReads =
1 + 2 * Count(*) + 4 * Count(DISTINCT CustomerID)
FROM #Orders;
SET STATISTICS IO ON;
SELECT OrderID,
Freight,
Avg(Freight) OVER (PARTITION BY CustomerID) AS Avg_Freight
FROM #Orders;
输出
PredictedWorktableLogicalReads
------------------------------
23
Table 'Worktable'. Scan count 3, logical reads 23, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Orders___________000000000002'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
其他信息:
这些线轴的详细说明见
Query Tuning and Optimization
书和
this blog post by Paul White
.
总而言之,位于计划顶部的段迭代器在它发送的行中添加一个标志,指示何时开始新分区。主段spool一次从段迭代器获取一行,并将其插入tempdb中的工作表中。一旦得到一个新组已经启动的标志,它就返回一行到嵌套循环操作符的顶部输入。这将导致对工作表中的行调用流聚合,计算平均值,然后在截断工作表以准备新组之前将此值与工作表中的行重新联接。段假脱机发出一个虚拟行,以便处理最终的组。
据我所知,工作表是一个堆(或者在计划中将其表示为索引假脱机)。但是,当我尝试复制同一个进程时,它只需要11次逻辑读取。
CREATE TABLE #WorkTable
(
OrderID INT,
CustomerID NCHAR(5) NULL,
Freight MONEY NULL,
)
DECLARE @Average MONEY
PRINT 'Insert 3 Rows'
INSERT INTO #WorkTable
VALUES (1, N'ALFKI', 29.46) /*Scan count 0, logical reads 1*/
INSERT INTO #WorkTable
VALUES (2, N'ALFKI', 61.02) /*Scan count 0, logical reads 1*/
INSERT INTO #WorkTable
VALUES (3, N'ALFKI', 23.94) /*Scan count 0, logical reads 1*/
PRINT 'Calculate AVG'
SELECT @Average = Avg(Freight)
FROM #WorkTable /*Scan count 1, logical reads 1*/
PRINT 'Return Rows - With the average column included'
/*This convoluted query is just to force a nested loops plan*/
SELECT *
FROM (SELECT @Average AS Avg_Freight) T /*Scan count 1, logical reads 1*/
OUTER APPLY #WorkTable
WHERE COALESCE(Freight, OrderID) IS NOT NULL
AND @Average IS NOT NULL
PRINT 'Clear out work table'
TRUNCATE TABLE #WorkTable
PRINT 'Insert 1 Row'
INSERT INTO #WorkTable
VALUES (4, N'ANATR', 39.92) /*Scan count 0, logical reads 1*/
PRINT 'Calculate AVG'
SELECT @Average = Avg(Freight)
FROM #WorkTable /*Scan count 1, logical reads 1*/
PRINT 'Return Rows - With the average column included'
SELECT *
FROM (SELECT @Average AS Avg_Freight) T /*Scan count 1, logical reads 1*/
OUTER APPLY #WorkTable
WHERE COALESCE(Freight, OrderID) IS NOT NULL
AND @Average IS NOT NULL
PRINT 'Clear out work table'
TRUNCATE TABLE #WorkTable
PRINT 'Insert 1 Row'
INSERT INTO #WorkTable
VALUES (5, N'ANTON', 22.00) /*Scan count 0, logical reads 1*/
PRINT 'Calculate AVG'
SELECT @Average = Avg(Freight)
FROM #WorkTable /*Scan count 1, logical reads 1*/
PRINT 'Return Rows - With the average column included'
SELECT *
FROM (SELECT @Average AS Avg_Freight) T /*Scan count 1, logical reads 1*/
OUTER APPLY #WorkTable
WHERE COALESCE(Freight, OrderID) IS NOT NULL
AND @Average IS NOT NULL
PRINT 'Clear out work table'
TRUNCATE TABLE #WorkTable
PRINT 'Calculate AVG'
SELECT @Average = Avg(Freight)
FROM #WorkTable /*Scan count 1, logical reads 0*/
PRINT 'Return Rows - With the average column included'
SELECT *
FROM (SELECT @Average AS Avg_Freight) T
OUTER APPLY #WorkTable
WHERE COALESCE(Freight, OrderID) IS NOT NULL
AND @Average IS NOT NULL
DROP TABLE #WorkTable