代码之家  ›  专栏  ›  技术社区  ›  Ian Boyd

使用RANK()计算时跳过空值

  •  3
  • Ian Boyd  · 技术社区  · 6 年前

    给定一组行,有时有一个字段 null

    SELECT 
       Date, TheThing
    FROM MyData
    ORDER BY Date
    
    
    Date                     TheThing
    -----------------------  --------
    2016-03-09 08:17:29.867  a
    2016-03-09 08:18:33.327  a
    2016-03-09 14:32:01.240  NULL
    2016-10-21 19:53:49.983  NULL
    2016-11-12 03:25:21.753  b
    2016-11-24 07:43:24.483  NULL
    2016-11-28 16:06:23.090  b
    2016-11-28 16:09:07.200  c
    2016-12-10 11:21:55.807  c
    

    我想有一个排名列来计算非空值:

    Date                     TheThing  DesiredTotal
    -----------------------  --------  ------------
    2016-03-09 08:17:29.867  a         1
    2016-03-09 08:18:33.327  a         2
    2016-03-09 14:32:01.240  NULL      2 <---notice it's still 2 (good)
    2016-10-21 19:53:49.983  NULL      2 <---notice it's still 2 (good)
    2016-11-12 03:25:21.753  b         3
    2016-11-24 07:43:24.483  NULL      3 <---notice it's still 3 (good)
    2016-11-28 16:06:23.090  b         4
    2016-11-28 16:09:07.200  c         5
    2016-12-10 11:21:55.807  c         6
    

    我试试显而易见的:

    SELECT 
       Date, TheThing, 
       RANK() OVER(ORDER BY Date) AS Total
    FROM MyData
    ORDER BY Date
    

    RANK() 计数空值:

    Date                     TheThing  Total
    -----------------------  --------  -----
    2016-03-09 08:17:29.867  a         1
    2016-03-09 08:18:33.327  a         2
    2016-03-09 14:32:01.240  NULL      3 <--- notice it is 3 (bad)
    2016-10-21 19:53:49.983  NULL      4 <--- notice it is 4 (bad)
    2016-11-12 03:25:21.753  b         5 <--- and all the rest are wrong (bad)
    2016-11-24 07:43:24.483  NULL      7
    2016-11-28 16:06:23.090  b         8
    2016-11-28 16:09:07.200  c         9
    2016-12-10 11:21:55.807  c         10
    

    我该如何指导 (或 DENSE_RANK() )不算零?

    为什么是!更糟的是:

    SELECT 
       Date, TheThing, 
       RANK() OVER(PARTITION BY(CASE WHEN TheThing IS NOT NULL THEN 1 ELSE 0 END) ORDER BY Date) AS Total
    FROM MyData
    ORDER BY Date
    

    但是 等级() 计数空值:

    Date                     TheThing  Total
    -----------------------  --------  -----
    2016-03-09 08:17:29.867  a         1
    2016-03-09 08:18:33.327  a         2
    2016-03-09 14:32:01.240  NULL      1 <--- reset to 1?
    2016-10-21 19:53:49.983  NULL      2 <--- why go up?
    2016-11-12 03:25:21.753  b         3 
    2016-11-24 07:43:24.483  NULL      3 <--- didn't reset?
    2016-11-28 16:06:23.090  b         4 
    2016-11-28 16:09:07.200  c         5
    2016-12-10 11:21:55.807  c         6
    

    现在我随机输入一些东西-疯狂的挥舞。

    SELECT 
       Date, TheThing, 
       RANK() OVER(PARTITION BY(CASE WHEN TheThing IS NOT NULL THEN 1 ELSE NULL END) ORDER BY Date) AS Total
    FROM MyData
    ORDER BY Date
    
    SELECT 
       Date, TheThing, 
       DENSE_RANK() OVER(PARTITION BY(CASE WHEN TheThing IS NOT NULL THEN 1 ELSE NULL END) ORDER BY Date) AS Total
    FROM MyData
    ORDER BY Date
    

    编辑 不要 想要。最后我概念上想要的是 OVER() 为了 计数 OVER 适用于除 RANK (和 DENSE_RANK ).

    http://sqlfiddle.com/#!18/c6d87/1

    额外阅读

    6 回复  |  直到 6 年前
        1
  •  4
  •   Gordon Linoff    6 年前

    我想你是在寻找一个累积的计数:

    SELECT Date, TheThing, 
           COUNT(theThing) OVER (ORDER BY Date) AS Total
    FROM MyData
    ORDER BY Date;
    
        2
  •  3
  •   Michał Turczyn    6 年前

    declare @tbl table (dt datetime, col int);
    insert into @tbl values
    ('2016-03-09 08:17:29.867', 1),
    ('2016-03-09 08:18:33.327', 1),
    ('2016-03-09 14:32:01.240', NULL),
    ('2016-10-21 19:53:49.983', NULL),
    ('2016-11-12 03:25:21.753', 1),
    ('2016-11-24 07:43:24.483', NULL),
    ('2016-11-28 16:06:23.090', 1),
    ('2016-11-28 16:09:07.200', 1),
    ('2016-12-10 11:21:55.807', 1);
    
    select dt,
           col,
           sum(case when col is null then 0 else 1 end) over (order by dt) rnk
    from @tbl
    

    这个想法非常简单:如果将1赋给非空值,将0赋给列为空的地方,则按日期排序的累积和与排除空值的秩完全相同。

    另一种方法是 RANK ROW_NUMBER ,这将尊重 Date 等级 尊重 NULL

    select dt,
           col,
           case when col is not null then 
               rank() over (order by dt)
           else 
               rank() over (order by dt) - row_number() over (partition by rnDiff order by dt)
           end rnk
    from (
        select dt,
               col,
               row_number() over (order by dt) -
                   row_number() over (partition by coalesce(col, 0) order by dt) rnDiff
        from @tbl
    ) a
    order by dt
    
        3
  •  1
  •   John Cappelletti    6 年前

    我的蜥蜴脑把我带到这里。。。sum()超过vs rank()

    Select *
           ,NewCol = sum(sign(TheThing)) over (Order by Date)
           ,OrEven = sum(TheThing/TheThing) over (Order by Date)  
     From  MyData
    

    enter image description here

        4
  •  1
  •   sticky bit    6 年前

    减去当前的 NULL s来自 rank()

    SELECT date,
           thething,
           rank() OVER (ORDER BY date)
           -
           sum(CASE
                 WHEN thething IS NULL THEN
                   1
                 ELSE
                   0
               END) OVER (ORDER BY date) desiredtotal
           FROM mydata;
    

    db<>fiddle

    也应该保留重复和空白 生成且不需要子查询。

        5
  •  0
  •   JMabee    6 年前

    CREATE TABLE #tmp(dt datetime, TheThing int)
    
    INSERT INTO #tmp VALUES('2016-03-09 08:17:29.867',  1)
    INSERT INTO #tmp VALUES('2016-03-09 08:18:33.327',  1)
    INSERT INTO #tmp VALUES('2016-03-09 14:32:01.240',  NULL)
    INSERT INTO #tmp VALUES('2016-10-21 19:53:49.983',  NULL)
    INSERT INTO #tmp VALUES('2016-11-12 03:25:21.753',  1)
    INSERT INTO #tmp VALUES('2016-11-24 07:43:24.483',  NULL)
    INSERT INTO #tmp VALUES('2016-11-28 16:06:23.090',  1)
    INSERT INTO #tmp VALUES('2016-11-28 16:09:07.200',  1)
    INSERT INTO #tmp VALUES('2016-12-10 11:21:55.807',  1)
    
    
    ;WITH CTE as (
    SELECT 
    CASE WHEN TheThing IS NULL THEN (SELECT MAX(dt) from #tmp OrigTbl where OrigTbl.dt <     SubTbl.dt and OrigTbl.TheThing IS NOT NULL) ELSE dt end dtMod,
    SubTbl.dt,SubTbl.TheThing
       from #tmp SubTbl)
    SELECT dt, TheThing, DENSE_RANK() over(ORDER BY dtMod) from CTE
    
        6
  •  0
  •   Yogesh Sharma    6 年前

    我会用 subquery

    SELECT [Date], TheThing,
           (SELECT COUNT(*)
            FROM MyData m
            WHERE m.[Date] <= m1.[Date] AND m.TheThing IS NOT NULL
           ) AS DesiredTotal
    FROM MyData m1;
    

    以类似的方式你也可以尝试 apply

    SELECT *
    FROM MyData m1 CROSS APPLY
        (SELECT COUNT(*) AS DesiredTotal
         FROM MyData m
         WHERE m.[Date] <= m1.[Date] AND m.TheThing IS NOT NULL
        ) m2;