代码之家 › 专栏 › 技术社区 › John Hartsock

SQL服务器使用“group by…with cube”获取非重复计数

group-by sql-server-2000 tsql sql-server sql

John Hartsock · 技术社区 · 14 年前

基本上,我想在这个立方结果中得到一个不同的计数。但不幸的是,您不能将count(distinct(field))与cube和rollup一起使用( as stated here )

以下是数据的外观。(这只是一个简单的示例,我确实希望数据中有重复项)

    Category1       Category2       ItemId
    a               b               1
    a               b               1
    a               a               1
    a               a               2
    a               c               1
    a               b               2
    a               b               3
    a               c               2
    a               a               1
    a               a               3
    a               c               4

这是我想做的,但不起作用。

SELECT
  Category1,
  Category2,
  Count(Distinct(ItemId))
FROM ItemList IL
GROUP BY
  Category1,
  Category2
WITH CUBE

我知道我可以这样做一个子选择来获得我想要的结果:

SELECT
  *,
  (SELECT
     Count(Distinct(ItemId)) 
   FROM ItemList IL2 
   WHERE 
     (Q1.Category1 IS NULL OR Q1.Category1 IS NOT NULL AND Q1.Category1 = IL2.Category1) 
     AND
     (Q1.Category2 IS NULL OR Q1.Category2 IS NOT NULL AND Q1.Category2 = IL2.Category2))
       AS DistinctCountOfItems 
FROM (SELECT
        Category1,
        Category2
      FROM ItemList IL
      GROUP BY
        Category1,
        Category2
      WITH CUBE) Q1

但是,当结果集由于子选择而变大时,速度会变慢。有没有其他方法可以从立方体结果中得到不同的计数?

这是我想看的结果

Category1     Category2    DistinctCountOfItems
a             a            3
a             b            3
a             c            3
a             NULL         4
NULL          NULL         4
NULL          a            3
NULL          b            3
NULL          c            3

5 回复 | 直到 14 年前

chezy525 14 年前

你应该能够像这样整理你的“杂乱”答案:

select Category1, Category2, count(distinct ItemId)
from ItemList
group by Category1, Category2
UNION ALL
select Category1, null, count(distinct ItemId)
from ItemList
group by Category1
UNION ALL
select null, Category2, count(distinct ItemId)
from ItemList
group by Category2
UNION ALL
select null, null, count(distinct ItemId)
from ItemList

然后我想到的另一个选择是:

select IL1.Category1, IL1.Category2, count(distinct ItemId)
from ( 
  select Category1, Category2
  from ItemList
  group by Category1, Category2
  with cube
 ) IL1
 join ItemList IL2 on (IL1.Category1=IL2.Category1 and IL1.Category2=IL2.Category2)
      or (IL1.Category1 is null and IL1.Category2=IL2.Category2)
      or (IL1.Category2 is null and IL1.Category1=IL2.Category1)
      or (IL1.Category1 is null and IL1.Category2 is null)
group by IL1.Category1, IL1.Category2

效率可能因索引、分组的列数等而异。对于我编写的测试表,子选择和联接(与联合相反)稍好一些。我现在没有访问MSSQL2000实例的权限(我在2005实例上测试过),但我认为这里的任何内容都是无效的。

更新

一个更好的选择,尤其是当您分组超过2列时(如果您分组8列,则上述代码将需要256个联接子句来捕获所有空组合!):

select IL1.Category1, IL1.Category2, count(distinct ItemId)
from ( 
  select Category1, Category2
  from ItemList
  group by Category1, Category2
  with cube
 ) IL1
 inner join ItemList IL2 on isnull(IL1.Category1,IL2.Category1)=IL2.Category1
                  and isnull(IL1.Category2,IL2.Category2)=IL2.Category2
group by IL1.Category1, IL1.Category2

John Hartsock 14 年前

这是我发现的另一种可能性,但它非常混乱。但是,它比使用嵌套select运行得更快。

SELECT 
  category1, 
  category2, 
  count(distinct itemid)
FROM (SELECT DISTINCT 
        category1, 
        category2, 
        itemid
      FROM ItemList
) x
GROUP BY category1, category2
UNION ALL
SELECT 
  category1, 
  NULL, 
  count(distinct itemid)
FROM (SELECT DISTINCT 
        category1, 
        category2, 
        itemid
      FROM ItemList
) x
GROUP BY category1
UNION ALL
SELECT 
  NULL, 
  category2, 
  count(distinct itemid)
FROM (SELECT DISTINCT 
        category1, 
        category2, 
        itemid
      FROM ItemList
) x
GROUP BY category2
UNION ALL
SELECT 
  NULL, 
  NULL, 
  count(distinct itemid)
FROM (SELECT DISTINCT 
        category1, 
        category2, 
        itemid
      FROM ItemList
) x

-1

bobs 14 年前

那很有趣。我可以在SQL Server 2008 R2中运行您的第一个查询,但文档中说它不起作用。

下面是您的第二个查询的一个变体,它可能性能更好。它在子查询中执行不同的计数,在外部查询中执行多维数据集

SELECT Category1, Category2, MAX(DistinctCount) as DistinctCount
FROM (
    SELECT Category1, Category2, COUNT(DISTINCT ItemId) as DistinctCount
    FROM ItemList
    GROUP BY Category1, Category2
    ) Q1
GROUP BY Category1, Category2
WITH CUBE

-1

exiter2000 14 年前

这个怎么样??

内部查询将返回不同的结果。

SELECT ORIGINAL_ITEM.Category1, DISTINCT_ITEM.Category2, DISTINCT_ITEM.cnt
FROM
  ( SELECT DISTINCT category2, COUNT(*) as CNT
    FROM ItemList ) DISTINCT_ITEM
JOIN ItemList ORIGINAL_ITEM on ORIGINAL_ITEM.category2 = DISTINCT_ITEM.category2 
GROUP BY ORIGINAL_ITEM.category1, DISTINCT_ITEM.category2

-1

Michael Riley - AKA Gunny 14 年前

我有以下版本:

当我执行你的查询时

SELECT 
  Category1, 
  Category2, 
  COUNT(DISTINCT(ItemId))
FROM ItemList IL 
GROUP BY 
  Category1, 
  Category2 
WITH CUBE

我得到这些结果

a       a       3
a       b       3
a       c       3
NULL    a       3
NULL    b       3
NULL    c       3
a       NULL    4
NULL    NULL    4