Select the row with the most information using PARTITION BY

partitioning row tsql sql-server sql

not_ur_avg_cookie · Tech Community · 6 years ago

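I have a table with duplicates. I want to partition it by ID and select only the row with the most information, that is, the row in which the most fields contain values.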

+----+------+------+-------+---------+-------+
| ID | Name | City |  Zip  | Address | Phone |
+----+------+------+-------+---------+-------+
|  1 | Joe  |      |       |         |       |
|  1 | Joe  | DC   | 11111 |         |       |
|  2 | Pete | NY   |       |         |       |
|  2 | Pete | NY   | 10000 |         | 202-  |
|  3 | Max  |      |       |         |       |
|  3 | Max  |      |       |         |       |
|  4 | Sean | MIA  |       |         |       |
|  4 | Sean | MIA  |       | 1 blvd  |       |
|  4 | Sean |      | 12345 |         | 305-  |
+----+------+------+-------+---------+-------+

This is the result I am aiming for:

+----+------+------+-------+---------+-------+---------+
| ID | Name | City |  Zip  | Address | Phone | Row_num |
+----+------+------+-------+---------+-------+---------+
|  1 | Joe  | DC   | 11111 |         |       |       1 |
|  2 | Pete | NY   | 10000 |         | 202-  |       1 |
|  3 | Max  |      |       |         |       |       1 |
|  4 | Sean | MIA  |       | 1 blvd  |       |       1 |
+----+------+------+-------+---------+-------+---------+

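For Joe, I obviously want the second row, which has the city and zip code.

For Pete, I also want the second record, because it contains more information.

For Max, it does not matter which row I pick, because both records hold the same values.

For Sean, I could pick either the second or the third row: the second record has three populated fields (name, city, address) and the third record also has three (name, zip, phone). So it does not matter which of those two records I get for Sean.

How can I partition the table and select the row with the most information for each person?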

2 replies | latest 6 years ago

Gordon Linoff 6 years ago

If the columns are all strings, you can use apply:

select t.*
from (select t.*,
             row_number() over (partition by t.id order by v.cnt desc) as seqnum
      from t cross apply
           (select count(*)
            from (values (name), (city), (zip), (address), (phone)) v(col)
            where col is not null
           ) v(cnt)
    ) t
where seqnum = 1;

If you also want empty strings to count as missing, change the where to where col is not null and col <> ''.
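
For anyone who wants to try this against the data in the question, here is a minimal self-contained sketch. It assumes the blank cells in the question's table are NULL (they could just as well be empty strings, which is why the count goes through nullif), and the table variable @t, the column sizes, and the aliases s and x are made up for illustration.

```sql
-- Sample rows from the question; blank cells are ASSUMED to be NULL.
declare @t table (
    id int, name varchar(20), city varchar(20),
    zip varchar(10), address varchar(50), phone varchar(20)
);

insert into @t (id, name, city, zip, address, phone) values
    (1, 'Joe',  null,  null,    null,     null),
    (1, 'Joe',  'DC',  '11111', null,     null),
    (2, 'Pete', 'NY',  null,    null,     null),
    (2, 'Pete', 'NY',  '10000', null,     '202-'),
    (3, 'Max',  null,  null,    null,     null),
    (3, 'Max',  null,  null,    null,     null),
    (4, 'Sean', 'MIA', null,    null,     null),
    (4, 'Sean', 'MIA', null,    '1 blvd', null),
    (4, 'Sean', null,  '12345', null,     '305-');

select s.id, s.name, s.city, s.zip, s.address, s.phone, s.cnt
from (select t.*, v.cnt,
             -- Number the rows of each id, most populated row first.
             row_number() over (partition by t.id order by v.cnt desc) as seqnum
      from @t t
      cross apply
           -- Unpivot the five string columns and count the usable values;
           -- nullif turns '' into NULL, and count(expr) skips NULLs.
           (select count(nullif(x.col, '')) as cnt
            from (values (t.name), (t.city), (t.zip), (t.address), (t.phone)) x(col)
           ) v
     ) s
where s.seqnum = 1
order by s.id;
```

With this data the winning rows have counts of 3, 4, 1 and 3 for ids 1 to 4. For Sean the second and third rows tie at 3, and since the order by has no tie-breaker, row_number() is free to return either one, which matches what the question asks for.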

paparazzo 6 years ago

I think this would do it:

declare @t table (id int, name varchar(10), city varchar(10));
insert into @t values
    (1, 'Joe', null)  
  , (1, 'Joe', 'DC') 
  , (2, 'Pete', 'NY')
  , (2, null, 'NY')  
  , (3, null, 'TX') 
  , (5, 'Harry', null) 
  , (4, null, null);
select * 
from ( select * 
            , case when name is null then 0 else 1 end  
            + case when city is null then 0 else 1 end
              as rowCnt 
            , row_number() over (partition by id order by case when name is null then 0 else 1 end  
                                                        + case when city is null then 0 else 1 end desc) as rn
       from @t
     ) tt   
where tt.rn = 1 
order by tt.id

id          name       city       rowCnt      rn
----------- ---------- ---------- ----------- --------------------
1           Joe        DC         2           1
2           Pete       NY         2           1
3           NULL       TX         1           1
4           NULL       NULL       0           1
5           Harry      NULL       1           1
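
The same idea can be written so that the scoring expression appears only once. The following is a sketch, not part of the original answer: the CTE names scored and ranked are invented here, and the extra name, city sort keys are an assumed tie-breaker added only to make the pick repeatable when two rows have the same count.

```sql
declare @t table (id int, name varchar(10), city varchar(10));
insert into @t values
    (1, 'Joe', null), (1, 'Joe', 'DC'), (2, 'Pete', 'NY'), (2, null, 'NY'),
    (3, null, 'TX'), (5, 'Harry', null), (4, null, null);

with scored as (
    -- Compute the number of non-NULL columns once per row.
    select id, name, city,
           case when name is null then 0 else 1 end
         + case when city is null then 0 else 1 end as rowCnt
    from @t
), ranked as (
    -- Best-populated row first; name and city only break ties.
    select *,
           row_number() over (partition by id
                              order by rowCnt desc, name, city) as rn
    from scored
)
select id, name, city, rowCnt
from ranked
where rn = 1
order by id;
```

To cover the five columns from the question, add one case expression per column to the sum in scored; nothing else needs to change.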