
Get the top x% of rows for each unique value in a column, ranked by another column's value

percentage greatest-n-per-group sqlite sql

Vega · Tech Community · 5 years ago

Table "labels":

Source   Target       Weight
#003     blitzkrank   0.83
#003     deutsch      0.7
#003     brammen      0.57
#003     butzfrauen   0.55
#003     solaaaa      0.5
#003     moments      0.3
college  scandal      1.15
college  prosecutors  0.82
college  students     0.41
college  usc          0.33
college  full house   0.17
college  friends      0.08
college  house        0.5
college  friend       0.01
college friend      0.01

The table has 5,600,000 rows and about 91,000 unique values in the "Source" column.

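For each unique value in "Source" and for each unique value in "Target", I need the top x% of rows by weight (e.g. the top 20% or top 30%; x has to be variable). The table is sorted by "Source" (ascending) and then "Weight" (descending).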

If x% of a group comes out to 0 rows, take at least one row anyway.

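There will be duplicates: for example, Source = "college" yields at least one row that is selected again for Target = "scandal". If possible, these duplicates should be removed; otherwise it's no big deal.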

Calculation for "Source":

6 rows where Source = "#003", 6 * 0.2 = 1.2 = take 1 row
8 rows where Source = "college", 8 * 0.2 = 1.6 = take 2 rows
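A minimal Python sketch of the per-group row count worked out above, assuming the rule is "x% of the group, rounded to the nearest whole row, but never fewer than one row". The helper name `rows_to_take` is hypothetical. It rounds halves up, as SQLite's `round()` does for non-negative values, instead of using Python's built-in `round()`, which rounds halves to even.

```python
import math


def rows_to_take(group_size: int, x: float) -> int:
    """Number of top rows to keep for one group (hypothetical helper).

    group_size: number of rows sharing the same Source value
    x:          fraction to keep, e.g. 0.2 for the top 20%
    """
    # floor(v + 0.5) rounds halves up for v >= 0, like SQLite's round();
    # max(1, ...) enforces the "at least one row" rule.
    return max(1, math.floor(group_size * x + 0.5))


print(rows_to_take(6, 0.2))  # "#003":    6 * 0.2 = 1.2 -> 1
print(rows_to_take(8, 0.2))  # "college": 8 * 0.2 = 1.6 -> 2
```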

Source   Target       Weight
#003     blitzkrank   0.83
college  scandal      1.15
college  prosecutors  0.82

How can I do this in SQL in an SQLite database?


Gordon Linoff · 5 years ago

For source:

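-- t is the table from the question ("labels"); window functions need SQLite 3.25+
select t.*
from (select t.*,
             -- position of each row within its source, heaviest weight first (target breaks ties)
             row_number() over (partition by source order by weight desc, target) as seqnum,
             -- total number of rows for this source
             count(*) over (partition by source) as cnt
      from t
     ) t
where seqnum = 1 or  -- always at least one row
      seqnum <= round(cnt * 0.2);  -- 0.2 = top 20%; change for other values of x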

Based on your example, I think this is what you want. You can construct a similar query for target.
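The sketch below runs this approach against the sample data using Python's built-in `sqlite3` module (the bundled SQLite must be 3.25 or newer for window functions). It is an illustration, not part of the original answer: the table name `labels`, the helper names `make_db`, `top_by` and `top_by_both`, and the use of `UNION` to merge the per-source and per-target selections are all assumptions. `UNION` drops rows that are identical in every selected column, which covers the duplicates mentioned in the question, but it would also collapse rows that were already exact duplicates in the table. The fraction `x` is a bound parameter, so it can vary per call.

```python
import sqlite3

# Sample rows from the question: (source, target, weight)
ROWS = [
    ("#003", "blitzkrank", 0.83), ("#003", "deutsch", 0.7),
    ("#003", "brammen", 0.57), ("#003", "butzfrauen", 0.55),
    ("#003", "solaaaa", 0.5), ("#003", "moments", 0.3),
    ("college", "scandal", 1.15), ("college", "prosecutors", 0.82),
    ("college", "students", 0.41), ("college", "usc", 0.33),
    ("college", "full house", 0.17), ("college", "friends", 0.08),
    ("college", "house", 0.5), ("college", "friend", 0.01),
]

# Top-x query for one grouping column. {col}/{other} are only ever filled
# with "source" or "target" (never user input); :x is a bound parameter.
TOP_SQL = """
select source, target, weight
from (select l.*,
             row_number() over (partition by {col}
                                order by weight desc, {other}) as seqnum,
             count(*) over (partition by {col}) as cnt
      from labels l) ranked
where seqnum = 1 or seqnum <= round(cnt * :x)
"""


def make_db() -> sqlite3.Connection:
    """In-memory database holding the hypothetical 'labels' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table labels (source text, target text, weight real)")
    conn.executemany("insert into labels values (?, ?, ?)", ROWS)
    return conn


def top_by(conn: sqlite3.Connection, col: str, x: float) -> list:
    """Top fraction x of rows per unique value of col ('source' or 'target')."""
    other = {"source": "target", "target": "source"}[col]  # KeyError otherwise
    return conn.execute(TOP_SQL.format(col=col, other=other), {"x": x}).fetchall()


def top_by_both(conn: sqlite3.Connection, x: float) -> list:
    """Per-source and per-target selections merged; UNION removes duplicate rows."""
    sql = (TOP_SQL.format(col="source", other="target")
           + " union "
           + TOP_SQL.format(col="target", other="source")
           + " order by source, weight desc")
    return conn.execute(sql, {"x": x}).fetchall()


conn = make_db()
for row in top_by(conn, "source", 0.2):
    print(row)
```

With `x = 0.2` the per-source call returns the three rows from the question's expected output. In this sample every target occurs only once, so the per-target selection keeps all 14 rows and the `UNION` result is those 14 rows without the 3 duplicates.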
