
How to do cross-row calculations in dplyr?

tidyr dplyr r

Philipp Chapkovski · Tech Community · 6 years ago

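I have a data frame with nested information. Say it holds the number of pupils per school, then the number of pupils in class A and the number of pupils in class B, so that pupils = n.pupilsA + n.pupilsB + other pupils.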

a <- data.frame(
  city = c(rep('New York',3), rep('Washington',3)),
  n = c(5, 2, 1, 5, 2, 1),
  name = c(
    'pupils',
    'classA',
    'classB',
    'pupils',
    'classA',
    'classB'
  )
)

Output:

        city n   name
1   New York 5 pupils
2   New York 2 classA
3   New York 1 classB
4 Washington 5 pupils
5 Washington 2 classA
6 Washington 1 classB

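Is there a smart way (presumably using dplyr) to do a group-wise operation that adds an 'other' row to each group, where 'other' is the difference between 'pupils' and 'classA' + 'classB'? The result would be: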

        city   type npupils
1   New York classA       2
2   New York classB       1
3   New York pupils       5
4   New York  other       2
5 Washington classA       2
6 Washington classB       1
7 Washington pupils       5
8 Washington  other       2

The only feasible way I found is to spread the data, compute the difference between the columns, and then gather it back using tidyr:

library(dplyr)
library(tidyr)

a %>%
  spread(name, n) %>%                          # one column per value of name
  mutate(other = pupils - classA - classB) %>% # pupils in neither class
  gather(type, npupils, c('classA', 'classB', 'pupils', 'other')) %>%
  arrange(city)

This works, but I wonder whether there is a better way?
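For comparison, the same spread, compute, gather idea can be sketched with pivot_wider() and pivot_longer(), which the tidyr documentation describes as the newer replacements for spread() and gather(). This is only a sketch, assuming tidyr 1.0.0 or later; the name res_wide is made up for illustration and does not appear in the original question.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
a <- data.frame(
  city = c(rep("New York", 3), rep("Washington", 3)),
  n = c(5, 2, 1, 5, 2, 1),
  name = rep(c("pupils", "classA", "classB"), 2)
)

# res_wide is a hypothetical name used only in this sketch
res_wide <- a %>%
  # one row per city, one column per value of name
  pivot_wider(names_from = name, values_from = n) %>%
  # pupils that belong to neither class
  mutate(other = pupils - classA - classB) %>%
  # back to long format with the column names used in the question
  pivot_longer(c(classA, classB, pupils, other),
               names_to = "type", values_to = "npupils")

print(res_wide)
```

This is the same reshaping approach rather than a different one, so it still assumes every city has exactly one row for each of pupils, classA and classB.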

1 answer | as of 6 years ago

Ronak Shah · 6 years ago

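We can create a summarised data frame and bind it to the original one. For every city, we compute n by taking the n where name == 'pupils' and subtracting the remaining values in the group, set the name column to "Other", and add those rows to the original data frame using bind_rows.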

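library(dplyr)

# For each city: the 'pupils' total minus the sum of all other rows,
# labelled "Other", then appended to the original data frame.
bind_rows(a, a %>%
            group_by(city) %>%
            summarise(n = n[name == 'pupils'] - sum(n[name != 'pupils']),
                      name = "Other")) %>%
  arrange(city)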


#        city n   name
#1   New York 5 pupils
#2   New York 2 classA
#3   New York 1 classB
#4   New York 2  Other
#5 Washington 5 pupils
#6 Washington 2 classA
#7 Washington 1 classB
#8 Washington 2  Other

Note: here I have assumed there is only one 'pupils' entry for each city. If not, we can use which.max to get the first entry.
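A minimal sketch of that which.max variant, under assumptions: the data frame b and the name other_rows are invented for illustration, and b deliberately contains a duplicated 'pupils' row for New York. Since which.max returns the position of the first TRUE in a logical vector, only the first 'pupils' entry per city is used as the total.

```r
library(dplyr)

# Hypothetical data: New York has two 'pupils' rows
b <- data.frame(
  city = c(rep("New York", 4), rep("Washington", 3)),
  n = c(5, 2, 1, 5, 5, 2, 1),
  name = c("pupils", "classA", "classB", "pupils",
           "pupils", "classA", "classB"),
  stringsAsFactors = FALSE
)

other_rows <- b %>%
  group_by(city) %>%
  summarise(
    # first 'pupils' value in the group minus the sum of the class rows
    n = n[which.max(name == "pupils")] - sum(n[name != "pupils"]),
    name = "Other"
  )

# Append the "Other" rows and keep each city's rows together
result <- bind_rows(b, other_rows) %>%
  arrange(city)

print(result)
```

One caveat with this sketch: if a city has no 'pupils' row at all, which.max still returns 1, so the first row of that group would silently be treated as the total. A check for that case would be needed on real data.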
