
Confusion between categories in dplyr

dplyr r

Omry Atia · 3 years ago

I have the following data frame, which describes the conditions of each patient (each patient can have more than one condition):

df <- structure(list(patient = c(1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 
6, 7, 7, 8, 8, 9, 9, 10), condition = c("A", "A", "B", "B", "D", 
"C", "A", "C", "C", "B", "D", "B", "A", "A", "C", "B", "C", "D", 
"C", "D")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", 
"data.frame"))

2 replies | 3 years ago

Pete Kittinun 3 years ago

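library(dplyr)

# Self-join on patient: pairs every condition of a patient with every
# condition of that same patient (including itself).
# On dplyr >= 1.1.0 this emits a many-to-many warning; passing
# relationship = "many-to-many" to inner_join() silences it.
df2 <- inner_join(df, df, by = "patient")
# Cross-tabulate the pairs; the diagonal is the patient count per condition.
table(df2$condition.x, df2$condition.y)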

    A B C D
  A 5 2 2 1
  B 2 5 3 2
  C 2 3 6 2
  D 1 2 2 4

Ronak Shah 3 years ago

Here is a base R answer using outer:

count_patient <- function(x, y) {
  length(intersect(df$patient[df$condition == x],
                   df$patient[df$condition == y])) 
}
vec <- sort(unique(df$condition))
res <- outer(vec, vec, Vectorize(count_patient))
dimnames(res) <- list(vec, vec)
res

#  A B C D
#A 5 2 2 1
#B 2 5 3 2
#C 2 3 6 2
#D 1 2 2 4
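
For comparison, the same co-occurrence counts can probably also be obtained from a patient-by-condition incidence table. This is only a sketch, and it assumes that each patient/condition pair appears at most once in df, which holds for the sample data. The names inc and res2 are made up for this illustration, and df is rebuilt as a plain data frame so that the block runs on its own.

```r
# Sample data from the question, rebuilt as a plain data frame.
df <- data.frame(
  patient = c(1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9, 9, 10),
  condition = c("A", "A", "B", "B", "D", "C", "A", "C", "C", "B", "D", "B",
                "A", "A", "C", "B", "C", "D", "C", "D")
)

# Patient-by-condition incidence table: 1 if the patient has the condition,
# 0 otherwise (assumes no duplicated patient/condition rows).
inc <- table(df$patient, df$condition)

# crossprod(inc) is t(inc) %*% inc: entry [i, j] counts the patients that
# have both condition i and condition j.
res2 <- crossprod(inc)
res2
```

If a patient/condition pair could be repeated in the data, the table entries would exceed 1 and the products would overcount, so deduplicating first (for example with unique(df)) would be needed.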
