
Confusion between categories in dplyr

  •  0
  • Omry Atia  · Tech Community  · 3 years ago

    I have the following data frame, which describes the conditions each patient has (a patient can have more than one):

    df <- structure(list(patient = c(1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 
    6, 7, 7, 8, 8, 9, 9, 10), condition = c("A", "A", "B", "B", "D", 
    "C", "A", "C", "C", "B", "D", "B", "A", "A", "C", "B", "C", "D", 
    "C", "D")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", 
    "data.frame"))
    

    1 reply  |  up to 3 years ago
        1
  •  2
  •   Pete Kittinun    3 years ago

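    library(dplyr)
    
    # Self-join on patient: one row per pair of conditions held by the same patient
    df2 <- inner_join(df, df, by = "patient")
    # Cross-tabulate the pairs; the diagonal is the number of patients per condition
    table(df2$condition.x, df2$condition.y)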
    
        A B C D
      A 5 2 2 1
      B 2 5 3 2
      C 2 3 6 2
      D 1 2 2 4
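
    As a cross-check, the same counts can be sketched in base R without a join: build the patient-by-condition incidence table and take its cross-product. This assumes each patient/condition pair appears at most once, as in the question's df; the names inc and co are only illustrative.

```r
# Recreate the question's data as a plain data frame
df <- data.frame(
  patient = c(1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9, 9, 10),
  condition = c("A", "A", "B", "B", "D", "C", "A", "C", "C", "B",
                "D", "B", "A", "A", "C", "B", "C", "D", "C", "D")
)

# Rows are patients, columns are conditions; cells are 0 or 1 for this data
inc <- table(df$patient, df$condition)

# t(inc) %*% inc: entry [i, j] counts the patients that have both
# condition i and condition j
co <- crossprod(inc)
co
```

    If a patient could list the same condition twice, the cells of inc would exceed 1 and the cross-product would no longer count distinct patients.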
    
        2
  •  1
  •   Ronak Shah    3 years ago

    Here is a base R answer using outer:

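    # Number of patients that have both condition x and condition y
    count_patient <- function(x, y) {
      length(intersect(df$patient[df$condition == x],
                       df$patient[df$condition == y]))
    }
    vec <- sort(unique(df$condition))
    # count_patient handles one pair at a time, so wrap it with Vectorize for outer
    res <- outer(vec, vec, Vectorize(count_patient))
    dimnames(res) <- list(vec, vec)
    res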
    
    #  A B C D
    #A 5 2 2 1
    #B 2 5 3 2
    #C 2 3 6 2
    #D 1 2 2 4
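
    A small sketch of why Vectorize is needed in the answer above: outer calls its function once, with two equal-length vectors covering every pair, so a function that returns a single number has to be wrapped. The functions f and g here are hypothetical stand-ins, not part of the answer.

```r
# outer() passes two expanded vectors to FUN in a single call
f <- function(x, y) paste0(x, y)   # already vectorized, no wrapper needed
pairs <- outer(c("A", "B"), c("A", "B"), f)
pairs

# g returns one number per call, so it must be applied pair by pair
g <- function(x, y) length(intersect(x, y))
res <- outer(1:2, 1:2, Vectorize(g))
res
```

    Calling outer(1:2, 1:2, g) without the wrapper fails, because g returns a single value where outer expects one value per pair.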