
Merge data frames so that values from one data frame are inserted at the matching rows of another

r
  •  0
  • llewmills  · Tech Community  · 6 years ago

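    I want to reshape a dataset. Suppose I have data recording when, and how many times, participants attended counselling sessions. They could attend up to 3 sessions at any time within a 12-week period. Say their data were recorded like this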

    set.seed(01234)
    df1 <- data.frame(id = rep(LETTERS[1:4], each = 3),
                      session = rep(paste0("session", 1:3), length.out = 12),
                      week1 = c(sort(sample(1:12, 3, replace = F)), 
                               sort(sample(1:12, 3, replace = F)), 
                               sort(sample(1:12, 3, replace = F)), 
                               sort(sample(1:12, 3, replace = F)))) 
    df1$week1[c(3,8,9,12)] <- NA # insert some NAs representing sessions that weren't attended
    

    #    id  session week1
    # 1   A session1     2
    # 2   A session2     7
    # 3   A session3    NA
    # 4   B session1     7
    # 5   B session2     8
    # 6   B session3    10
    # 7   C session1     1
    # 8   C session2    NA
    # 9   C session3    NA
    # 10  D session1     6
    # 11  D session2     7
    # 12  D session3    NA
    

    But I want a long dataset in which each person has one row for each of the 12 weeks in which they could have attended, like this

    df2 <- data.frame(id = rep(LETTERS[1:4], each = 12),
                      week2 = rep(1:12, times = 4))
    

    The data for participant A look like this

    df2[1:12,]
    
    #    id week2
    # 1   A     1
    # 2   A     2
    # 3   A     3
    # 4   A     4
    # 5   A     5
    # 6   A     6
    # 7   A     7
    # 8   A     8
    # 9   A     9
    # 10  A    10
    # 11  A    11
    # 12  A    12
    

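    I want to merge the two somehow so that the week1 column of df1 is matched to the corresponding rows of df2, ideally like this (example restricted to participant A)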

    data.frame(id = rep("A", 12),
               week = 1:12,
               attended = c(0,1,0,0,0,0,1,0,0,0,0,0))
    
    #    id week attended
    # 1   A    1        0
    # 2   A    2        1
    # 3   A    3        0
    # 4   A    4        0
    # 5   A    5        0
    # 6   A    6        0
    # 7   A    7        1
    # 8   A    8        0
    # 9   A    9        0
    # 10  A   10        0
    # 11  A   11        0
    # 12  A   12        0
    
    3 answers  |  as of 6 years ago
        1
  •  1
  •   DanY    6 years ago

    One approach using merge:

    # merge the 2 dataframes (left join: keep every id/week row of df2)
    names(df2)[2] <- "week"
    names(df1)[3] <- "week"
    df <- merge(df2, df1, by=c("id", "week"), all.x=TRUE)
    
    # replace 'session' with 1s and 0s (1 = a session was attended that week)
    df$session <- as.integer(!is.na(df$session))
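    The merge approach can also be sketched without renaming the columns of the original data frames. This is only an illustration under some assumptions: the attendance weeks are hard-coded from the df1 printed in the question (so the result does not depend on the random seed), and the names attended_weeks and out are introduced here for the example.

```r
# Attendance weeks copied from the printed df1 in the question.
df1 <- data.frame(id = rep(LETTERS[1:4], each = 3),
                  session = rep(paste0("session", 1:3), times = 4),
                  week1 = as.integer(c(2, 7, NA, 7, 8, 10, 1, NA, NA, 6, 7, NA)))
df2 <- data.frame(id = rep(LETTERS[1:4], each = 12),
                  week2 = rep(1:12, times = 4))

# Drop unattended sessions so NA weeks take no part in the join.
attended_weeks <- df1[!is.na(df1$week1), ]

# Left join on differently named key columns; every row of df2 is kept.
out <- merge(df2, attended_weeks,
             by.x = c("id", "week2"), by.y = c("id", "week1"),
             all.x = TRUE)

# Rows without a matching session have NA in 'session'; convert to 0/1.
out$attended <- as.integer(!is.na(out$session))

# Keep the columns of the desired layout, ordered by participant and week.
out <- out[order(out$id, out$week2), c("id", "week2", "attended")]
names(out)[2] <- "week"

out[out$id == "A", ]
```

    Using by.x and by.y leaves df1 and df2 untouched, which matters if the other answers are run in the same session, since they still refer to week1 and week2.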
    
        2
  •  1
  •   d.b    6 years ago
    # for each participant, flag the weeks that appear in that participant's rows of df1
    do.call(rbind, lapply(split(df2, df2$id), function(x){
        x$attended = as.integer(x$week2 %in% df1$week1[df1$id == x$id[1]])
        x
    }))
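    The same per-participant lookup can probably be done without split and lapply by comparing combined id/week keys. A minimal sketch, assuming the week values printed in the question (hard-coded here); attended_key is a name made up for this example.

```r
# Attendance weeks copied from the printed df1 in the question.
df1 <- data.frame(id = rep(LETTERS[1:4], each = 3),
                  session = rep(paste0("session", 1:3), times = 4),
                  week1 = as.integer(c(2, 7, NA, 7, 8, 10, 1, NA, NA, 6, 7, NA)))
df2 <- data.frame(id = rep(LETTERS[1:4], each = 12),
                  week2 = rep(1:12, times = 4))

# One "id week" key per attended session, e.g. "A 2"; NA weeks are dropped.
attended_key <- paste(df1$id, df1$week1)[!is.na(df1$week1)]

# A row of df2 is flagged when its own "id week" key was attended.
df2$attended <- as.integer(paste(df2$id, df2$week2) %in% attended_key)

df2[df2$id == "A", ]
```

    Pasting the id and week together keeps the comparison within each participant, so week 7 for B does not flag week 7 for a participant who missed it.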
    
        3
  •  1
  •   lroha    6 years ago

    You can expand the original data.frame with tidyr::complete, so no merge is needed; just define week1 as a factor with the correct number of levels:

    library(dplyr)
    library(tidyr)
    
    df1 %>% 
      group_by(id) %>%
      mutate(week1 = factor(week1, levels = 1:12), 
             session = !is.na(session)) %>%
      complete(week1, fill = list(session = 0)) 
    
    # A tibble: 52 x 3
    # Groups:   id [4]
       id    week1 session
       <fct> <fct>   <dbl>
     1 A     1           0
     2 A     2           1
     3 A     3           0
     4 A     4           0
     5 A     5           0
     6 A     6           0
     7 A     7           1
     8 A     8           0
     9 A     9           0
    10 A     10          0
    # ... with 42 more rows
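    The tibble above has 52 rows rather than 4 × 12 = 48, presumably because the four sessions with a missing week1 are carried through as extra NA rows. A sketch of a variant that drops those rows first, assuming that is what is wanted; the week values are hard-coded from the question, and the attended column and the name out are introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Attendance weeks copied from the printed df1 in the question.
df1 <- data.frame(id = rep(LETTERS[1:4], each = 3),
                  session = rep(paste0("session", 1:3), times = 4),
                  week1 = as.integer(c(2, 7, NA, 7, 8, 10, 1, NA, NA, 6, 7, NA)))

out <- df1 %>%
  filter(!is.na(week1)) %>%                      # keep attended sessions only
  mutate(week1 = factor(week1, levels = 1:12),   # all 12 weeks as factor levels
         attended = 1) %>%
  group_by(id) %>%
  complete(week1, fill = list(attended = 0)) %>% # add the missing weeks per id
  ungroup() %>%
  select(id, week = week1, attended)

filter(out, id == "A")
```

    Note that week stays a factor here; as.integer(as.character(week)) would turn it back into a number if later steps need one.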