
dplyr: nested conditions in an ifelse statement

    Asked by B. Davis · 7 years ago

    Tags: dplyr, ifelse

    dat <- structure(list(GenIndID = c("BHS_034", "BHS_034", "BHS_068", 
    "BHS_068", "BHS_068", "BHS_068", "BHS_068", "BHS_068", "BHS_068", 
    "BHS_068", "BHS_068"), IndID = c("BHS_034_A", "BHS_034_A", "BHS_068_A", 
    "BHS_068_A", "BHS_068_A", "BHS_068_A", "BHS_068_A", "BHS_068_A", 
    "BHS_068_A", "BHS_068_A", "BHS_068_A"), Fate = c("Mort", "Mort", 
    "Alive", "Alive", "Alive", "Alive", "Alive", "Alive", "Alive", 
    "Alive", "Alive"), Status = c("Alive", "Mort", "Alive", "Alive", 
    "MIA", "Alive", "MIA", "Alive", "MIA", "Alive", "Alive"), Type = c("Linked", 
    "Linked", "SOB", "SOB", "SOB", "SOB", "SOB", "SOB", "SOB", "SOB", 
    "SOB"), SurveyID = c("GYA13-1", "GYA14-1", "GYA13-1", "GYA14-1", 
    "GYA14-2", "GYA15-1", "GYA16-1", "GYA16-2", "GYA17-1", "GYA17-3", 
    "GYA15-2"), SurveyDt = structure(c(1379570400, 1407477600, 1379570400, 
    1407477600, 1409896800, NA, 1462946400, 1474351200, 1495519200, 
    1507010400, 1441951200), tzone = "", class = c("POSIXct", "POSIXt"
    ))), row.names = c(NA, 11L), .Names = c("GenIndID", "IndID", 
    "Fate", "Status", "Type", "SurveyID", "SurveyDt"), class = "data.frame")
    
    > dat
       GenIndID     IndID  Fate Status   Type SurveyID   SurveyDt
    1   BHS_034 BHS_034_A  Mort  Alive Linked  GYA13-1 2013-09-19
    2   BHS_034 BHS_034_A  Mort   Mort Linked  GYA14-1 2014-08-08
    3   BHS_068 BHS_068_A Alive  Alive    SOB  GYA13-1 2013-09-19
    4   BHS_068 BHS_068_A Alive  Alive    SOB  GYA14-1 2014-08-08
    5   BHS_068 BHS_068_A Alive    MIA    SOB  GYA14-2 2014-09-05
    6   BHS_068 BHS_068_A Alive  Alive    SOB  GYA15-1       <NA>
    7   BHS_068 BHS_068_A Alive    MIA    SOB  GYA16-1 2016-05-11
    8   BHS_068 BHS_068_A Alive  Alive    SOB  GYA16-2 2016-09-20
    9   BHS_068 BHS_068_A Alive    MIA    SOB  GYA17-1 2017-05-23
    10  BHS_068 BHS_068_A Alive  Alive    SOB  GYA17-3 2017-10-03
    11  BHS_068 BHS_068_A Alive  Alive    SOB  GYA15-2 2015-09-11
    

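    More specifically, grouped by GenIndID, I want to create a new date field holding the maximum SurveyDt, but only for rows that meet two conditions on Type and Fate (Type == "SOB" and Fate == "Alive"). In addition, the maximum date should be evaluated only over surveys where Status == "Alive". My code below produces all NA values instead of the date field described for BHS_068.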

    I recently came across case_when, which might be appropriate here, but I could not implement it correctly.

    dat %>% group_by(GenIndID) %>%
      mutate(NewDat = as.POSIXct(ifelse(Type == "SOB" & Fate == "Alive", max(SurveyDt[Status == "Alive"], na.rm = F), NA), 
                                 origin='1970-01-01', na.rm=T)) %>%
      as.data.frame()
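
A minimal base-R sketch of why the code above returns NA for BHS_068: the Status == "Alive" rows for that animal include row 6, whose SurveyDt is NA, and max() with na.rm = FALSE propagates that NA to the whole group. The epoch seconds are copied from the dput() output above; tz = "UTC" is an assumption made only so the example is reproducible.

```r
# SurveyDt values (seconds since 1970-01-01) of the BHS_068 rows with Status == "Alive",
# copied from the dput() output (rows 3, 4, 6, 8, 10, 11); row 6 is NA
alive_secs <- c(1379570400, 1407477600, NA, 1474351200, 1507010400, 1441951200)
alive_dt <- as.POSIXct(alive_secs, origin = "1970-01-01", tz = "UTC")

# With na.rm = FALSE (as in the question), a single NA makes the maximum NA
max_keep_na <- max(alive_dt, na.rm = FALSE)

# With na.rm = TRUE the NA is dropped and the latest Alive survey date is returned
max_drop_na <- max(alive_dt, na.rm = TRUE)

print(max_keep_na)
print(max_drop_na)
```

Note that the second `na.rm=T` in the question is an argument to as.POSIXct(), not to max(), so it does not change how the maximum is computed.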
    

    2 Answers
    Answer by Jake Kaupp · 7 years ago

    If you want to stick with dplyr and use case_when, you need to make sure that every case returns a value of the same type.

    Here the value for the matching case is a datetime, so the default value must also be a datetime: wrap the NA in as.POSIXct.

    library(dplyr)
    dat %>%
      group_by(GenIndID) %>%
      # Latest Alive survey date for live SOB animals; a POSIXct-typed NA otherwise
      mutate(NewDat = case_when(Type == "SOB" & Fate == "Alive" ~ max(SurveyDt[Status == "Alive"], na.rm = TRUE),
                                TRUE ~ as.POSIXct(NA, origin = "1970-01-01")))
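
To see the type mismatch the answer refers to, compare a bare NA with a POSIXct-typed NA. A minimal base-R sketch (no dplyr needed): the bare NA is logical, while as.POSIXct(NA) is a double carrying the POSIXct class, the same underlying type as the max(SurveyDt[...]) branch. The date-time value used for comparison is taken from the question's data.

```r
# A bare NA is logical, so it does not match a POSIXct (double) branch
bare_na <- NA

# as.POSIXct(NA) is a missing date-time: underlying type double, class POSIXct
typed_na <- as.POSIXct(NA)

# A non-missing date-time, standing in for the max(SurveyDt[...]) branch
some_time <- as.POSIXct(1507010400, origin = "1970-01-01", tz = "UTC")

cat("bare NA :", typeof(bare_na), "/", class(bare_na), "\n")
cat("typed NA:", typeof(typed_na), "/", class(typed_na)[1], "\n")
cat("datetime:", typeof(some_time), "/", class(some_time)[1], "\n")
```

Newer dplyr releases may be more lenient about a bare NA in case_when; the typed NA is the portable choice either way.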
    

    Using ifelse:

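    dat %>%
      group_by(GenIndID) %>%
      # Base ifelse() drops the POSIXct class and returns plain seconds,
      # so convert the result back to a date-time afterwards
      mutate(NewDat = as.POSIXct(ifelse(Type == "SOB" & Fate == "Alive",
                                        max(SurveyDt[Status == "Alive"], na.rm = TRUE),
                                        NA),
                                 origin = "1970-01-01"))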
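
Base R's ifelse() builds its result from the logical test vector, so the attributes of the yes/no values, including the POSIXct class, are dropped and the result is plain seconds since the epoch. A minimal sketch of the effect and of two ways around it: converting back with as.POSIXct(), or starting from a typed NA vector and filling it by indexing, which keeps the class. The inputs `cond` and `latest` are hypothetical, made up for illustration.

```r
# Hypothetical inputs: a logical condition per row and one date-time value
cond <- c(TRUE, FALSE, TRUE)
latest <- as.POSIXct(1507010400, origin = "1970-01-01", tz = "UTC")

# ifelse() returns a plain double: the POSIXct class is lost
raw <- ifelse(cond, latest, NA)

# Option 1: restore the class afterwards (older R versions require origin)
fixed <- as.POSIXct(raw, origin = "1970-01-01", tz = "UTC")

# Option 2: start from a POSIXct NA vector and fill the TRUE positions
filled <- rep(as.POSIXct(NA, tz = "UTC"), length(cond))
filled[cond] <- latest

print(class(raw))
print(fixed)
print(filled)
```

dplyr's own if_else() is type-strict and keeps the class of its `true` value, so it is another option inside mutate().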
    
    Answer by akrun · 7 years ago

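    We can use data.table. After converting to a data.table (setDT(dat)), specify i as the logical comparison, group by "GenIndID", and assign (:=) the max of "SurveyDt" where "Status" is "Alive" to "NewDat". Rows that do not match i are left as NA in the new column.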

    library(data.table)
    setDT(dat)[Type == "SOB" & Fate == "Alive",
             NewDat := max(SurveyDt[Status == "Alive"], na.rm = TRUE), GenIndID]
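
For comparison, a base-R sketch of the same computation, with neither dplyr nor data.table, run on a cut-down copy of the question's data (IndID and SurveyID omitted; tz = "UTC" is an assumption). It takes each animal's maximum SurveyDt over its Status == "Alive" rows with NAs dropped, then fills NewDat only where Type == "SOB" & Fate == "Alive". Unlike the data.table call, whose i subset is applied before grouping, the maximum here is taken over all Alive rows of each animal; for this data the two agree because Type and Fate are constant within each GenIndID.

```r
# Cut-down copy of the question's data
dat <- data.frame(
  GenIndID = c(rep("BHS_034", 2), rep("BHS_068", 9)),
  Fate     = c("Mort", "Mort", rep("Alive", 9)),
  Status   = c("Alive", "Mort", "Alive", "Alive", "MIA", "Alive", "MIA",
               "Alive", "MIA", "Alive", "Alive"),
  Type     = c("Linked", "Linked", rep("SOB", 9)),
  stringsAsFactors = FALSE
)
dat$SurveyDt <- as.POSIXct(c(1379570400, 1407477600, 1379570400, 1407477600,
                             1409896800, NA, 1462946400, 1474351200,
                             1495519200, 1507010400, 1441951200),
                           origin = "1970-01-01", tz = "UTC")

# Per-animal maximum (in seconds) over Status == "Alive" rows, NAs dropped
alive <- dat$Status == "Alive"
alive_max <- vapply(split(as.numeric(dat$SurveyDt[alive]), dat$GenIndID[alive]),
                    max, numeric(1), na.rm = TRUE)

# Look up each row's group maximum where the conditions hold; NA elsewhere
cond <- dat$Type == "SOB" & dat$Fate == "Alive"
new_secs <- ifelse(cond, alive_max[dat$GenIndID], NA)

# ifelse() returned plain seconds, so convert back to POSIXct
dat$NewDat <- as.POSIXct(new_secs, origin = "1970-01-01", tz = "UTC")

print(dat[, c("GenIndID", "Status", "SurveyDt", "NewDat")])
```

If an animal had no non-missing Alive dates, max(..., na.rm = TRUE) would return -Inf with a warning, in this sketch as in the answers above.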