Finding the min or max of POSIXct data with NA values

  • B. Davis  · 6 years ago

    The data below contain columns for individual ID (with repeated observations), Date, and Fate.

             ID       Date  Fate
    1  BHS_1149 2017-04-11   MIA
    2  BHS_1154       <NA>  <NA>
    3  BHS_1155       <NA>  <NA>
    4  BHS_1156       <NA>  <NA>
    5  BHS_1157       <NA>  Mort
    6  BHS_1159 2017-04-11 Alive
    7  BHS_1169 2017-04-11 Alive
    8  BHS_1259       <NA>  <NA>
    9  BHS_1260       <NA>  <NA>
    10 BHS_1262 2017-04-11   MIA
    11 BHS_1262 2017-07-05 Alive
    12 BHS_1262 2017-12-06 Alive
    13 BHS_1262 2017-12-06   MIA
    14 BHS_1262 2018-01-17  Mort
    

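    For each ID, I would like to create a new column holding the min or max Date on which Fate was Alive. I have tried different combinations of including and excluding the na.rm = T argument, but I still receive the warnings below.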

    library(tidyverse)
    library(lubridate)
    
    dat %>% 
      group_by(ID) %>%
      mutate(
        #the first or min of Date
        FstSurvey = min(Date),
        LstAlive = max(Date[Fate == "Alive"])) %>%
      as.data.frame()
    
             ID       Date  Fate  FstSurvey   LstAlive
    1  BHS_1149 2017-04-11   MIA 2017-04-11       <NA>
    2  BHS_1154       <NA>  <NA>       <NA>       <NA>
    3  BHS_1155       <NA>  <NA>       <NA>       <NA>
    4  BHS_1156       <NA>  <NA>       <NA>       <NA>
    5  BHS_1157       <NA>  Mort       <NA>       <NA>
    6  BHS_1159 2017-04-11 Alive 2017-04-11 2017-04-11
    7  BHS_1169 2017-04-11 Alive 2017-04-11 2017-04-11
    8  BHS_1259       <NA>  <NA>       <NA>       <NA>
    9  BHS_1260       <NA>  <NA>       <NA>       <NA>
    10 BHS_1262 2017-04-11   MIA 2017-04-11 2017-12-06
    11 BHS_1262 2017-07-05 Alive 2017-04-11 2017-12-06
    12 BHS_1262 2017-12-06 Alive 2017-04-11 2017-12-06
    13 BHS_1262 2017-12-06   MIA 2017-04-11 2017-12-06
    14 BHS_1262 2018-01-17  Mort 2017-04-11 2017-12-06
    
    Warning messages:
    1: In max.default(numeric(0), na.rm = FALSE) :
      no non-missing arguments to max; returning -Inf
    2: In max.default(numeric(0), na.rm = FALSE) :
      no non-missing arguments to max; returning -Inf
    

    The code appears to work as expected, but I cannot figure out how to avoid the warnings raised by max and min.

    dat <- structure(list(ID = c("BHS_1149", "BHS_1154", "BHS_1155", "BHS_1156", 
    "BHS_1157", "BHS_1159", "BHS_1169", "BHS_1259", "BHS_1260", "BHS_1262", 
    "BHS_1262", "BHS_1262", "BHS_1262", "BHS_1262"), Date = structure(c(1491890400, 
    NA, NA, NA, NA, 1491890400, 1491890400, NA, NA, 1491890400, 1499234400, 
    1512543600, 1512543600, 1516172400), class = c("POSIXct", "POSIXt"
    ), tzone = ""), Fate = c("MIA", NA, NA, NA, "Mort", "Alive", 
    "Alive", NA, NA, "MIA", "Alive", "Alive", "MIA", "Mort")), row.names = c(NA, 
    -14L), .Names = c("ID", "Date", "Fate"), class = "data.frame")
    
    1 answer

  • davsjob  · 6 years ago

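    I also like writing code that runs without warnings. Here is a suggestion for doing the same calculation warning-free. By using an ordered first and last instead of min and max, you avoid the odd scenario where R evaluates max() of an empty vector to -Inf.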

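    dat %>% 
      group_by(ID) %>%
      mutate(
        # earliest Date per ID; ordering puts NA dates last
        FstSurvey = first(Date, order_by = Date),
        # latest Date with Fate "Alive"; %in% treats NA Fate as FALSE,
        # and an empty selection yields NA rather than a warning
        LstAlive  = last(Date[Fate %in% "Alive"],
                         order_by = Date[Fate %in% "Alive"]))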
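
    If you would rather keep max-style semantics, another option is to guard against the empty case yourself. The sketch below uses base R only. safe_max is a hypothetical helper name (it is not part of base R or dplyr), and the small dates and fate vectors are made up for illustration. It drops missing values and returns a classed NA when nothing is left, so max() is never called on a zero-length vector.

```r
# Hypothetical helper (not part of base R or dplyr): like max(), but returns
# NA instead of warning when there are no non-missing values.
safe_max <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) {
    # Indexing with NA gives one missing value that keeps x's class
    # (here POSIXct) and its time zone attribute.
    return(x[NA_integer_])
  }
  max(x)
}

# Made-up illustration data, similar in shape to one ID group.
dates <- as.POSIXct(c("2017-04-11", "2017-12-06", NA), tz = "UTC")
fate  <- c("MIA", "Alive", NA)

# No "Mort" rows here, so this is a zero-length POSIXct vector.
# Calling max(empty) directly is what triggers
# "no non-missing arguments to max; returning -Inf".
empty <- dates[fate %in% "Mort"]

last_alive <- safe_max(dates[fate %in% "Alive"])
no_mort    <- safe_max(empty)
```

    Inside the grouped mutate() this would be used as LstAlive = safe_max(Date[Fate %in% "Alive"]), assuming Date is POSIXct in every group so the per-group results can be combined.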