
dplyr: summarise multiple columns based on a condition

  • Thomas Speidel  · 6 years ago

    I have a dataset like this:

    df.in <- structure(list(
      id = c(1, 1, 2, 3),
      x1 = c(0, 1, NA, 0),
      x2 = c("Lorem ipsum dolor sit amet", "dolore eu fugiat nulla pariatur",
             "Sed ut perspiciatis unde omnis", "Nemo enim ipsam voluptatem"),
      x3 = c("Donec ullamcorper elit quis risus", "Donec ullamcorper elit quis risus",
             "Curabitur euismod", "Mauris felis orci")
    ), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))
    
    > df.in
    # A tibble: 4 x 4
         id    x1 x2                              x3                               
      <dbl> <dbl> <chr>                           <chr>                            
    1     1     0 Lorem ipsum dolor sit amet      Donec ullamcorper elit quis risus
    2     1     1 dolore eu fugiat nulla pariatur Donec ullamcorper elit quis risus
    3     2    NA Sed ut perspiciatis unde omnis  Curabitur euismod                
    4     3     0 Nemo enim ipsam voluptatem      Mauris felis orci 
    


    I would like to use dplyr::group_by() to get this:

    df.out <- structure(list(
      id = c(1, 2, 3),
      x1 = c(1, NA, 0),
      x2 = c("dolore eu fugiat nulla pariatur", "Sed ut perspiciatis unde omnis",
             "Nemo enim ipsam voluptatem"),
      x3 = c("Donec ullamcorper elit quis risus", "Curabitur euismod",
             "Mauris felis orci")
    ), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))
    
    > df.out
    # A tibble: 3 x 4
         id    x1 x2                              x3                               
      <dbl> <dbl> <chr>                           <chr>                            
    1     1     1 dolore eu fugiat nulla pariatur Donec ullamcorper elit quis risus
    2     2    NA Sed ut perspiciatis unde omnis  Curabitur euismod                
    3     3     0 Nemo enim ipsam voluptatem      Mauris felis orci  
    


    I can do:

    df.in %>%
      group_by(id) %>%
      summarise(x1 = max(x1))
    


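    But how do I:

    1. keep, for x2 and x3, the values from the row where x1 equals max(x1)?
    2. do this when I have several x columns, for example with summarize_all?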
    1 Answer

  • akrun  · 6 years ago

    We can use the condition on max(x1) inside summarise_at:

    library(dplyr)
    df.in %>% 
      group_by(id) %>% 
      # Note: funs() has been deprecated since dplyr 0.8.0.
      summarise_at(3:4, funs(if(n() == 1) . else .[x1 == max(x1, na.rm = TRUE)]))
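
The answer does not say why the `n() == 1` guard is there. My reading is that it covers groups such as id 2, whose only `x1` value is NA. A minimal base R sketch of what the comparison yields for such a group (the variable names `x1_group2`, `group_max` and `keep` are illustrative, not from the answer):

```r
# x1 for the group id == 2 in df.in: a single missing value.
x1_group2 <- c(NA_real_)

# With na.rm = TRUE nothing is left, so max() returns -Inf (and warns).
group_max <- suppressWarnings(max(x1_group2, na.rm = TRUE))

# NA compared with anything is NA, so the condition is NA, not TRUE.
keep <- x1_group2 == group_max

print(group_max)
print(keep)
```

Since `filter()` drops rows where the condition is NA, the group would vanish from the result without the extra `n() == 1` check.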
    

    Or, instead of summarise, we can use filter or slice:

    df.in %>%
      group_by(id) %>% 
      filter((n() == 1) | (x1 == max(x1, na.rm = TRUE)))
    

    df.in %>% 
      group_by(id) %>% 
      slice(which(n() == 1 | (x1 == max(x1, na.rm = TRUE)))[1])
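
For readers on dplyr 1.0.0 or later, a possible alternative that is not part of the answer above is `slice_max()`. This is only a sketch: it rebuilds `df.in` with `tibble()` for brevity, and it assumes that one row per id is wanted even when several rows tie for the maximum. Missing values are ordered last, so the all-NA group should keep its row, but it is worth checking on your dplyr version.

```r
library(dplyr)

# Same data as df.in in the question, rebuilt with tibble() for brevity.
df.in <- tibble(
  id = c(1, 1, 2, 3),
  x1 = c(0, 1, NA, 0),
  x2 = c("Lorem ipsum dolor sit amet", "dolore eu fugiat nulla pariatur",
         "Sed ut perspiciatis unde omnis", "Nemo enim ipsam voluptatem"),
  x3 = c("Donec ullamcorper elit quis risus", "Donec ullamcorper elit quis risus",
         "Curabitur euismod", "Mauris felis orci")
)

# Keep, per id, the single row with the largest x1.
# with_ties = FALSE returns exactly one row per group.
df.out <- df.in %>%
  group_by(id) %>%
  slice_max(x1, n = 1, with_ties = FALSE) %>%
  ungroup()

print(df.out)
```

Because whole rows are kept, this works unchanged however many x columns there are.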