
Getting unique values with dplyr's distinct function

  • Cina · 6 years ago

    library(tibble)
    df <- tibble(ID = c(100000L, 100000L, 100000L, 100000L, 100001L, 100001L, 100001L, 100001L, 100002L, 100002L, 100002L, 100002L, 100003L, 100003L, 100003L), subject_result2 = c("OTHERPassedTerm1", "OTHERPassedTerm1", "OTHERPassedTerm1", "MATHPassedTerm1", "OTHERPassedTerm1", "OTHERPassedTerm1", "OTHERPassedTerm1", "OTHERFailedTerm1", "OTHERPassedTerm1", "OTHERPassedTerm1", "MATHPassedTerm1", "MATHFailedTerm1", "OTHERPassedTerm1", "MATHPassedTerm1", "OTHERPassedTerm1"))
    
    # A tibble: 15 x 2
           ID subject_result2 
        <int> <chr>           
     1 100000 OTHERPassedTerm1
     2 100000 OTHERPassedTerm1
     3 100000 OTHERPassedTerm1
     4 100000 MATHPassedTerm1 
     5 100001 OTHERPassedTerm1
     6 100001 OTHERPassedTerm1
     7 100001 OTHERPassedTerm1
     8 100001 OTHERFailedTerm1
     9 100002 OTHERPassedTerm1
    10 100002 OTHERPassedTerm1
    11 100002 MATHPassedTerm1 
    12 100002 MATHFailedTerm1 
    13 100003 OTHERPassedTerm1
    14 100003 MATHPassedTerm1 
    15 100003 OTHERPassedTerm1
    

    I want the unique values of subject_result2 for each ID. Something like the code below, but this code doesn't work:

    library(dplyr)
    df %>%
     group_by(ID) %>%
     distinct(subject_result2)
    

    Can you help me solve this? Thanks.

    Expected result:

    #   <int> <chr>           
    #1 100000 OTHERPassedTerm1
    #2 100000 MATHPassedTerm1 
    #3 100001 OTHERPassedTerm1
    #4 100001 OTHERFailedTerm1
    #5 100002 OTHERPassedTerm1
    #6 100002 MATHPassedTerm1 
    #7 100002 MATHFailedTerm1 
    #8 100003 OTHERPassedTerm1
    #9 100003 MATHPassedTerm1 
    
    2 answers

  • markus · 6 years ago

    All you need is:

    distinct(df)
    # A tibble: 9 x 2
    #      ID subject_result2 
    #   <int> <chr>           
    #1 100000 OTHERPassedTerm1
    #2 100000 MATHPassedTerm1 
    #3 100001 OTHERPassedTerm1
    #4 100001 OTHERFailedTerm1
    #5 100002 OTHERPassedTerm1
    #6 100002 MATHPassedTerm1 
    #7 100002 MATHFailedTerm1 
    #8 100003 OTHERPassedTerm1
    #9 100003 MATHPassedTerm1 
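    Here distinct(df) works because df has only the two columns ID and subject_result2, so whole-row uniqueness is the same as uniqueness of the pair. If the real data carried more columns, the key columns would need to be named. A minimal sketch, assuming a hypothetical extra column `score` invented for illustration (it is not part of the question's data):

```r
library(dplyr)
library(tibble)

# Hypothetical data: ID/result pairs in the style of the question, plus an
# invented column `score` whose values differ between duplicate pairs
df2 <- tibble(
  ID = c(100000L, 100000L, 100000L, 100001L),
  subject_result2 = c("OTHERPassedTerm1", "OTHERPassedTerm1",
                      "MATHPassedTerm1", "OTHERFailedTerm1"),
  score = c(71, 85, 90, 40)
)

# distinct(df2) compares whole rows, so all 4 rows survive
all_rows <- distinct(df2)

# Naming the key columns deduplicates on ID and subject_result2 only;
# the result contains just those two columns
pairs <- distinct(df2, ID, subject_result2)

# .keep_all = TRUE keeps every column, taking the values from the first
# row of each unique ID/subject_result2 pair
pairs_full <- distinct(df2, ID, subject_result2, .keep_all = TRUE)
```

    With .keep_all = TRUE the first occurrence wins, so the duplicated (100000, OTHERPassedTerm1) pair keeps score 71 and the row with 85 is dropped.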
    
  • Henry Cyranka · 6 years ago

    One thing you can do is count the occurrences of each ID and subject_result2 combination, then drop the count column:

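    library(dplyr)

    new_df <- df %>%
              group_by(ID, subject_result2) %>%
              summarise(n = n(), .groups = "drop") %>%  # one row per ID/result combination, with its count
              select(-n)                                # drop the count, keeping only the unique pairs

    new_df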
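    dplyr's count() wraps the same group-and-summarise step in one call, which is handy if the counts themselves are useful. A minimal sketch using the question's data (rebuilt with rep() for brevity). Note that grouping sorts the result by ID and then subject_result2, so rows come out in a different order than with distinct(df), which keeps the order of first appearance:

```r
library(dplyr)
library(tibble)

# Same 15 rows as in the question
df <- tibble(
  ID = rep(c(100000L, 100001L, 100002L, 100003L), times = c(4, 4, 4, 3)),
  subject_result2 = c(
    "OTHERPassedTerm1", "OTHERPassedTerm1", "OTHERPassedTerm1", "MATHPassedTerm1",
    "OTHERPassedTerm1", "OTHERPassedTerm1", "OTHERPassedTerm1", "OTHERFailedTerm1",
    "OTHERPassedTerm1", "OTHERPassedTerm1", "MATHPassedTerm1", "MATHFailedTerm1",
    "OTHERPassedTerm1", "MATHPassedTerm1", "OTHERPassedTerm1"
  )
)

# count() groups by the named columns and adds the number of rows in `n`
counts <- count(df, ID, subject_result2)

# Dropping `n` leaves the 9 unique ID/subject_result2 pairs,
# sorted by ID and then subject_result2
unique_pairs <- select(counts, -n)
```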