How to find the first element of a group that satisfies a condition

  •  3
  • jakes  ·  6 years ago
    structure(list(group = c(17L, 17L, 17L, 18L, 18L, 18L, 18L, 19L, 
    19L, 19L, 20L, 20L, 20L, 21L, 21L, 22L, 23L, 24L, 25L, 25L, 25L, 
    26L, 27L, 27L, 27L, 28L), var = c(74L, 49L, 1L, 74L, 1L, 49L, 
    61L, 49L, 1L, 5L, 5L, 1L, 44L, 44L, 12L, 13L, 5L, 5L, 1L, 1L, 
    4L, 4L, 1L, 1L, 1L, 49L), first = c(0, 0, 1, 0, 1, 0, 0, 0, 1, 
    0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0)), .Names = c("group", 
    "var", "first"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
    -26L))
    

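    Given data with the first two columns, I would like to create a third column (called first) in which first == 1 where var == 1 occurs for the first time within a group. In other words, I want to mark the first element within each group that satisfies var == 1. How can this be done with dplyr? Presumably group_by is involved.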

    3 answers  |  as of 6 years ago
        1
  •  1
  •   Martin Morgan    6 years ago

    first_equal_to = function(x, value)
        (x == value) & (cumsum(x == value) == 1)
    

    so

    tbl %>% group_by(group) %>% mutate(first = first_equal_to(var, 1))
    

    (It seems the column should be left as a logical vector, since that is what it represents.)

    first_equal_to2 = function(x, value) {
        result = logical(length(x))
        result[match(value, x)] = TRUE
        result
    }
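
To see how the two helpers above behave, here is a small base R check. The input vectors `with_match` and `without_match` are made up for illustration; the point is that both functions flag only the first occurrence and return all FALSE when the value is absent (assigning through the NA index that match() returns is a no-op for a length-one replacement value).

```r
# The two helpers from the answer, reproduced so the block runs on its own.
first_equal_to <- function(x, value)
    (x == value) & (cumsum(x == value) == 1)

first_equal_to2 <- function(x, value) {
    result <- logical(length(x))
    result[match(value, x)] <- TRUE
    result
}

# Illustrative inputs (not from the question): one group with two 1s,
# one group with no 1 at all.
with_match    <- c(74L, 1L, 49L, 1L)
without_match <- c(44L, 12L)

a1 <- first_equal_to(with_match, 1)
a2 <- first_equal_to2(with_match, 1)
b1 <- first_equal_to(without_match, 1)
b2 <- first_equal_to2(without_match, 1)

print(a1)  # only the first 1 is flagged
print(b1)  # nothing is flagged
```

One difference worth keeping in mind: if x contains NA, the cumsum() version propagates NA from that point on, while the match() version does not.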
    
        2
  •  2
  •   AntoniosK    6 years ago
    library(dplyr)
    
    df$first = NULL
    
    df %>%
      group_by(group) %>%
      mutate(first = as.numeric(row_number() == min(row_number()[var == 1]))) %>%
      ungroup()
    
    # # A tibble: 26 x 3
    #   group   var first
    #   <int> <int> <dbl>
    # 1    17    74     0
    # 2    17    49     0
    # 3    17     1     1
    # 4    18    74     0
    # 5    18     1     1
    # 6    18    49     0
    # 7    18    61     0
    # 8    19    49     0
    # 9    19     1     1
    # 10   19     5     0
    # # ... with 16 more rows
    

    The idea is to flag the row with the smallest row number where var == 1, within each group.

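    This returns some warnings, because some groups contain no rows with var == 1 (min() is then called on an empty vector and returns Inf).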
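
A possible way to avoid those warnings, sketched in base R under a few assumptions: the data frame below is a small stand-in for the question's data (group 21 has no var == 1), and `flag_first` is a hypothetical helper name. It relies on match() returning NA rather than warning when nothing is found, and on %in% never producing NA.

```r
# Small stand-in for the question's data; group 21 has no var == 1.
df <- data.frame(group = c(17L, 17L, 17L, 21L, 21L, 25L, 25L, 25L),
                 var   = c(74L, 49L, 1L, 44L, 12L, 1L, 1L, 4L))

# match() gives the position of the first 1 within a group (NA if absent);
# %in% compares each position against it and yields FALSE for the NA case.
flag_first <- function(v) as.numeric(seq_along(v) %in% match(1L, v))

# ave() applies the helper within each group and keeps the original row order.
df$first <- ave(df$var, df$group, FUN = flag_first)

print(df)
```

Inside the dplyr pipeline, the analogous expression would presumably be mutate(first = as.numeric(row_number() %in% match(1, var))).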

    Another option is:

    library(dplyr)
    
    df$first = NULL
    
    # create row id
    df$id = seq_along(df$group)
    
    df %>%
      filter(var == 1) %>%                         # keep cases where var = 1
      distinct(group, .keep_all = TRUE) %>%        # keep the first such row per group
      mutate(first = 1) %>%                        # create first column
      right_join(df, by=c("id","group","var")) %>% # join back original dataset
      mutate(first = coalesce(first, 0)) %>%       # replace NAs with 0
      select(-id)                                  # remove row id
    
    # # A tibble: 26 x 3
    #   group   var first
    #   <int> <int> <dbl>
    # 1    17    74     0
    # 2    17    49     0
    # 3    17     1     1
    # 4    18    74     0
    # 5    18     1     1
    # 6    18    49     0
    # 7    18    61     0
    # 8    19    49     0
    # 9    19     1     1
    # 10   19     5     0
    # # ... with 16 more rows
    
        3
  •  1
  •   G. Grothendieck    6 years ago

    We can use the expression shown for first:

    DF %>% 
      group_by(group) %>% 
      mutate(first = { var == 1 } %>% { . * !duplicated(.) } ) %>%
      ungroup
    

    giving:

    # A tibble: 26 x 3
       group   var first
       <int> <int> <int>
     1    17    74     0
     2    17    49     0
     3    17     1     1
     4    18    74     0
     5    18     1     1
     6    18    49     0
     7    18    61     0
     8    19    49     0
     9    19     1     1
    10    19     5     0
    # ... with 16 more rows
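
The `. * !duplicated(.)` step in the answer above can be unpacked on a single vector. The sketch below uses base R only, with a made-up vector `v` standing in for one group's var column: duplicated() marks every repeat of a value, so its negation is TRUE only at the first TRUE and the first FALSE, and multiplying by the original logical keeps just the first TRUE.

```r
# Made-up values for one group (not from the question's data).
v <- c(74L, 1L, 49L, 1L)

z       <- v == 1            # FALSE  TRUE FALSE  TRUE
is_new  <- !duplicated(z)    #  TRUE  TRUE FALSE FALSE
flagged <- z * is_new        # logical * logical gives integer 0/1

print(flagged)
```

The multiplication is also why the first column in the output is of type integer rather than logical or double.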