
In an R data frame, extract the factor level for each observation?

  •  2
  • SteveS  · Tech Community  · 6 years ago

    I have a data frame object in R:

    dput(data_frame("n" = seq_len(10), "g" = sample(rep(factor(c("male", "female")), 5))))
    
    structure(list(n = 1:10, g = structure(c(2L, 2L, 1L, 1L, 1L, 
    1L, 1L, 2L, 2L, 2L), .Label = c("female", "male"), class = "factor")), .Names = c("n", 
    "g"), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
    ))
    

    Now I want to add a third column that holds the level of g for each row, with the levels numbered 1:length(unique(g)).

    I am trying:

    df %>% mutate(l = levels(g))
    

    But no luck. Could you tell me what I am missing here?

    What I want is:

    n g        l
    1 male     1
    2 female   2
    3 male     1
    4 female   2
    5 male     1
    ..
    
    1 answer  |  as of 6 years ago
  •  3
  •   MKR    6 years ago

    There are a few options:

    # Store all levels as a list in a new column (same list in every row)
    df %>% mutate(l = list(levels(g)))
    
    # Store all levels as one comma-separated string in a new column
    df %>% mutate(l = paste(levels(g), collapse=","))
    
    # Store the integer code of each row's level (one number per row)
    df %>% mutate(l = as.integer(g))
    
    # # A tibble: 10 x 3
    #       n g          l
    #   <int> <fctr> <int>
    # 1     1 male       2
    # 2     2 male       2
    # 3     3 female     1
    # 4     4 female     1
    # 5     5 female     1
    # 6     6 female     1
    # 7     7 female     1
    # 8     8 male       2
    # 9     9 male       2
    # 10    10 male       2
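
    Note that the integer codes follow the order of levels(g), which is alphabetical by default, so "female" is 1 and "male" is 2. The question's desired output has male as 1. A minimal sketch of one way to get that numbering, assuming the desired order is known in advance and using a small made-up data frame (the thread's data is randomly sampled):

```r
library(dplyr)

# Made-up example data for illustration; not the sampled data from the question
df <- tibble(
  n = 1:6,
  g = factor(c("male", "female", "male", "female", "male", "female"))
)

# Default level order is alphabetical: "female" "male"
levels(df$g)

# Rebuild the factor with "male" first, then take the integer codes
df <- df %>%
  mutate(l = as.integer(factor(g, levels = c("male", "female"))))

df$l
# 1 2 1 2 1 2
```

    The explicit levels = c("male", "female") is the assumption here; with a different set of labels, the vector would have to list them in the order the numbering should follow.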
    

    @djv suggested:

    # Store the level indices as one comma-separated string, e.g. "1,2"
    df %>% mutate(l = paste(seq_along(levels(g)), collapse=","))
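
    As for what the original attempt was missing: levels(g) returns one entry per distinct level, not one per row, and mutate() only accepts a value of length 1 or of length nrow(df). Collapsing to a single string works because that value is recycled, but it is then identical in every row. A small sketch contrasting the two behaviours, using made-up data and hypothetical column names all_levels and code:

```r
library(dplyr)

# Made-up example data for illustration
df <- tibble(n = 1:4, g = factor(c("male", "female", "female", "male")))

# One entry per distinct level vs. one row per observation
length(levels(df$g))  # 2
nrow(df)              # 4

out <- df %>%
  mutate(
    # length-1 string, recycled: the same value in every row
    all_levels = paste(seq_along(levels(g)), collapse = ","),
    # length-nrow vector: the code of each row's own level
    code = as.integer(g)
  )

out$all_levels  # "1,2" "1,2" "1,2" "1,2"
out$code        # 2 1 1 2
```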