
R: Is there a function for cleaning factor levels, or the character entries in data frame columns? [closed]

  •  -5
  • steve zissou  · Tech Community  · 7 years ago

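    I often use the clean_names() function from the janitor package. Is there a similar function for cleaning the entries of columns that hold factors or character strings?

    By "clean", I mean it should do the same things that clean_names() in the janitor package does to column names, e.g. change spaces to underscores, convert uppercase to lowercase, remove periods, and so on.

    Thanks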

    1 answer  |  as of 7 years ago
        1
  •  4
  •   hrbrmstr    7 years ago

    This just reuses the internals of janitor::clean_names() :

    #' 'Clean' a character/factor vector like `janitor::clean_names()` does for data frame columns
    #'
    #' Most of the internals are from `janitor::clean_names()`
    #'
    #' @param x a vector of strings or factors
    #' @param refactor if `x` is a factor, return the cleaned result as a factor?
    #'        Default: `FALSE` == return character vector.
    clean_vec <- function (x, refactor=FALSE) {
    
      require(magrittr, quietly=TRUE)
    
      if (!(is.character(x) || is.factor(x))) return(x)
    
      x_is_factor <- is.factor(x)
    
      old_names <- as.character(x)
    
      new_names <- old_names %>%
        gsub("'", "", .) %>%
        gsub("\"", "", .) %>%
        gsub("%", "percent", .) %>%
        gsub("^[ ]+", "", .) %>%
        make.names(.) %>%
        gsub("[.]+", "_", .) %>%
        gsub("[_]+", "_", .) %>%
        tolower(.) %>%
        gsub("_$", "", .)
    
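      # Count how often each cleaned value has occurred so far;
      # seq_along() keeps this safe for zero-length input
      dupe_count <- vapply(seq_along(new_names), function(i) {
        sum(new_names[i] == new_names[seq_len(i)])
      }, integer(1))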
    
      new_names[dupe_count > 1] <- paste(
        new_names[dupe_count > 1], dupe_count[dupe_count > 1], sep = "_"
      )
    
      if (x_is_factor && refactor) factor(new_names) else new_names
    
    }
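
    One thing to watch for: because the de-duplication step comes from name cleaning, repeated entries get a numeric suffix, so `clean_vec(c("Group A", "Group A"))` returns "group_a" and "group_a_2". For column entries you may want identical inputs to stay identical. The following is a minimal base-R sketch under that assumption; `clean_values` is a hypothetical helper, not part of janitor, and it reuses the same substitutions as above without the suffixing.

```r
# Hypothetical helper (not part of janitor): clean values the same way as
# clean_vec() above, but without suffixing repeats, so identical inputs
# always map to identical outputs.
clean_values <- function(x) {
  if (!(is.character(x) || is.factor(x))) return(x)

  clean_one <- function(s) {
    s <- gsub("['\"]", "", s)                   # drop quotes
    s <- gsub("%", "percent", s, fixed = TRUE)  # spell out percent signs
    s <- gsub("^[ ]+", "", s)                   # trim leading spaces
    s <- make.names(s)                          # invalid characters -> "."
    s <- gsub("[.]+", "_", s)                   # dots -> underscore
    s <- gsub("[_]+", "_", s)                   # collapse repeated underscores
    s <- tolower(s)
    gsub("_$", "", s)                           # drop a trailing underscore
  }

  if (is.factor(x)) {
    # Clean each level once; levels that become equal are merged
    levels(x) <- clean_one(levels(x))
    x
  } else {
    out <- clean_one(x)
    out[is.na(x)] <- NA_character_              # make.names() would turn NA into a string
    out
  }
}

clean_values(c("A b", "A b", "Foo.Bar"))
clean_values(factor(c("A b", "a.b")))
```

    For a factor this works on the levels only, so the cost depends on the number of distinct values rather than the length of the column.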
    

    Example:

    vec <- stringi::stri_rand_strings(10, 10, pattern = "[A-Za-z0-9\\.\\-\\?_\\,\\*\\+]")
    
    vec
    ##  [1] "TzMF-iCHX6" "v-b+2cpul5" "JPMwpP35K6" "5Z3RQf50Tb" "HaPzKB5jhH"
    ##  [6] "3gz6P4?0uU" "ofXkhP4Q1O" "?,4NvCjw,3" "AlG9dWJ,Ze" "MrPrvuYH4*"
    
    clean_vec(vec)
    ##  [1] "tzmf_ichx6"  "v_b_2cpul5"  "jpmwpp35k6"  "x5z3rqf50tb" "hapzkb5jhh" 
    ##  [6] "x3gz6p4_0uu" "ofxkhp4q1o"  "x_4nvcjw_3"  "alg9dwj_ze"  "mrprvuyh4"
    
    clean_vec(factor(vec))
    ##  [1] "tzmf_ichx6"  "v_b_2cpul5"  "jpmwpp35k6"  "x5z3rqf50tb" "hapzkb5jhh" 
    ##  [6] "x3gz6p4_0uu" "ofxkhp4q1o"  "x_4nvcjw_3"  "alg9dwj_ze"  "mrprvuyh4"
    
    clean_vec(factor(vec), TRUE)
    ##  [1] tzmf_ichx6  v_b_2cpul5  jpmwpp35k6  x5z3rqf50tb hapzkb5jhh 
    ##  [6] x3gz6p4_0uu ofxkhp4q1o  x_4nvcjw_3  alg9dwj_ze  mrprvuyh4  
    ## 10 Levels: alg9dwj_ze hapzkb5jhh jpmwpp35k6 mrprvuyh4 ... x5z3rqf50tb
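
    Since the question asks about columns of a data frame, here is a hedged sketch of applying a vector cleaner to every character or factor column. `clean_columns` and `simple_clean` are hypothetical names used for illustration only; `simple_clean` is a deliberately small stand-in, and any cleaner that returns a vector of the same length (such as `clean_vec` above) could be passed as `fun` instead.

```r
# Hypothetical helper: apply `fun` to every character or factor column
# of a data frame and leave all other columns untouched.
clean_columns <- function(df, fun) {
  is_text <- vapply(
    df,
    function(col) is.character(col) || is.factor(col),
    logical(1)
  )
  if (any(is_text)) {
    df[is_text] <- lapply(df[is_text], fun)
  }
  df
}

# Stand-in cleaner for illustration: runs of non-alphanumeric
# characters become "_" and everything is lower-cased.
simple_clean <- function(x) {
  tolower(gsub("[^A-Za-z0-9]+", "_", as.character(x)))
}

df <- data.frame(
  id  = 1:3,
  grp = c("Group A", "Group B", "Group A"),
  stringsAsFactors = FALSE
)

res <- clean_columns(df, simple_clean)
res$grp
```

    Numeric columns such as `id` pass through unchanged because only the columns flagged by `is_text` are reassigned.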