
How to recode multiple factor variables in a tidy way

  • Atanas Janackovski  · 5 years ago

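    I have a number of variables that are essentially factors, which I want to recode as integers.

    Many of the variables are strings whose first character is the digit corresponding to the integer, e.g. "2 = I have considered suicide in the past week, but not made any plans." should become 2. Other variables are either yes or no and should become 1 or 0 respectively. Others have several levels that need an explicit mapping, e.g.: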

    none = 0
    one = 1
    two = 2
    three = 3
    four or more = 4
    

    Similarly:

    ptsd = 0
    depression = 1
    generalised anxiety = 2
    no diagnosis warranted = 3
    

    Female = 0
    Male = 1
    Other = 2
    

    Some of the cells are NA and must remain NA. I tried the following code, without yet attempting to change all of the variables (starting simple):

    vars1 <- vars(pastpsyc, pastmed, hxsuicide)
    vars2 <- vars(siss, mssi_1)
    
    df_rc <- df %>%
        ## this works
        mutate_at(vars1, ~ (case_when(
            . == "yes" ~ 1,
            . == "no" ~ 0
        ))) %>%
        ## this does not
        mutate_at(vars2, ~as.integer(str_extract(vars2, "[0-9]"))) %>%
        ## nor does this
        mutate_at(diag1, ~ (case_when(
            . == "ptsd" ~ 0,
            . == "depression" ~ 1,
            . == "generalised anxiety" ~ 2,
            . == "no diagnosis warranted" ~ 3
        )))
    

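    But this fails, and I am completely stumped on how to recode the other variables.

    How can I change the different strings into the required format (preferably in a tidy way)? A minimal reproducible dataset is below.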

    structure(list(siss = c("2 = I have considered suicide in the past week, but not made any plans.",
    "1 = I have had vague thoughts of suicide in the past week.",
    "2 = I have considered suicide in the past week, but not made any plans.",
    "1 = I have had vague thoughts of suicide in the past week.",
    "3 = I have made plans to suicide in the past week, but I haven’t intended to act on these plans."
    ), mssi_1 = c("1. Weak - unsure about whether he/she wants to die, seldom thinks about death, or intensity seems low.",
    "1. Weak - unsure about whether he/she wants to die, seldom thinks about death, or intensity seems low.",
    "1. Weak - unsure about whether he/she wants to die, seldom thinks about death, or intensity seems low.",
    "1. Weak - unsure about whether he/she wants to die, seldom thinks about death, or intensity seems low.",
    "2. Moderate - current desire to die, may be preoccupied with ideas about death, or intensity seems greater than a rating of 1."
    ), diag1 = c("ptsd", NA, "depression", "generalised anxiety",
    "no diagnosis warranted"), pastpsyc = c("yes", NA, "no", NA,
    "yes"), pastmed = c("no", "yes", NA, "no", "no"), hxsuicide = c("yes",
    NA, "yes", "yes", "yes"), suicide_attempts = c("none", NA, "one",
    "two", "four or more"), sex = c("Male", "Other", NA, "Female",
    NA)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA,
    -5L), spec = structure(list(cols = list(siss = structure(list(), class = c("collector_character",
    "collector")), mssi_1 = structure(list(), class = c("collector_character",
    "collector")), diag1 = structure(list(), class = c("collector_character",
    "collector")), pastpsyc = structure(list(), class = c("collector_character",
    "collector")), pastmed = structure(list(), class = c("collector_character",
    "collector")), hxsuicide = structure(list(), class = c("collector_character",
    "collector")), suicide_attempts = structure(list(), class = c("collector_character",
    "collector")), sex = structure(list(), class = c("collector_character",
    "collector"))), default = structure(list(), class = c("collector_guess",
    "collector")), skip = 1), class = "col_spec"))
    
    1 answer

  •   Ben  · 5 years ago

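    Only very small changes are needed.

    • use . rather than vars2 inside str_extract , so that the pattern is applied to the column currently being mutated
    • you want vars(diag1) rather than diag1 (or just use mutate ) to change this one column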

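    library(dplyr)
    library(stringr)

    ## vars1 and vars2 are as defined in the question
    df %>%
      ## yes/no columns become 1/0; NA stays NA because no condition matches
      mutate_at(vars1, ~ (case_when(
        . == "yes" ~ 1,
        . == "no" ~ 0
      ))) %>%
      ## fixed: pass . (the current column) to str_extract, not vars2
      mutate_at(vars2, ~as.integer(str_extract(., "[0-9]"))) %>%
      ## fixed: wrap the column name in vars()
      mutate_at(vars(diag1), ~ (case_when(
        . == "ptsd" ~ 0,
        . == "depression" ~ 1,
        . == "generalised anxiety" ~ 2,
        . == "no diagnosis warranted" ~ 3
      )))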
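
    In dplyr 1.0.0 and later, mutate_at is superseded by across(). The following is only a sketch of how the same recoding might look in that style, not part of the original fix. The data frame toy and its shortened values are hypothetical stand-ins for df, and the integer literals (1L, 0L) are an assumption about wanting true integer columns rather than doubles.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for df: shortened values, same kinds of columns
toy <- data.frame(
  siss     = c("2 = considered suicide", "1 = vague thoughts", NA),
  mssi_1   = c("1. Weak", "2. Moderate", "1. Weak"),
  pastpsyc = c("yes", NA, "no"),
  diag1    = c("ptsd", NA, "depression"),
  stringsAsFactors = FALSE
)

toy_rc <- toy %>%
  mutate(
    # yes/no columns: unmatched values (including NA) fall through to NA
    across(c(pastpsyc), ~ case_when(.x == "yes" ~ 1L, .x == "no" ~ 0L)),
    # leading-digit columns: .x is the column currently being processed
    across(c(siss, mssi_1), ~ as.integer(str_extract(.x, "[0-9]"))),
    # a single column can simply be reassigned by name
    diag1 = case_when(
      diag1 == "ptsd" ~ 0L,
      diag1 == "depression" ~ 1L,
      diag1 == "generalised anxiety" ~ 2L,
      diag1 == "no diagnosis warranted" ~ 3L
    )
  )
```

    Note that "[0-9]" extracts only the first digit, which is enough for the single-digit codes shown in the question.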
    

    If you want to use mutate instead of mutate_at :

    df %>%
      mutate(diag1 = case_when(
        diag1 == "ptsd" ~ 0,
        diag1 == "depression" ~ 1,
        diag1 == "generalised anxiety" ~ 2,
        diag1 == "no diagnosis warranted" ~ 3
      ))
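
    The remaining columns from the question ( suicide_attempts and sex ) are not covered above. One possible approach, offered as a sketch rather than as part of the original answer, is a named lookup vector per variable: indexing it by the column returns the mapped integer and yields NA for NA input. The names toy , attempts_key and sex_key are hypothetical; the values and mappings are the ones listed in the question.

```r
library(dplyr)

# Stand-in for df, using the two columns as given in the question's dataset
toy <- data.frame(
  suicide_attempts = c("none", NA, "one", "two", "four or more"),
  sex              = c("Male", "Other", NA, "Female", NA),
  stringsAsFactors = FALSE
)

# Hypothetical lookup vectors: names are the old labels, values the new codes
attempts_key <- c("none" = 0L, "one" = 1L, "two" = 2L,
                  "three" = 3L, "four or more" = 4L)
sex_key <- c("Female" = 0L, "Male" = 1L, "Other" = 2L)

toy_rc <- toy %>%
  mutate(
    # indexing by name maps each label to its code; NA labels give NA
    suicide_attempts = unname(attempts_key[suicide_attempts]),
    sex              = unname(sex_key[sex])
  )
```

    A label that is missing from the lookup vector also becomes NA, so it is worth checking for unexpected NA values after recoding.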