Mapping a mutate explodes the number of rows in the output tibble

  • C.Robin  · 2 years ago

    Suppose I have data like this:

    d <- tibble::tribble(
      ~sit_comfy_sofa_1, ~sit_comfy_sofa_2, ~sit_comfy_sofa_3, ~sit_comfy_sofa_4, ~sit_comfy_couch_1, ~sit_comfy_couch_2, ~sit_comfy_couch_3, ~sit_comfy_couch_4, ~sit_comfy_settee_1, ~sit_comfy_settee_2, ~sit_comfy_settee_3, ~sit_comfy_settee_4,
                     1L,                0L,                0L,                0L,                 0L,                 1L,                 0L,                 0L,                  0L,                  0L,                  1L,                  0L,
                     0L,                0L,                0L,                1L,                 0L,                 0L,                 0L,                 1L,                  0L,                  1L,                  0L,                  0L,
                     0L,                1L,                0L,                0L,                 1L,                 0L,                 0L,                 0L,                  1L,                  0L,                  0L,                  0L,
                     0L,                0L,                1L,                0L,                 0L,                 0L,                 1L,                 0L,                  0L,                  0L,                  0L,                  1L
      )
    

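    This tibble has three "categories" of columns: one for _sofa_, one for _couch_, and one for _settee_. I am trying to go through each category and construct a new variable whose value depends on which column within that category equals 1.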

    I wrote this function to try to do that:

    cleaning_fcn <- function(.df, .x){
      .df %>% 
        mutate(!!sym(paste0("explain_", .x)) := case_when(
          !!sym(paste0("sit_comfy_", .x ,"_1")) == 1 ~ "Just better",
          !!sym(paste0("sit_comfy_", .x, "_2")) == 1 ~ "Nice shape",
          !!sym(paste0("sit_comfy_", .x ,"_3")) == 1 ~ "Like the color",
          !!sym(paste0("sit_comfy_", .x ,"_4")) == 1 ~ "Nice material"),
          !!sym(paste0("explain_", .x)) := factor(!!sym(paste0("explain_", .x)), 
                                                   levels = c("Just better", "Nice shape",
                                                              "Like the color", "Nice material")))
    }
    

    However, when I map it over the categories, I end up with a tibble that has 3 times as many rows as the original tibble.

    require(tidyverse)
    
    purrr::map_dfr(
        .x = tidyselect::all_of(c("sofa", "couch", "settee")),
        .f = ~ cleaning_fcn(.df = d, .x))
    

    Can anyone see where I'm going wrong?

    Essentially, I want to achieve the same result as the code below, but ideally as a function (and generally with far less repetition):

    d <- d %>% 
      mutate(explain_sofa = case_when(
        sit_comfy_sofa_1 == 1 ~ "Just better",
        sit_comfy_sofa_2 == 1 ~ "Nice shape",
        sit_comfy_sofa_3 == 1 ~ "Like the color",
        sit_comfy_sofa_4 == 1 ~ "Nice material"),
        explain_sofa = factor(explain_sofa, levels = c("Just better", "Nice shape",
                                                       "Like the color", "Nice material")))
    d <- d %>% 
      mutate(explain_couch = case_when(
        sit_comfy_couch_1 == 1 ~ "Just better",
        sit_comfy_couch_2 == 1 ~ "Nice shape",
        sit_comfy_couch_3 == 1 ~ "Like the color",
        sit_comfy_couch_4 == 1 ~ "Nice material"),
        explain_couch = factor(explain_couch, levels = c("Just better", "Nice shape",
                                                       "Like the color", "Nice material")))
    
    d <- d %>% 
      mutate(explain_settee = case_when(
        sit_comfy_settee_1 == 1 ~ "Just better",
        sit_comfy_settee_2 == 1 ~ "Nice shape",
        sit_comfy_settee_3 == 1 ~ "Like the color",
        sit_comfy_settee_4 == 1 ~ "Nice material"),
        explain_settee = factor(explain_settee, levels = c("Just better", "Nice shape",
                                                        "Like the color", "Nice material")))
    
    1 answer

  • stefan  · 2 years ago

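    With map_dfr you are creating a list of data frames, one per category, which are then bound together by row. Each of those data frames still contains every row of d, so you end up with a data frame that has 3 times as many rows as the original. One option is to use purrr::reduce instead: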

    library(tidyverse)
    
    purrr::reduce(.x = c("sofa", "couch", "settee"), .f = cleaning_fcn, .init = d)
    #> # A tibble: 4 × 15
    #>   sit_comfy_sofa_1 sit_comfy_sofa_2 sit_comfy_sofa_3 sit_comfy_sofa_4
    #>              <int>            <int>            <int>            <int>
    #> 1                1                0                0                0
    #> 2                0                0                0                1
    #> 3                0                1                0                0
    #> 4                0                0                1                0
    #> # ℹ 11 more variables: sit_comfy_couch_1 <int>, sit_comfy_couch_2 <int>,
    #> #   sit_comfy_couch_3 <int>, sit_comfy_couch_4 <int>, sit_comfy_settee_1 <int>,
    #> #   sit_comfy_settee_2 <int>, sit_comfy_settee_3 <int>,
    #> #   sit_comfy_settee_4 <int>, explain_sofa <fct>, explain_couch <fct>,
    #> #   explain_settee <fct>
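
    To make the difference concrete, here is a minimal sketch on a smaller, made-up tibble. The names toy and add_explain are hypothetical stand-ins for d and cleaning_fcn (only two categories and two reasons each, and the .data pronoun instead of !!sym()), so treat it as an illustration under those assumptions rather than a replacement for the code above.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data with the same naming scheme as `d`, but smaller.
toy <- tibble::tibble(
  sit_comfy_sofa_1  = c(1L, 0L), sit_comfy_sofa_2  = c(0L, 1L),
  sit_comfy_couch_1 = c(0L, 1L), sit_comfy_couch_2 = c(1L, 0L)
)

# Simplified stand-in for cleaning_fcn: adds one explain_<category> column.
add_explain <- function(.df, .x) {
  .df %>%
    mutate(!!paste0("explain_", .x) := case_when(
      .data[[paste0("sit_comfy_", .x, "_1")]] == 1 ~ "Just better",
      .data[[paste0("sit_comfy_", .x, "_2")]] == 1 ~ "Nice shape"
    ))
}

cats <- c("sofa", "couch")

# map_dfr: every call starts again from `toy`, then the results are stacked,
# so the row count is nrow(toy) times the number of categories.
stacked <- map_dfr(cats, ~ add_explain(toy, .x))
nrow(stacked)

# reduce: the output of one call becomes the input of the next,
# so columns accumulate and the row count stays the same.
threaded <- reduce(cats, add_explain, .init = toy)
nrow(threaded)

# This is the same as nesting the calls by hand.
by_hand <- add_explain(add_explain(toy, "sofa"), "couch")
all.equal(threaded, by_hand)
```

    In the stacked result each block of rows has only its own explain_ column filled in and NA in the other, which is the pattern to look for if the row count unexpectedly multiplies.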