
Changing the names of the resulting variables in a custom dplyr function

  •  3
  • Konrad  · 7 years ago

    Background

    Within a dplyr workflow, I drafted a simple function to generate the metrics I need:

    # Function to generate summary table
    generate_summary_tbl <- function(dataset, group_column, summary_column) {
        group_column   <- enquo(group_column)
        summary_column <- enquo(summary_column)
        dataset %>% 
            group_by(!!group_column) %>% 
            summarise(
                mean = mean(!!summary_column),
                sum  = sum(!!summary_column)
                # Other metrics that need to be generated frequently
            ) %>% 
            ungroup -> smryDta
        return(smryDta)
    }
    

    Example

    The function works as required:

    >> mtcars %>% 
    ...     generate_summary_tbl(group_column = am, summary_column = mpg)
    # A tibble: 2 x 3
         am     mean   sum
      <dbl>    <dbl> <dbl>
    1     0 17.14737 325.8
    2     1 24.39231 317.1
    

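    I would like to conditionally add the column name passed to the function (as in summary_column = mpg) to the names of the result columns.

    This would be controlled by an argument, useColName = TRUE.

    When called with useColName = TRUE, the function should return: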

    >> mtcars %>% 
    ...     generate_summary_tbl(group_column = am, summary_column = mpg,
                                 useColName = TRUE)
    # A tibble: 2 x 3
         am     mean_am   sum_am
      <dbl>    <dbl>       <dbl>
    1     0    17.14737    325.8
    2     1    24.39231    317.1
    

    That is, the suffix _am is appended to each metric name, giving mean_am.

    Ugly solution

    A partial, ugly solution I have makes use of setNames:

    # Function to generate summary table
    generate_summary_tbl <-
        function(dataset,
                 group_column,
                 summary_column,
                 useColName = TRUE) {
            group_column   <- enquo(group_column)
            summary_column <- enquo(summary_column)
            dataset %>%
                group_by(!!group_column) %>%
                summarise(mean = mean(!!summary_column),
                          sum  = sum(!!summary_column)) %>%
                ungroup -> smryDta
    
            if (useColName) {
                setNames(smryDta,
                         c(deparse(substitute(
                             group_column
                         )),
                         paste(
                             names(smryDta)[2:length(smryDta)], paste0("_", deparse(substitute(
                                 group_column
                             )))
                         ))) -> smryDta
            }
    
            return(smryDta)
        }
    

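    Example

    This almost matches the desired result. I suppose I could use some regular expressions to arrive at the desired result. However, I think there should be a more efficient solution.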

    mtcars %>% 
        generate_summary_tbl(group_column = am, summary_column = mpg, useColName = TRUE)
    # A tibble: 2 x 3
      `~am` `mean _~am` `sum _~am`
      <dbl>       <dbl>      <dbl>
    1     0    17.14737      325.8
    2     1    24.39231      317.1
    

    Is there a cleaner way to do this with quo / lazyeval?

    1 answer  |  as of 7 years ago
        1
  •  2
  •   lukeA    7 years ago

    Perhaps by using rename (here, rename_at):

    library(tidyverse)
    
    generate_summary_tbl <- function(dataset, group_column, summary_column, useColname = FALSE) {
        group_column   <- enquo(group_column)
        summary_column <- enquo(summary_column)
        dataset %>% 
            group_by(!!group_column) %>% 
            summarise(
                mean = mean(!!summary_column),
                sum  = sum(!!summary_column)
                # Other metrics that need to be generated frequently
            ) %>% 
            ungroup -> smryDta
    
        if (useColname) 
          smryDta <- smryDta %>%  
          rename_at(
            vars(-one_of(quo_name(group_column))), 
            ~paste(quo_name(group_column), .x, sep="_")
          )
    
        return(smryDta)
    }
    
    mtcars %>% generate_summary_tbl(am, mpg)
    # # A tibble: 2 x 3
    #      am     mean   sum
    #   <dbl>    <dbl> <dbl>
    # 1     0 17.14737 325.8
    # 2     1 24.39231 317.1
    mtcars %>% generate_summary_tbl(am, mpg, useColname = TRUE)
    #   # A tibble: 2 x 3
    #      am  am_mean am_sum
    #   <dbl>    <dbl>  <dbl>
    # 1     0 17.14737  325.8
    # 2     1 24.39231  317.1
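
    The answer above prefixes the group name (am_mean), whereas the question asked for a suffix (mean_am). A minimal sketch of the suffix variant follows. It assumes a newer dplyr (1.0.0 or later) for rename_with() and the {{ }} shorthand, neither of which appears in the original answer; the function name generate_summary_tbl_suffix is hypothetical.

```r
library(dplyr)

# Hypothetical variant: appends the group column name as a suffix
generate_summary_tbl_suffix <- function(dataset, group_column, summary_column,
                                        useColName = FALSE) {
    # Plain string form of the grouping column, e.g. "am"
    group_name <- rlang::as_label(rlang::enquo(group_column))

    smryDta <- dataset %>%
        group_by({{ group_column }}) %>%
        summarise(
            mean = mean({{ summary_column }}),
            sum  = sum({{ summary_column }})
        ) %>%
        ungroup()

    if (useColName) {
        # Rename every column except the grouping column
        smryDta <- smryDta %>%
            rename_with(~ paste(.x, group_name, sep = "_"),
                        .cols = -all_of(group_name))
    }

    smryDta
}

mtcars %>% generate_summary_tbl_suffix(am, mpg, useColName = TRUE)
```

    Swapping the order of the arguments to paste() would reproduce the prefixed names from the answer instead.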