
Using dplyr to summarize while keeping the same variable names

  • Hutch3232 · 6 years ago

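    I've found that data.table and dplyr produce different results when I try to perform the same operation. I'd like to use dplyr syntax but have it compute the way data.table does. The use case is adding subtotals to a table. To do that, I need to aggregate each variable while keeping the same variable name (for the transformed version). data.table lets me aggregate a variable, keep the same name, and then use that same variable in a second aggregation; the second aggregation still uses the untransformed values. dplyr, however, uses the transformed version.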

    The summarise documentation notes:

    # Note that with data frames, newly created summaries immediately
    # overwrite existing variables
    mtcars %>%
      group_by(cyl) %>%
      summarise(disp = mean(disp), sd = sd(disp))
    

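    This is essentially the problem I'm running into, and I'd like to know whether there is a good workaround. One thing I found is to give the transformed variable a different name and rename it at the end, but I don't find that very satisfying. If there is a good way to do subtotals in general, I'd be glad to know that too. I looked around this site and didn't see this exact situation discussed. Any help would be appreciated!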
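    For reference, the rename workaround mentioned above might look like the following minimal sketch (assuming dplyr >= 1.0; df is redefined so the snippet runs on its own, and Count_Dist_tmp and Total_tmp are hypothetical temporary names). Because the summaries get new names, the second expression still sees the per-row Count_Dist.

```r
library(dplyr)

df <- data.frame(Group = LETTERS[1:5],
                 Count = c(1000, 1500, 1200, 2000, 5000),
                 Total = c(50, 300, 600, 400, 1000))

# Hypothetical temporary names keep Count_Dist un-shadowed inside summarise,
# so Total_tmp is computed from the per-row shares; rename restores the names.
subtotal <- df %>%
  mutate(Count_Dist = Count / sum(Count)) %>%
  summarise(Count_Dist_tmp = sum(Count_Dist),
            Total_tmp = sum(Count_Dist * Total)) %>%
  rename(Count_Dist = Count_Dist_tmp, Total = Total_tmp)

subtotal
```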

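    Here is a simple example, done once with data.table and once with dplyr. I want to take this simple table and append a subtotal row containing the weighted average (weighted by Count) of the column of interest, Total.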

    library(data.table)
    library(dplyr)
    
    dt <- data.table(Group = LETTERS[1:5],
                     Count = c(1000, 1500, 1200, 2000, 5000),
                     Total = c(50, 300, 600, 400, 1000))
    dt[, Count_Dist := Count/sum(Count)]
    dt[, .(Count_Dist = sum(Count_Dist), Weighted_Total = sum(Count_Dist*Total))]
    
    dt <- rbind(dt[, .(Group, Count_Dist, Total)],
          dt[, .(Group = "All", Count_Dist = sum(Count_Dist), Total = sum(Count_Dist*Total))])
    setnames(dt, "Total", "Weighted_Avg_Total")
    
    dt
    
    df <- data.frame(Group = LETTERS[1:5],
                     Count = c(1000, 1500, 1200, 2000, 5000),
                     Total = c(50, 300, 600, 400, 1000))
    
    df %>%
      mutate(Count_Dist = Count/sum(Count)) %>%
      summarize(Count_Dist = sum(Count_Dist),
                Weighted_Total = sum(Count_Dist*Total))
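    To make the discrepancy concrete, here is a minimal sketch (assuming dplyr >= 1.0; df is redefined so the snippet is self-contained, and res is a hypothetical name) of what the summarize call above returns. summarize evaluates its arguments in order, so the second expression sees the already-summarized Count_Dist rather than the per-row shares.

```r
library(dplyr)

df <- data.frame(Group = LETTERS[1:5],
                 Count = c(1000, 1500, 1200, 2000, 5000),
                 Total = c(50, 300, 600, 400, 1000))

res <- df %>%
  mutate(Count_Dist = Count / sum(Count)) %>%
  summarize(Count_Dist = sum(Count_Dist),             # Count_Dist is now a single value of about 1
            Weighted_Total = sum(Count_Dist * Total))  # so this is about sum(Total)

# Roughly 2350 (sum(Total)), not the weighted average of about 656.07
# that the data.table version computes from the per-row values.
res$Weighted_Total
```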
    
    df %>% 
      mutate(Count_Dist = Count/sum(Count)) %>%
      select(Group, Count_Dist, Total) %>% 
      rbind(df %>%
              mutate(Count_Dist = Count/sum(Count)) %>%
              summarize(Group = "All",
                        Count_Dist = sum(Count_Dist),
                        Total = sum(Count_Dist*Total))) %>% 
      rename(Weighted_Avg_Total = Total)
    

    Thanks again for your help!

    2 answers
  •   Jaap · 6 years ago

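    One possible solution is to skip the mutate step and use transmute in place of the first mutate/select step, computing the needed variables directly from the original ones so that no intermediate variable has to be created for the second mutate step: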

    df %>% 
      transmute(Group, Count_Dist = Count/sum(Count), Weighted_Avg_Total = Total) %>% 
      bind_rows(df %>%
                  summarize(Group = "All",
                            Count_Dist = sum(Count/sum(Count)),
                            Weighted_Avg_Total = sum((Count/sum(Count))*Total)))
    

    which gives:

      Group Count_Dist Weighted_Avg_Total
    1     A 0.09345794            50.0000
    2     B 0.14018692           300.0000
    3     C 0.11214953           600.0000
    4     D 0.18691589           400.0000
    5     E 0.46728972          1000.0000
    6   All 1.00000000           656.0748
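    A related variation, not part of the answer above: base R's weighted.mean(x, w) computes sum(x * w) / sum(w) straight from the original columns, so the subtotal expression never refers to a column that summarize has already overwritten. This is a minimal sketch assuming R >= 4.0 (so Group is character) and dplyr >= 1.0; result is a hypothetical name.

```r
library(dplyr)

df <- data.frame(Group = LETTERS[1:5],
                 Count = c(1000, 1500, 1200, 2000, 5000),
                 Total = c(50, 300, 600, 400, 1000))

result <- df %>%
  transmute(Group,
            Count_Dist = Count / sum(Count),
            Weighted_Avg_Total = Total) %>%
  bind_rows(df %>%
              summarize(Group = "All",
                        Count_Dist = sum(Count / sum(Count)),
                        # Uses only Total and Count, so no new column can shadow its inputs
                        Weighted_Avg_Total = weighted.mean(Total, Count)))

result
```

    The "All" row should match the 656.0748 shown above, since the normalized Count_Dist values are exactly the weights weighted.mean applies.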
    

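    Another possible solution is to change the order in which dplyr calculates the new variables, so that Count_Dist is overwritten only after it has been used, and then use select to put the columns back in the order you originally wanted: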

    df %>% 
      mutate(Count_Dist = Count/sum(Count)) %>%
      select(Group, Count_Dist, Weighted_Avg_Total = Total) %>% 
      bind_rows(df %>%
                  mutate(Count_Dist = Count/sum(Count)) %>%
                  summarize(Group = "All",
                            Weighted_Avg_Total = sum(Count_Dist*Total),
                            Count_Dist = sum(Count_Dist)) %>% 
                  select(Group, Count_Dist, Weighted_Avg_Total))
    

    If you want to include the Count column as well, you can do that too (as per my comment below):

    df %>% 
      transmute(Group = Group, Count_Dist = Count/sum(Count), Weighted_Avg_Total = Total, Count) %>% 
      bind_rows(df %>%
                  summarize(Group = "All",
                            Count_Dist = sum(Count/sum(Count)),
                            Weighted_Avg_Total = sum((Count/sum(Count))*Total),
                            Count = sum(Count)))
    
  •   MKR · 6 years ago

    An alternative is to call mutate twice, so that Weighted_Total is also computed per row, and then take the sum of that column in summarize.

    df %>%
      mutate(Count_Dist = Count/sum(Count)) %>%
      mutate(Weighted_Total = Count_Dist*Total) %>%
      summarize(Count_Dist = sum(Count_Dist),
                Weighted_Total = sum(Weighted_Total))
    Result:
      Count_Dist Weighted_Total
    1          1     656.074766
    

    And for the full table with the subtotal row:

    df %>% 
      mutate(Count_Dist = Count/sum(Count)) %>%
      select(Group, Count_Dist, Total) %>% 
      rbind(df %>%
              mutate(Count_Dist = Count/sum(Count)) %>%
              mutate(Weighted_Total = Count_Dist*Total) %>%
              summarize(Group = "All",
                        Count_Dist = sum(Count_Dist),
                        Total = sum(Weighted_Total))) %>% 
      rename(Weighted_Avg_Total = Total)
    
    Result:
    
      Group   Count_Dist Weighted_Avg_Total
    1     A 0.0934579439          50.000000
    2     B 0.1401869159         300.000000
    3     C 0.1121495327         600.000000
    4     D 0.1869158879         400.000000
    5     E 0.4672897196        1000.000000
    6   All 1.0000000000         656.074766
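    A small variation on this approach, sketched under the assumption of dplyr >= 1.0: compute the per-row Count_Dist and Weighted_Total once in a single mutate (within mutate, later expressions deliberately see the new per-row Count_Dist), then reuse that data frame for both the detail rows and the subtotal row. base and result are hypothetical names.

```r
library(dplyr)

df <- data.frame(Group = LETTERS[1:5],
                 Count = c(1000, 1500, 1200, 2000, 5000),
                 Total = c(50, 300, 600, 400, 1000))

# Per-row columns computed once; Weighted_Total uses the per-row Count_Dist.
base <- df %>%
  mutate(Count_Dist = Count / sum(Count),
         Weighted_Total = Count_Dist * Total)

result <- bind_rows(
  base %>% select(Group, Count_Dist, Weighted_Avg_Total = Total),
  # Overwriting Count_Dist here is harmless: the next line reads Weighted_Total.
  base %>% summarize(Group = "All",
                     Count_Dist = sum(Count_Dist),
                     Weighted_Avg_Total = sum(Weighted_Total))
)

result
```

    The design choice is simply to avoid repeating the mutate in both branches of the rbind/bind_rows call.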