
Trying to sum duplicate keys when using spread() in R

tidyr dplyr r

Mike206 · Tech Community · 7 years ago

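I'm trying to learn R, and I decided to go about it by building something that reads the live election results my state publishes on election night. Unfortunately, I'm having trouble calculating the Margin value used for the map fill. My state (WA) uses a top-two primary, which means that in some races two candidates from the same party appear in the November election. That may be too much background, but in any case, here is the coding question:

I have a data frame that looks like this: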

Dist    Party                       Votes
1       (Prefers Democratic Party)  124151
1       (Prefers Republican Party)  101428
2       (Prefers Democratic Party)  122173
2       (Prefers Republican Party)  79518
3       (Prefers Republican Party)  124796
3       (Prefers Democratic Party)  78018
4       (Prefers Republican Party)  75307
4       (Prefers Republican Party)  77772
5       (Prefers Republican Party)  135470
5       (Prefers Democratic Party)  87772
6       (Prefers Democratic Party)  141265
6       (Prefers Republican Party)  83025
7       (Prefers Democratic Party)  203954
7       (Prefers Republican Party)  47921
8       (Prefers Republican Party)  125741
8       (Prefers Democratic Party)  73003
9       (Prefers Democratic Party)  118132
9       (Prefers Republican Party)  48662
10      (Prefers Democratic Party)  99279
10      (Prefers Republican Party)  82213

I'd like to make it look like this:

Dist    (Prefers Democratic Party)  (Prefers Republican Party)
1       124151                      101428
2       122173                      79518
3       78018                       124796
4       [NA or 0]                   153079
5       87772                       135470
6       141265                      83025
7       203954                      47921
8       73003                       125741
9       118132                      48662
10      99279                       82213

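spread() doesn't work because of the duplicate keys in Dist = 4. I cobbled the following together from a few other questions, but I'm not happy with it, and I'm almost sure there is a better way: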

library(tidyr)
library(dplyr)

CongressTidy %>%
  group_by(Dist) %>%
  mutate(GOPVotes = sum(ifelse(Party == "(Prefers Republican Party)", Votes, 0))) %>%
  mutate(DemVotes = sum(ifelse(Party == "(Prefers Democratic Party)", Votes, 0)))

Which returns this:

Dist    Party                       Votes   GOPVotes    DemVotes
<fctr>  <fctr>                      <int>   <dbl>       <dbl>
1       (Prefers Democratic Party)  124151  101428      124151
1       (Prefers Republican Party)  101428  101428      124151
2       (Prefers Democratic Party)  122173  79518       122173
2       (Prefers Republican Party)  79518   79518       122173
3       (Prefers Republican Party)  124796  124796      78018
3       (Prefers Democratic Party)  78018   124796      78018
4       (Prefers Republican Party)  75307   153079      0
4       (Prefers Republican Party)  77772   153079      0
5       (Prefers Republican Party)  135470  135470      87772
5       (Prefers Democratic Party)  87772   135470      87772
6       (Prefers Democratic Party)  141265  83025       141265
6       (Prefers Republican Party)  83025   83025       141265
7       (Prefers Democratic Party)  203954  47921       203954
7       (Prefers Republican Party)  47921   47921       203954
8       (Prefers Republican Party)  125741  125741      73003
8       (Prefers Democratic Party)  73003   125741      73003
9       (Prefers Democratic Party)  118132  48662       118132
9       (Prefers Republican Party)  48662   48662       118132
10      (Prefers Democratic Party)  99279   82213       99279
10      (Prefers Republican Party)  82213   82213       99279

That's fine as far as it goes, and I can add a selector column and select by it:

CongressMargins <- CongressTidy  %>%
  group_by(Dist) %>%
  mutate(GOPVotes = sum(ifelse(Party == "(Prefers Republican Party)", Votes, 0))) %>%
  mutate(DemVotes = sum(ifelse(Party == "(Prefers Democratic Party)", Votes, 0))) %>%
  mutate(selector = c(1,2)) %>%
  subset(selector == 1, select = c(Dist, GOPVotes, DemVotes))

Which gives me what I want, and I can calculate the margin from there just fine:

Dist    GOPVotes    DemVotes
<fctr>  <dbl>       <dbl>
1       101428      124151      
2       79518       122173      
3       124796      78018       
4       153079      0       
5       135470      87772       
6       83025       141265      
7       47921       203954      
8       125741      73003       
9       48662       118132      
10      82213       99279

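But it's going to break if there are races with an unopposed candidate, because mutate(selector = c(1,2)) relies on every district having exactly two rows (vector recycling). And it's ugly. And there must be a better way. Any ideas?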
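A minimal sketch of one way to drop the selector column, assuming the same column names as above: summarise() returns one row per district regardless of how many candidates the district has, so nothing depends on the group size. The small CongressTidy below is only a stand-in built from the district 3 and 4 rows shown earlier.

```r
library(dplyr)

# Stand-in for CongressTidy: districts 3 and 4 copied from the table above
CongressTidy <- data.frame(
  Dist  = c(3, 3, 4, 4),
  Party = c("(Prefers Republican Party)", "(Prefers Democratic Party)",
            "(Prefers Republican Party)", "(Prefers Republican Party)"),
  Votes = c(124796, 78018, 75307, 77772)
)

# One row per district; sum() over an empty selection is 0,
# so a party with no candidate in a district gets 0 votes
CongressMargins <- CongressTidy %>%
  group_by(Dist) %>%
  summarise(
    GOPVotes = sum(Votes[Party == "(Prefers Republican Party)"]),
    DemVotes = sum(Votes[Party == "(Prefers Democratic Party)"])
  )

CongressMargins
```

Because the per-party totals are computed inside summarise(), there is no need for the subset() step afterwards.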

2 answers | latest 7 years ago

www · 7 years ago

We can calculate the group sums first and then spread. If you want the missing cells to be 0, use spread(Party, Votes, fill = 0).

library(tidyverse)

dat2 <- dat %>%
  group_by(Dist, Party) %>%
  summarise(Votes = sum(Votes)) %>%
  spread(Party, Votes) %>%
  ungroup()
dat2
# # A tibble: 10 x 3
#     Dist `(Prefers Democratic Party)` `(Prefers Republican Party)`
#    <int>                        <int>                        <int>
#  1     1                       124151                       101428
#  2     2                       122173                        79518
#  3     3                        78018                       124796
#  4     4                           NA                       153079
#  5     5                        87772                       135470
#  6     6                       141265                        83025
#  7     7                       203954                        47921
#  8     8                        73003                       125741
#  9     9                       118132                        48662
# 10    10                        99279                        82213

Data

dat <- read.table(text = "Dist    Party                       Votes
1       '(Prefers Democratic Party)'  124151
                  1       '(Prefers Republican Party)'  101428
                  2       '(Prefers Democratic Party)'  122173
                  2       '(Prefers Republican Party)'  79518
                  3       '(Prefers Republican Party)'  124796
                  3       '(Prefers Democratic Party)'  78018
                  4       '(Prefers Republican Party)'  75307
                  4       '(Prefers Republican Party)'  77772
                  5       '(Prefers Republican Party)'  135470
                  5       '(Prefers Democratic Party)'  87772
                  6       '(Prefers Democratic Party)'  141265
                  6       '(Prefers Republican Party)'  83025
                  7       '(Prefers Democratic Party)'  203954
                  7       '(Prefers Republican Party)'  47921
                  8       '(Prefers Republican Party)'  125741
                  8       '(Prefers Democratic Party)'  73003
                  9       '(Prefers Democratic Party)'  118132
                  9       '(Prefers Republican Party)'  48662
                  10      '(Prefers Democratic Party)'  99279
                  10      '(Prefers Republican Party)'  82213",
                  header = TRUE, stringsAsFactors = FALSE)
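In newer versions of tidyr, spread() is superseded by pivot_wider(), which can do the summing and the zero fill in a single call. A minimal sketch, assuming tidyr 1.1.0 or later (where values_fn and values_fill accept a plain function and a scalar), using only the district 3 and 4 rows of the data above; the name wide is just an illustrative choice:

```r
library(tidyr)

# Districts 3 and 4 only, copied from the data above
dat <- data.frame(
  Dist  = c(3L, 3L, 4L, 4L),
  Party = c("(Prefers Republican Party)", "(Prefers Democratic Party)",
            "(Prefers Republican Party)", "(Prefers Republican Party)"),
  Votes = c(124796L, 78018L, 75307L, 77772L)
)

wide <- pivot_wider(
  dat,
  names_from  = Party,
  values_from = Votes,
  values_fn   = sum,  # collapse duplicate Dist/Party pairs by summing
  values_fill = 0L    # Dist/Party combinations with no rows become 0
)

wide
```

The new columns appear in the order the party labels are first seen in the data, so the column order can differ from the spread() output.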

Onyambu · 7 years ago

You can use dcast from the reshape2 package, specifying sum as the aggregation function:

 library(reshape2)
 dcast(dat,Dist~Party,sum,value.var = "Votes")


   Dist (Prefers Democratic Party) (Prefers Republican Party)
1     1                     124151                     101428
2     2                     122173                      79518
3     3                      78018                     124796
4     4                          0                     153079
5     5                      87772                     135470
6     6                     141265                      83025
7     7                     203954                      47921
8     8                      73003                     125741
9     9                     118132                      48662
10   10                      99279                      82213

Using base R:

xtabs(Votes~Dist+Party,dat)
    Party
Dist (Prefers Democratic Party) (Prefers Republican Party)
  1                      124151                     101428
  2                      122173                      79518
  3                       78018                     124796
  4                           0                     153079
  5                       87772                     135470
  6                      141265                      83025
  7                      203954                      47921
  8                       73003                     125741
  9                      118132                      48662
  10                      99279                      82213

The output above is of class table. You can convert it to a data frame with:

as.data.frame.matrix(xtabs(Votes~Dist+Party,dat))

Now this is a data frame, and you can subset it however you want.
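One detail about this conversion: as.data.frame.matrix() keeps the district numbers as row names rather than as a column. A minimal base R sketch of moving them into a regular Dist column, using only the district 3 and 4 rows of the data; the name wide is an illustrative choice:

```r
# Districts 3 and 4 only, copied from the data above
dat <- data.frame(
  Dist  = c(3L, 3L, 4L, 4L),
  Party = c("(Prefers Republican Party)", "(Prefers Democratic Party)",
            "(Prefers Republican Party)", "(Prefers Republican Party)"),
  Votes = c(124796L, 78018L, 75307L, 77772L)
)

# Cross-tabulate, then convert the table to a data frame
wide <- as.data.frame.matrix(xtabs(Votes ~ Dist + Party, dat))

# The districts are stored as row names; turn them into a column
wide <- cbind(Dist = as.integer(rownames(wide)), wide)
rownames(wide) <- NULL

wide
```

The party columns keep their original labels, so they need backticks or [[ ]] when referenced, for example wide[["(Prefers Republican Party)"]].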