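I'm trying to learn R, and I decided to go about it by building something that reads the live election results my state publishes on election night. Unfortunately, I'm stuck on computing the Margin value used for the map fill. My state (WA) uses a top-two primary, which means that in some races both candidates in the November election belong to the same party. That may be more background than you need, but in any case, here is the coding problem: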
I have a data frame that looks like this:
Dist Party Votes
1 (Prefers Democratic Party) 124151
1 (Prefers Republican Party) 101428
2 (Prefers Democratic Party) 122173
2 (Prefers Republican Party) 79518
3 (Prefers Republican Party) 124796
3 (Prefers Democratic Party) 78018
4 (Prefers Republican Party) 75307
4 (Prefers Republican Party) 77772
5 (Prefers Republican Party) 135470
5 (Prefers Democratic Party) 87772
6 (Prefers Democratic Party) 141265
6 (Prefers Republican Party) 83025
7 (Prefers Democratic Party) 203954
7 (Prefers Republican Party) 47921
8 (Prefers Republican Party) 125741
8 (Prefers Democratic Party) 73003
9 (Prefers Democratic Party) 118132
9 (Prefers Republican Party) 48662
10 (Prefers Democratic Party) 99279
10 (Prefers Republican Party) 82213
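For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample data in base R. The name CongressTidy matches the code further down; the helper variables dem and gop are only shorthand introduced for this sketch, and I'm assuming Dist and Party are factors and Votes is an integer column.

```r
# Shorthand for the two party labels (helper names are just for this sketch).
dem <- "(Prefers Democratic Party)"
gop <- "(Prefers Republican Party)"

# Rebuild the sample data shown above, two rows per district.
CongressTidy <- data.frame(
  Dist = factor(rep(1:10, each = 2)),
  Party = factor(c(
    dem, gop,  # district 1
    dem, gop,  # district 2
    gop, dem,  # district 3
    gop, gop,  # district 4: two candidates from the same party
    gop, dem,  # district 5
    dem, gop,  # district 6
    dem, gop,  # district 7
    gop, dem,  # district 8
    dem, gop,  # district 9
    dem, gop   # district 10
  )),
  Votes = c(
    124151L, 101428L, 122173L, 79518L, 124796L, 78018L, 75307L, 77772L,
    135470L, 87772L, 141265L, 83025L, 203954L, 47921L, 125741L, 73003L,
    118132L, 48662L, 99279L, 82213L
  )
)
```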
I want it to look like this:
Dist (Prefers Democratic Party) (Prefers Republican Party)
1 124151 101428
2 122173 79518
3 78018 124796
4 [NA or 0] 153079
5 87772 135470
6 141265 83025
7 203954 47921
8 73003 125741
9 118132 48662
10 99279 82213
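spread() doesn't work because of the duplicate rows in Dist = 4. I cobbled the following together from a few other questions, but I'm not happy with it, and I'm almost certain there's a better way: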
library(tidyr)
library(dplyr)
CongressTidy %>%
group_by(Dist) %>%
mutate(GOPVotes = sum(ifelse(Party == "(Prefers Republican Party)", Votes, 0))) %>%
mutate(DemVotes = sum(ifelse(Party == "(Prefers Democratic Party)", Votes, 0)))
This returns the following:
Dist Party Votes GOPVotes DemVotes
<fctr> <fctr> <int> <dbl> <dbl>
1 (Prefers Democratic Party) 124151 101428 124151
1 (Prefers Republican Party) 101428 101428 124151
2 (Prefers Democratic Party) 122173 79518 122173
2 (Prefers Republican Party) 79518 79518 122173
3 (Prefers Republican Party) 124796 124796 78018
3 (Prefers Democratic Party) 78018 124796 78018
4 (Prefers Republican Party) 75307 153079 0
4 (Prefers Republican Party) 77772 153079 0
5 (Prefers Republican Party) 135470 135470 87772
5 (Prefers Democratic Party) 87772 135470 87772
6 (Prefers Democratic Party) 141265 83025 141265
6 (Prefers Republican Party) 83025 83025 141265
7 (Prefers Democratic Party) 203954 47921 203954
7 (Prefers Republican Party) 47921 47921 203954
8 (Prefers Republican Party) 125741 125741 73003
8 (Prefers Democratic Party) 73003 125741 73003
9 (Prefers Democratic Party) 118132 48662 118132
9 (Prefers Republican Party) 48662 48662 118132
10 (Prefers Democratic Party) 99279 82213 99279
10 (Prefers Republican Party) 82213 82213 99279
That's fine as far as it goes, and I can add a selector column and subset on it:
CongressMargins <- CongressTidy %>%
group_by(Dist) %>%
mutate(GOPVotes = sum(ifelse(Party == "(Prefers Republican Party)", Votes, 0))) %>%
mutate(DemVotes = sum(ifelse(Party == "(Prefers Democratic Party)", Votes, 0))) %>%
mutate(selector = c(1,2)) %>%
subset(selector == 1, select = c(Dist, GOPVotes, DemVotes))
This gives me what I want, and I can compute the margin from there just fine:
Dist GOPVotes DemVotes
<fctr> <dbl> <dbl>
1 101428 124151
2 79518 122173
3 124796 78018
4 153079 0
5 135470 87772
6 83025 141265
7 47921 203954
8 125741 73003
9 48662 118132
10 82213 99279
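To show what I mean by computing the margin, here is a rough sketch. The formula is only an assumption for illustration (Democratic minus Republican votes, as a share of the two-party total); the data frame is re-typed from the table above so the snippet runs on its own.

```r
# Values copied from the table above.
CongressMargins <- data.frame(
  Dist = factor(1:10),
  GOPVotes = c(101428, 79518, 124796, 153079, 135470, 83025, 47921, 125741, 48662, 82213),
  DemVotes = c(124151, 122173, 78018, 0, 87772, 141265, 203954, 73003, 118132, 99279)
)

# Assumed definition: positive means a Democratic lead, negative a Republican lead,
# scaled to the range [-1, 1] by the two-party total.
CongressMargins$Margin <- with(
  CongressMargins,
  (DemVotes - GOPVotes) / (DemVotes + GOPVotes)
)
```

With this definition, district 4 (two Republicans, no Democrat) comes out at exactly -1, which is why I need the zero rather than a missing row.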
But it's going to break if there are two unopposed races, because it relies on vector recycling. And it's ugly. And there has to be a better way.
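To make the worry concrete: the selector trick only makes sense if every district has exactly two rows. Here is a small sketch of that check, using made-up data (the toy data frame and its vote counts are hypothetical, and I'm assuming an unopposed race would show up as a single row).

```r
# Hypothetical data: district 1 is contested, district 2 has one unopposed candidate.
toy <- data.frame(
  Dist = factor(c(1, 1, 2)),
  Party = c("(Prefers Democratic Party)", "(Prefers Republican Party)",
            "(Prefers Republican Party)"),
  Votes = c(100L, 90L, 120L)
)

# Count rows per district; selector = c(1, 2) assumes every count is exactly 2.
rows_per_dist <- table(toy$Dist)
two_rows_everywhere <- all(as.vector(rows_per_dist) == 2)

# Districts that violate the assumption.
unopposed <- names(rows_per_dist)[as.vector(rows_per_dist) != 2]
```

Here two_rows_everywhere is FALSE and unopposed holds district "2", so a length-2 selector no longer lines up with that group.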
Any ideas?