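I want to lag a variable in grouped data with dplyr, but when I use lag I only get NAs. In the related question "dplyr lag function returns NAs", someone pointed to
https://github.com/tidyverse/dplyr/issues/1540
where Hadley fixed some bugs in 2016, so I assumed the problem had been resolved. Why does my lag call still return NA?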
library(tidyverse)
data = data.frame(id=c(1,1,1,2,2,2,3,3,3,4,4,4), time=seq(1:3), x=rep(c(5:8), each=3))

# Attempt 1: order_by=TRUE
data %>%
  group_by(id) %>%
  mutate(x_lag = lag(x, n=1, default=NA, order_by=TRUE)) %>%
  select(id, time, x, x_lag)

# Attempt 2: order_by=FALSE
data %>%
  group_by(id) %>%
  mutate(x_lag = lag(x, n=1, default=NA, order_by=FALSE)) %>%
  select(id, time, x, x_lag)

# Attempt 3: arrange by id first, then order_by=FALSE
data %>%
  group_by(id) %>%
  arrange(id) %>%
  mutate(x_lag = lag(x, n=1, default=NA, order_by=FALSE)) %>%
  select(id, time, x, x_lag)

# Attempt 4: default=0 instead of NA
data %>%
  group_by(id) %>%
  mutate(x_lag = lag(x, n=1, default=0, order_by=TRUE)) %>%
  select(id, time, x, x_lag)
# A tibble: 8 x 4
# Groups:   id, time [12]
     id  time     x x_lag
  <dbl> <int> <int> <int>
1     1     1     5    NA
2     1     2     5    NA
3     1     3     5    NA
4     2     1     6    NA
5     2     2     6    NA
6     2     3     6    NA
7     3     1     7    NA
8     3     2     7    NA
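
One thing that may be worth checking: dplyr documents the order_by argument of lag as a vector to order by, not a TRUE/FALSE switch, so passing TRUE or FALSE hands it a length-one logical, which could explain the all-NA column. The sketch below assumes the intended ordering is the time column within each id. The data frame is rebuilt with rep() for readability and the name lagged is only illustrative.

```r
library(dplyr)

# Same shape as the example data: 4 ids, 3 time points each
data <- data.frame(
  id   = rep(1:4, each = 3),
  time = rep(1:3, times = 4),
  x    = rep(5:8, each = 3)
)

lagged <- data %>%
  group_by(id) %>%
  # order_by takes the column to order by (assumed here to be time)
  mutate(x_lag = lag(x, n = 1, order_by = time)) %>%
  ungroup() %>%
  select(id, time, x, x_lag)

print(lagged)
```

If the rows are already sorted by time within each id, dropping order_by altogether and calling lag(x) after group_by(id) should give the same result: NA in the first row of each group and the previous x everywhere else.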