
R: add a new column based on values within a range in a different column

data.table r

Henk · 6 years ago

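I have a data.table with two boolean columns, A and B. I want to add a new boolean column C that depends on A and B, but I am having trouble "looking" at the preceding and following rows.

I want to define C as follows. If a row with A=1 has at least one row with B=1 within a range of three rows before or after it, then C should be 1 on the rows in that range where A=1 and 0 on all other rows in that range. Otherwise C should equal B.

If two ranges overlap and both contain a B=1, then C should become 1 on both rows where A=1 and 0 on the other rows. For illustration (input on the left, desired result on the right):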

library(data.table)

df <- data.table(A=c(0,0,0,1,0,0,0,0,0,0,0,1,1,0,0), 
                 B=c(0,1,0,0,0,1,0,1,1,0,0,0,0,0,1))

    A B                                        A B C
1:  0 0 #                                  1:  0 0 0
2:  0 1 #                                  2:  0 1 0
3:  0 0 #                                  3:  0 0 0
4:  1 0 # range of three                   4:  1 0 1
5:  0 0 #                                  5:  0 0 0
6:  0 1 #                                  6:  0 1 0
7:  0 0 #                                  7:  0 0 0
8:  0 1                                    8:  0 1 1 # C = B
9:  0 1 #                                  9:  0 1 0
10: 0 0 ##                                 10: 0 0 0
11: 0 0 ##                                 11: 0 0 0
12: 1 0 ## overlapping range of three      12: 1 0 1
13: 1 0 ##                                 13: 1 0 1
14: 0 0 ##                                 14: 0 0 0
15: 0 1 ##                                 15: 0 1 0

How can I do this? I am a bit stumped.

2 answers

IceCreamToucan · 6 years ago

# Find ranges where A == 1
ind <- lapply(which(df$A == 1)
              , function(i){s <- i + -3:3; s[s %in% seq(nrow(df))]})
# Remove ranges with no B == 1
good <- sapply(ind, function(i) df[i, any(B == 1)])
ind  <- unique(unlist(ind[good]))
# Assign C as described
df[, C := B]
df[ind, C := as.numeric(A == 1)]
df
#     A B C
#  1: 0 0 0
#  2: 0 1 0
#  3: 0 0 0
#  4: 1 0 1
#  5: 0 0 0
#  6: 0 1 0
#  7: 0 0 0
#  8: 0 1 1
#  9: 0 1 0
# 10: 0 0 0
# 11: 0 0 0
# 12: 1 0 1
# 13: 1 0 1
# 14: 0 0 0
# 15: 0 1 0

Data used (df):

df <- data.table(A=c(0,0,0,1,0,0,0,0,0,0,0,0,1,0,0), 
                 B=c(0,1,0,0,0,1,0,1,1,0,0,0,0,0,0))

df[12, A := 1]
df[15, B := 1]

df

#     A B
#  1: 0 0
#  2: 0 1
#  3: 0 0
#  4: 1 0
#  5: 0 0
#  6: 0 1
#  7: 0 0
#  8: 0 1
#  9: 0 1
# 10: 0 0
# 11: 0 0
# 12: 1 0
# 13: 1 0
# 14: 0 0
# 15: 0 1
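
The steps above can also be wrapped in a function so the window width is not hard-coded. This is only a sketch: the name `add_c` and the parameter `k` (half-width of the range, 3 in the question) are hypothetical and not part of the answer. It works on a copy so the caller's table is not modified by reference.

```r
library(data.table)

# Hypothetical helper (not from the answer above): k is the half-width of the range.
add_c <- function(dt, k = 3L) {
  dt <- copy(dt)                       # avoid modifying the caller's table by reference
  n  <- nrow(dt)
  # One index vector per row with A == 1, clipped to valid row numbers
  ranges <- lapply(which(dt$A == 1), function(i) {
    s <- (i - k):(i + k)
    s[s >= 1L & s <= n]
  })
  # Keep only the ranges that contain at least one B == 1
  keep <- vapply(ranges, function(s) any(dt$B[s] == 1), logical(1))
  rows <- unique(unlist(ranges[keep]))
  dt[, C := B]                         # default: C = B
  if (length(rows) > 0L) dt[rows, C := as.numeric(A == 1)]
  dt[]
}

df <- data.table(A = c(0,0,0,1,0,0,0,0,0,0,0,1,1,0,0),
                 B = c(0,1,0,0,0,1,0,1,1,0,0,0,0,0,1))
res <- add_c(df)
```

With the question's data, `res$C` should match the C column shown in the question, and `df` itself is left without a C column.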

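Melissa Key · 6 years ago

Here is a solution based on the tidyverse suite of packages:

Two intermediate variables are used: A1 indicates whether A = 1 anywhere in the window, and C1 indicates whether A = 1 on the row itself and B = 1 anywhere in the window.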

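library(tidyverse)
df %>% 
  mutate(
    # A1: TRUE if A == 1 anywhere in rows i-3 .. i+3.
    # default = sum(A) keeps the window sum correct for the last three rows.
    A1 = (dplyr::lead(cumsum(A), n = 3, default = sum(A)) - dplyr::lag(cumsum(A), n = 4, default = 0)) > 0,
    # C1: 1 if A == 1 on this row and B == 1 anywhere in the same window
    C1 = (A & (dplyr::lead(cumsum(B), n = 3, default = sum(B)) - dplyr::lag(cumsum(B), n = 4, default = 0)) > 0) * 1,
    C = ifelse(!A1, B, C1)
  ) %>%
  select(-A1, -C1)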
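
The differences of cumulative sums in this answer amount to a windowed sum over rows i-3 to i+3. A minimal base R sketch of that idea follows; the helper name `window_sum` and its parameter `k` are hypothetical and only meant to make the index arithmetic explicit, including the clipping at both ends of the vector.

```r
# Hypothetical helper: sum of x over rows i-k .. i+k, clipped at the ends
window_sum <- function(x, k = 3L) {
  n  <- length(x)
  cs <- c(0, cumsum(x))              # cs[j + 1] is the sum of x[1..j]
  hi <- pmin(seq_len(n) + k, n)      # last row of each window
  lo <- pmax(seq_len(n) - k, 1L)     # first row of each window
  cs[hi + 1L] - cs[lo]               # sum of x[lo..hi]
}

A <- c(0,0,0,1,0,0,0,0,0,0,0,1,1,0,0)
B <- c(0,1,0,0,0,1,0,1,1,0,0,0,0,0,1)

in_a_window <- window_sum(A) > 0     # some A == 1 within three rows
C <- ifelse(in_a_window, as.numeric(A == 1 & window_sum(B) > 0), B)
```

Like the tidyverse answer, this checks the window of each row rather than each range as a whole, so it may differ from the first answer when a row with A = 1 has no B = 1 in its own range but lies inside another qualifying range.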
