
R: add a new column based on values within a range in other columns

  •  0
  • Henk  · 6 years ago

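    I have a data.table with two boolean columns, A and B. I want to add a new boolean column C that depends on A and B, but I am having trouble with the "lookup" into the preceding and following rows.

    I would like to define C as follows. If there is at least one row with B = 1 within a range of three rows around a row with A = 1, then C should be 1 on the rows in that range where A = 1 and 0 on all other rows in that range. Otherwise C should equal B.

    If two ranges overlap and both contain a B = 1, then C should be 1 on both rows where A = 1 and 0 on the other rows. To illustrate: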

    library(data.table)

    df <- data.table(A=c(0,0,0,1,0,0,0,0,0,0,0,1,1,0,0), 
                     B=c(0,1,0,0,0,1,0,1,1,0,0,0,0,0,1))
    
        A B                                        A B C
    1:  0 0 #                                  1:  0 0 0
    2:  0 1 #                                  2:  0 1 0
    3:  0 0 #                                  3:  0 0 0
    4:  1 0 # range of three                   4:  1 0 1
    5:  0 0 #                                  5:  0 0 0
    6:  0 1 #                                  6:  0 1 0
    7:  0 0 #                                  7:  0 0 0
    8:  0 1                                    8:  0 1 1 # C = B
    9:  0 1 #                                  9:  0 1 0
    10: 0 0 ##                                 10: 0 0 0
    11: 0 0 ##                                 11: 0 0 0
    12: 1 0 ## overlapping range of three      12: 1 0 1
    13: 1 0 ##                                 13: 1 0 1
    14: 0 0 ##                                 14: 0 0 0
    15: 0 1 ##                                 15: 0 1 0
    

    How can I do this? I am a bit lost on this one.

    2 answers  |  latest 6 years ago
        1
  •  3
  •   IceCreamToucan    6 years ago
    # Find ranges where A == 1
    ind <- lapply(which(df$A == 1)
                  , function(i){s <- i + -3:3; s[s %in% seq(nrow(df))]})
    # Remove ranges with no B == 1
    good <- sapply(ind, function(i) df[i, any(B == 1)])
    ind  <- unique(unlist(ind[good]))
    # Assign C as described
    df[, C := B]
    df[ind, C := as.numeric(A == 1)]
    df
    #     A B C
    #  1: 0 0 0
    #  2: 0 1 0
    #  3: 0 0 0
    #  4: 1 0 1
    #  5: 0 0 0
    #  6: 0 1 0
    #  7: 0 0 0
    #  8: 0 1 1
    #  9: 0 1 0
    # 10: 0 0 0
    # 11: 0 0 0
    # 12: 1 0 1
    # 13: 1 0 1
    # 14: 0 0 0
    # 15: 0 1 0
    

    Data used:

    df <- data.table(A=c(0,0,0,1,0,0,0,0,0,0,0,0,1,0,0), 
                     B=c(0,1,0,0,0,1,0,1,1,0,0,0,0,0,0))
    
    df[12, A := 1]
    df[15, B := 1]
    
    df
    
    #     A B
    #  1: 0 0
    #  2: 0 1
    #  3: 0 0
    #  4: 1 0
    #  5: 0 0
    #  6: 0 1
    #  7: 0 0
    #  8: 0 1
    #  9: 0 1
    # 10: 0 0
    # 11: 0 0
    # 12: 1 0
    # 13: 1 0
    # 14: 0 0
    # 15: 0 1
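
    The steps in this answer could be wrapped in a small function so that the window width is no longer hard-coded. This is only a sketch: `add_c` and its `width` argument are hypothetical names that do not appear in the original answer, and the function works on a copy so the input table is left untouched.

```r
library(data.table)

# Hypothetical helper (not from the original answer): same logic,
# with the half-width of the range exposed as a parameter.
add_c <- function(dt, width = 3L) {
  dt <- copy(dt)                        # do not modify the caller's table by reference
  n  <- nrow(dt)
  # For every row with A == 1, the row indices within `width` rows,
  # clipped to the bounds of the table
  ranges <- lapply(which(dt$A == 1), function(i) {
    seq(max(1L, i - width), min(n, i + width))
  })
  # Keep only the ranges that contain at least one B == 1
  has_b <- vapply(ranges, function(idx) any(dt$B[idx] == 1), logical(1))
  rows  <- unique(unlist(ranges[has_b]))
  dt[, C := B]                          # default: C equals B
  if (length(rows)) dt[rows, C := as.numeric(A == 1)]
  dt[]
}

df <- data.table(A = c(0,0,0,1,0,0,0,0,0,0,0,1,1,0,0),
                 B = c(0,1,0,0,0,1,0,1,1,0,0,0,0,0,1))
res <- add_c(df)
res$C
```

    With the example data, `res$C` should match the C column requested in the question, and `df` itself gains no new column.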
    
        2
  •  2
  •   Melissa Key    6 years ago

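    Here is a solution based on the tidyverse suite of packages:

    `A1` flags rows that have an `A = 1` anywhere in the window, and `C1` flags rows where `A = 1` and there is a `B = 1` anywhere in the window.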

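    library(tidyverse)
    df %>% 
      mutate(
        # Counts of A == 1 and B == 1 in rows i-3 .. i+3, as differences of cumulative sums;
        # past the last row the cumulative sum stays at its total, before the first it is 0
        nA = dplyr::lead(cumsum(A), 3, default = sum(A)) - dplyr::lag(cumsum(A), 4, default = 0),
        nB = dplyr::lead(cumsum(B), 3, default = sum(B)) - dplyr::lag(cumsum(B), 4, default = 0),
        A1 = nA > 0,                    # an A == 1 lies somewhere in the window
        C1 = (A == 1 & nB > 0) * 1,     # A == 1 here and a B == 1 somewhere in the window
        C = ifelse(!A1, B, C1)          # outside every range, C falls back to B
      ) %>%
      select(-nA, -nB, -A1, -C1)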
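
    The cumulative-sum windowing that this answer relies on may be easier to follow in isolation: the number of 1s between rows `lo` and `hi` is the cumulative sum at `hi` minus the cumulative sum just before `lo`. Below is a minimal base R sketch of that idea; `window_count` and its argument `k` are hypothetical names, not part of the original answer.

```r
# Hypothetical helper: sum of x over the rows within k positions of each row
window_count <- function(x, k = 3L) {
  n  <- length(x)
  cs <- c(0, cumsum(x))                 # cs[j + 1] holds sum(x[1:j])
  hi <- pmin(seq_len(n) + k, n)         # last row of each window, clipped to n
  lo <- pmax(seq_len(n) - k, 1L)        # first row of each window, clipped to 1
  cs[hi + 1L] - cs[lo]                  # sum(x[lo:hi]) for every row at once
}

A <- c(0,0,0,1,0,0,0,0,0,0,0,1,1,0,0)
B <- c(0,1,0,0,0,1,0,1,1,0,0,0,0,0,1)

near_a <- window_count(A) > 0                        # row lies in the range of some A == 1
c1     <- as.numeric(A == 1 & window_count(B) > 0)   # A == 1 with a B == 1 in its range
C      <- ifelse(near_a, c1, B)                      # otherwise fall back to B
C
```

    Padding the cumulative sum with a leading 0 and clipping the window bounds keeps the first and last three rows from needing special handling.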