
How to check whether tuples of variables produce a match in a second list of tuples?

  • jay.sf  ·  5 years ago

    d1 contains observations at certain locations l and times t.

    > head(d1, 3)
      id   l    p          t         X
    1  1 258 2016 2016-01-05 -1.158644
    2  5 261 2016 2016-01-14  1.604873
    3  2 261 2016 2016-01-20 -1.102002
    

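    In another data frame, p2, I have time intervals t1:t2 for locations l, and I want to check row by row whether d1 matches the location and time-interval tuples of p2.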

    > head(p2, 3)
        l    p         t1         t2
    1 261 2016 2016-01-11 2016-01-25
    2 261 2017 2017-02-27 2017-03-13
    3 261 2017 2017-03-01 2017-03-15
    

    In the positive case, a dummy variable d1$match should take the value 1, and in the negative case 0:

    # [1] 0 1 1 ...
    

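    I tried pasting l and p of both data frames into a string and comparing them, then checking whether t lies in t1:t2.

    This works for p1, where the periods do not overlap. However, warnings are raised, since there seems to be a problem with the "Date" class.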

    > p1
        l    p         t1         t2
    1 261 2016 2016-01-11 2016-01-25
    2 261 2017 2017-02-27 2017-03-13
    4 258 2018 2018-01-09 2018-01-23
    
    p <- p1
    p.strg <- sapply(1:nrow(p), function(x) {
      do.call(paste, c(p[x, c("l", "p")], sep = "|"))
    })
    
    sapply(1:nrow(d1), function(x) {
      strg <- do.call(paste, c(d1[x, c("l", "p")], sep = "|"))
      t.d <- d1[x, "t"]
      t.p <- p[which(p.strg %in% strg), c("t1", "t2")]
      return(as.integer(any(p.strg %in% strg) & t.d >= t.p[1] &
                          t.d <= t.p[2]))
    })
    
    # [1] 0 1 1 0 0 0 1 1 0 0 0 1 0 0 0
    # There were 30 warnings (use warnings() to see them)
    # warnings()
    # Warning messages:
    #   1: In FUN(X[[i]], ...) :
    #   Incompatible methods ("Ops.Date", "Ops.data.frame") for ">="
    #   ...
    

    If the periods do overlap, as in p2:

    p <- p2
    p.strg <- sapply(1:nrow(p), function(x) {
      do.call(paste, c(p[x, c("l", "p")], sep = "|"))
    })
    sapply(1:nrow(d1), function(x) {
      strg <- do.call(paste, c(d1[x, c("l", "p")], sep = "|"))
      t.d <- d1[x, "t"]
      t.p <- p[which(p.strg %in% strg), c("t1", "t2")]
      return(as.integer(any(p.strg %in% strg) & t.d >= t.p[1] &
                          t.d <= t.p[2]))
    })
    

    This does not work at all:

    Error in FUN(X[[i]], ...) : 
      (list) object cannot be coerced to type 'double'
    In addition: There were 13 warnings (use warnings() to see them)
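
    A likely cause of both the warnings and the error, assuming the code is run as posted: t.p[1] and t.p[2] select columns of a data frame with single brackets, so they are one-column data frames rather than Date vectors. Comparing a Date with a data frame triggers the "Incompatible methods" warning, and once several rows of p share the same l|p key (as in p2), the fallback comparison apparently can no longer coerce the list to numbers. The sketch below uses a toy two-row subset of p2; the names single, lower, upper, and hit are hypothetical. Extracting the columns with [[ and collapsing the result with any() avoids both problems.

```r
# Toy subset of p2: two overlapping intervals for the same location.
p <- data.frame(
  l  = c(261, 261),
  t1 = structure(c(17224, 17226), class = "Date"),
  t2 = structure(c(17238, 17240), class = "Date")
)
t.d  <- structure(17227, class = "Date")  # one observation date taken from d1
rows <- which(p[["l"]] == 261)            # both rows share the location

# Single brackets keep the data.frame class, which clashes with Ops.Date.
single <- p[rows, c("t1", "t2")][1]
class(single)

# Double brackets return plain Date vectors, so the comparison is vectorized
# over all candidate intervals and any() reduces it to a single flag.
lower <- p[["t1"]][rows]
upper <- p[["t2"]][rows]
hit <- as.integer(any(t.d >= lower & t.d <= upper))
hit
```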
    

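    I think I have gotten a bit lost. What would be a better way to solve this?

    Note: My original data are considerably larger (d1: 20000 x 11, p2: 1700 x 8), so I need an efficient solution.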


    d1 <- structure(list(id = c(1L, 5L, 2L, 3L, 1L, 3L, 4L, 5L, 2L, 3L, 
    5L, 1L, 2L, 4L, 4L), l = c(258, 261, 261, 260, 258, 260, 261, 
    261, 259, 260, 261, 258, 259, 261, 261), p = c(2016, 2016, 2016, 
    2016, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2018, 2018, 2018, 
    2018), t = structure(c(16805, 16814, 16820, 16924, 17193, 17211, 
    17227, 17229, 17348, 17481, 17517, 17543, 17554, 17787, 17887
    ), class = "Date"), X = c(-1.15864442153663, 1.60487335898257, 
    -1.10200153102672, -0.823719007033067, 1.20944271845298, 0.790388149166713, 
    -1.0996495357495, -0.421449225963478, -0.243567712934607, -0.337415580767635, 
    -1.64590022554026, 2.11206142393207, -0.950235138478342, -2.08164602167738, 
    -1.88576409729638), match = c(0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 
    0L, 0L, 0L, 1L, 0L, 0L, 0L)), row.names = c(NA, -15L), class = "data.frame")
    
    p1 <- structure(list(l = c(261, 261, 258), p = c(2016, 2017, 2018), 
        t1 = structure(c(16811, 17224, 17540), class = "Date"), t2 = structure(c(16825, 
        17238, 17554), class = "Date")), row.names = c(1L, 2L, 4L
    ), class = "data.frame")
    
    p2 <- structure(list(l = c(261, 261, 261, 258, 259, 261), p = c(2016, 
    2017, 2017, 2018, 2018, 2018), t1 = structure(c(16811, 17224, 
    17226, 17540, 17551, 17884), class = "Date"), t2 = structure(c(16825, 
    17238, 17240, 17554, 17565, 17898), class = "Date")), row.names = c(NA, 
    -6L), class = "data.frame")
    
    1 answer

  •   s_baldur    5 years ago

    Here is a naive solution using loops:

    d1[["match"]] <- 0L
    for (i in seq_len(nrow(d1))) {
      p2rows <- which(p2[["l"]] == d1[["l"]][i])
      for (r in p2rows) { # If no location match, there will be nothing to loop over
        if (d1[["t"]][i] >= with(p2[r,], t1) && d1[["t"]][i] <= with(p2[r,], t2)) {
          d1[["match"]][i] <- 1L
          break # Enough to find one match, we break out of the inner loop
        }
      }
    }
    
       id   l    p          t          X match
    1   1 258 2016 2016-01-05 -1.1586444     0
    2   5 261 2016 2016-01-14  1.6048734     1
    3   2 261 2016 2016-01-20 -1.1020015     1
    4   3 260 2016 2016-05-03 -0.8237190     0
    5   1 258 2017 2017-01-27  1.2094427     0
    6   3 260 2017 2017-02-14  0.7903881     0
    7   4 261 2017 2017-03-02 -1.0996495     1
    8   5 261 2017 2017-03-04 -0.4214492     1
    9   2 259 2017 2017-07-01 -0.2435677     0
    10  3 260 2017 2017-11-11 -0.3374156     0
    11  5 261 2017 2017-12-17 -1.6459002     0
    12  1 258 2018 2018-01-12  2.1120614     1
    13  2 259 2018 2018-01-23 -0.9502351     1
    14  4 261 2018 2018-09-13 -2.0816460     0
    15  4 261 2018 2018-12-22 -1.8857641     1
    

    Edit: the same without the inner loop:

    for (i in seq_len(nrow(d1))) {
      p2rows <- which(p2[["l"]] == d1[["l"]][i])
      if (any(d1[["t"]][i] >= with(p2[p2rows,], t1) & d1[["t"]][i] <= with(p2[p2rows,], t2))) {
        d1[["match"]][i] <- 1L
      }
    }
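
    For larger data, the remaining loop over rows can also be avoided. The following is a minimal base R sketch, not part of the original answer: it joins observations and intervals on l with merge() and then tests all candidate pairs at once. The names obs, cand, hit, and match_vec and the helper column row are hypothetical, and only the l and t columns of d1 are reproduced so the example is self-contained.

```r
# Location and date columns of d1, copied from the data in the question.
obs <- data.frame(
  l = c(258, 261, 261, 260, 258, 260, 261, 261, 259, 260, 261, 258, 259, 261, 261),
  t = structure(c(16805, 16814, 16820, 16924, 17193, 17211, 17227, 17229,
                  17348, 17481, 17517, 17543, 17554, 17787, 17887), class = "Date")
)
# Location and interval columns of p2.
p2 <- data.frame(
  l  = c(261, 261, 261, 258, 259, 261),
  t1 = structure(c(16811, 17224, 17226, 17540, 17551, 17884), class = "Date"),
  t2 = structure(c(16825, 17238, 17240, 17554, 17565, 17898), class = "Date")
)

obs$row <- seq_len(nrow(obs))     # remember the original row positions
cand <- merge(obs, p2, by = "l")  # pair each observation with every interval at its location
hit  <- cand$row[cand$t >= cand$t1 & cand$t <= cand$t2]  # rows with at least one enclosing interval
match_vec <- as.integer(obs$row %in% hit)
match_vec
# 0 1 1 0 0 0 1 1 0 0 0 1 1 0 1
```

    The merged table grows with the number of intervals per location, so this trades memory for speed. If both tables carry p, using by = c("l", "p") would additionally require the year to match.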
    

    Edit 2: The same thing, but it should be slightly faster:

    library(data.table)
    sapply(
      seq_len(nrow(d1)), 
      function(i) {
        p2rows <- which(p2[["l"]] == d1[["l"]][i])
        as.integer(any(between(d1[["t"]][i], p2[p2rows, "t1"], p2[p2rows, "t2"])))
      }
    )
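
    Since data.table is already loaded, a non-equi update join is another option that may scale better to the sizes mentioned in the question. This is a sketch, not part of the original answer; dt1 and dp are hypothetical names, and only the l and t columns of d1 are reproduced.

```r
library(data.table)

# Location and date columns of d1, copied from the data in the question.
dt1 <- data.table(
  l = c(258, 261, 261, 260, 258, 260, 261, 261, 259, 260, 261, 258, 259, 261, 261),
  t = structure(c(16805, 16814, 16820, 16924, 17193, 17211, 17227, 17229,
                  17348, 17481, 17517, 17543, 17554, 17787, 17887), class = "Date")
)
# Location and interval columns of p2.
dp <- data.table(
  l  = c(261, 261, 261, 258, 259, 261),
  t1 = structure(c(16811, 17224, 17226, 17540, 17551, 17884), class = "Date"),
  t2 = structure(c(16825, 17238, 17240, 17554, 17565, 17898), class = "Date")
)

dt1[, match := 0L]  # default: no match
# Update join: rows of dt1 with the same l as a row of dp and with t inside
# [t1, t2] are set to 1; all other rows keep the default.
dt1[dp, on = .(l, t >= t1, t <= t2), match := 1L]
dt1$match
```

    Adding p to the on clause would also require the year to match, provided both tables carry that column.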