
Different results when subsetting data.table columns with numeric indices in different ways

  •  5
  • mt1022  · Tech Community  · 6 years ago

    See the minimal example:

    library(data.table)
    DT <- data.table(x = 2, y = 3, z = 4)
    
    DT[, c(1:2)]  # first way
    #    x y
    # 1: 2 3
    
    DT[, (1:2)]  # second way
    # [1] 1 2
    
    DT[, 1:2]  # third way
    #    x y
    # 1: 2 3
    

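    According to this post, it is now possible to subset columns using numeric indices. However, I wonder why the indices in the second way are returned as a plain vector rather than being used as column indices?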

    In addition, I just updated data.table:

    > sessionInfo()
    R version 3.4.4 (2018-03-15)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 16.04.4 LTS
    
    Matrix products: default
    BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
    LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
    
    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
     [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
    
    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base     
    
    other attached packages:
    [1] data.table_1.11.2
    
    loaded via a namespace (and not attached):
    [1] compiler_3.4.4 tools_3.4.4    yaml_2.1.17
    
    1 answer  |  as of 6 years ago
  •  5
  •   David Arenburg Ulrik    6 years ago

    By looking at the source code, we can simulate data.table's behavior for different inputs:

    if (!missing(j)) {
        jsub = replace_dot_alias(substitute(j))
        root = if (is.call(jsub)) as.character(jsub[[1L]])[1L] else ""
        if (root == ":" ||
            (root %chin% c("-","!") && is.call(jsub[[2L]]) && jsub[[2L]][[1L]]=="(" && is.call(jsub[[2L]][[2L]]) && jsub[[2L]][[2L]][[1L]]==":") ||
            ( (!length(av<-all.vars(jsub)) || all(substring(av,1L,2L)=="..")) &&
              root %chin% c("","c","paste","paste0","-","!") &&
              missing(by) )) {   # test 763. TODO: likely that !missing(by) iff with==TRUE (so, with can be removed)
          # When no variable names (i.e. symbols) occur in j, scope doesn't matter because there are no symbols to find.
          # If variable names do occur, but they are all prefixed with .., then that means look up in calling scope.
          # Automatically set with=FALSE in this case so that DT[,1], DT[,2:3], DT[,"someCol"] and DT[,c("colB","colD")]
          # work as expected.  As before, a vector will never be returned, but a single column data.table
          # for type consistency with >1 cases. To return a single vector use DT[["someCol"]] or DT[[3]].
          # The root==":" is to allow DT[,colC:colH] even though that contains two variable names.
          # root == "-" or "!" is for tests 1504.11 and 1504.13 (a : with a ! or - modifier root)
          # We don't want to evaluate j at all in making this decision because i) evaluating could itself
          # increment some variable and not intended to be evaluated a 2nd time later on and ii) we don't
          # want decisions like this to depend on the data or vector lengths since that can introduce
          # inconistency reminiscent of drop=TRUE in [.data.frame that we seek to avoid.
          with=FALSE
    

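    Basically, "[.data.table" captures the expression passed to j and decides how to handle it based on a set of predefined rules. If one of the rules is satisfied, it sets with=FALSE, which basically means that column names or positions were passed to j and it is evaluated with standard evaluation.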

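    The rules are (roughly) as follows:

    1. Set with=FALSE,

      1.1. if the j expression is a call and the call is :, or

      1.2. if the call is a combination of c("-","!") with ( and :, or

      1.3. if some value (character, integer, numeric, etc.) or a ..-prefixed variable was passed to j, the call is one of c("","c","paste","paste0","-","!"), and there is no by call

    Otherwise, set with=TRUE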
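
    To make the rules concrete for the three calls in the question, here is a minimal base R sketch that extracts the "root" of each j expression in the same way as the source excerpt above. The helper name root_of is hypothetical and is not part of data.table.

```r
# Hypothetical helper (not part of data.table): returns the name of the
# outermost function of an unevaluated expression, or "" if it is not a call.
root_of <- function(expr) {
  if (is.call(expr)) as.character(expr[[1L]])[1L] else ""
}

root_c     <- root_of(quote(c(1:2)))  # outermost call is c
root_paren <- root_of(quote((1:2)))   # outermost call is the parenthesis function
root_colon <- root_of(quote(1:2))     # outermost call is :

print(c(root_c, root_paren, root_colon))
# [1] "c" "(" ":"
```

    The root "(" is neither ":" nor one of c("","c","paste","paste0","-","!"), so none of the rules match for the second way and with stays TRUE.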

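    So we can turn this into a function and check whether any of the conditions is satisfied (I have skipped the conversion of . to the list function, as it is irrelevant here; we will just test with list directly):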

    is_satisfied <- function(...) {
      jsub <- substitute(...)
      root = if (is.call(jsub)) as.character(jsub[[1L]])[1L] else ""
      if (root == ":" ||
        (root %chin% c("-","!") && 
         is.call(jsub[[2L]]) && 
         jsub[[2L]][[1L]]=="(" && 
         is.call(jsub[[2L]][[2L]]) && 
         jsub[[2L]][[2L]][[1L]]==":") ||
        ( (!length(av<-all.vars(jsub)) || all(substring(av,1L,2L)=="..")) &&
          root %chin% c("","c","paste","paste0","-","!"))) TRUE else FALSE
    }
    
    is_satisfied("x")
    # [1] TRUE
    is_satisfied(c("x", "y"))
    # [1] TRUE
    is_satisfied(..x)
    # [1] TRUE
    is_satisfied(1:2)
    # [1] TRUE
    is_satisfied(c(1:2))
    # [1] TRUE
    is_satisfied((1:2))
    # [1] FALSE
    is_satisfied(y)
    # [1] FALSE
    is_satisfied(list(x, y))
    # [1] FALSE
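
    As a practical consequence, if the indices really have to be wrapped in parentheses, with=FALSE can be set explicitly rather than relying on the automatic detection. A minimal sketch, assuming the same DT as in the question and an installed data.table:

```r
library(data.table)
DT <- data.table(x = 2, y = 3, z = 4)

# Automatic detection does not fire for "(", so j is evaluated as an
# ordinary expression and the integer vector itself is returned.
as_vector <- DT[, (1:2)]

# Setting with = FALSE explicitly makes j be treated as column positions.
as_columns <- DT[, (1:2), with = FALSE]

print(as_vector)
print(names(as_columns))
```

    Here as_vector holds the plain integer vector from the second way in the question, while as_columns is a data.table containing the columns x and y.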