
dplyr: mutate multiple variables from a stored vector

  •  1
  • rg255  · Tech Community  · 6 years ago

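    I am performing an eigendecomposition on the posterior distribution of a 4x4 variance-covariance matrix. To do this, I use the eigen function inside a dplyr/tidyverse pipeline: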

    library(tidyverse)  # provides as_tibble, %>%, rowwise, mutate, select

    set.seed(1)
    # Variance and covariances of 4 variables
    A1  <- rnorm(1000,10,1)
    A2  <- rnorm(1000,10,1)
    A3  <- rnorm(1000,10,1)
    A4  <- rnorm(1000,10,1)
    C12 <- rnorm(1000,0,1)
    C13 <- rnorm(1000,0,1)
    C14 <- rnorm(1000,0,1)
    C23 <- rnorm(1000,0,1)
    C24 <- rnorm(1000,0,1)
    C34 <- rnorm(1000,0,1)
    
    # Create posterior tibble
    w1_post <- as_tibble(cbind(A1, C12, C13, C14, A2, C23, C24, A3, C34, A4))
    
    # Get 1st-4th eigenvalues of each variance-covariance matrix
    w1_post %>%
      rowwise %>%
        mutate(
          eig1 = 
            eigen(matrix(c(A1, C12, C13, C14, C12, A2, C23, C24, C13, C23,
              A3, C34, C14, C24, C34, A4), nrow = 4))[[1]][1],
          eig2 = 
            eigen(matrix(c(A1, C12, C13, C14, C12, A2, C23, C24, C13, C23,
              A3, C34, C14, C24, C34, A4), nrow = 4))[[1]][2],
          eig3 = 
            eigen(matrix(c(A1, C12, C13, C14, C12, A2, C23, C24, C13, C23,
              A3, C34, C14, C24, C34, A4), nrow = 4))[[1]][3],
          eig4 = 
            eigen(matrix(c(A1, C12, C13, C14, C12, A2, C23, C24, C13, C23,
              A3, C34, C14, C24, C34, A4), nrow = 4))[[1]][4]) %>%
      select(starts_with('eig')) -> eig_post
    

    which produces:

    > eig_post
    Source: local data frame [1,000 x 4]
    Groups: <by row>
    
    # A tibble: 1,000 x 4
        eig1  eig2  eig3  eig4
       <dbl> <dbl> <dbl> <dbl>
     1  12.3 11.0  10.4   6.67
     2  12.8 10.1   9.19  7.61
     3  13.5 12.2   8.20  7.34
     4  12.7 12.2   8.91  7.68
     5  12.9  9.70  9.41  6.74
     6  12.2 10.6   8.62  7.70
     7  13.1 12.5   9.21  8.34
     8  12.9  9.76  7.87  6.96
     9  12.8 11.6   8.21  6.46
    10  12.5 11.6   9.85  8.13
    # ... with 990 more rows
    

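    As you can see, this performs the eigendecomposition four times per row, which is four times more than necessary and slows down my script! Is there a way to spread eigen(*matrix*)[[1]][1:4] across four variables? In other words, I need the same result as the code above, but with only one eigendecomposition per row. I thought this would work, but no luck: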

    w1_post %>%
      rowwise %>%
        mutate(c(eig1, eig2, eig3, eig4) = 
          eigen(matrix(c(A1, C12, C13, C14, C12, A2, C23, C24, C13, C23,
            A3, C34, C14, C24, C34, A4), nrow = 4))[[1]][1:4]) %>%
      select(starts_with('eig')) -> eig_post
    
    2 answers  |  most recent 6 years ago
        1
  •  1
  •   Bart VdW    6 years ago

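    You can avoid computing the eigendecomposition four times by first storing the result as a list column and then extracting the values in a subsequent step. If you want to keep everything inside your pipeline, you can do it like this: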

    eig_post <- w1_post %>%
      rowwise %>%
      mutate(
        pre_eig = list(eigen(matrix(c(A1, C12, C13, C14, C12, A2, C23, C24, C13, C23,
                         A3, C34, C14, C24, C34, A4), nrow = 4)))
      ) %>%
      mutate( 
        eig1 = pre_eig[[1]][1], 
        eig2 = pre_eig[[1]][2], 
        eig3 = pre_eig[[1]][3], 
        eig4 = pre_eig[[1]][4]) %>%
      select(starts_with("eig"))
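
    A variation on the same list-column idea, offered only as a sketch: the matrix construction is moved into a helper and the list column is spread into four columns with tidyr::unnest_wider. The helper eig_values and the small stand-in tibble w1_small are hypothetical names that do not appear in the original post. Since only the eigenvalues are needed, the sketch also passes only.values = TRUE and symmetric = TRUE to eigen, which should skip the eigenvector computation.

```r
library(dplyr)
library(tidyr)

set.seed(1)
# Small stand-in for w1_post (5 draws instead of 1000), same column names
w1_small <- tibble(
  A1 = rnorm(5, 10, 1), C12 = rnorm(5), C13 = rnorm(5), C14 = rnorm(5),
  A2 = rnorm(5, 10, 1), C23 = rnorm(5), C24 = rnorm(5),
  A3 = rnorm(5, 10, 1), C34 = rnorm(5), A4 = rnorm(5, 10, 1)
)

# Hypothetical helper: builds the symmetric 4x4 matrix for one draw and
# returns its eigenvalues (in decreasing order) as a named list
eig_values <- function(A1, C12, C13, C14, A2, C23, C24, A3, C34, A4) {
  m <- matrix(c(A1,  C12, C13, C14,
                C12, A2,  C23, C24,
                C13, C23, A3,  C34,
                C14, C24, C34, A4), nrow = 4)
  vals <- eigen(m, symmetric = TRUE, only.values = TRUE)$values
  as.list(setNames(vals, paste0("eig", 1:4)))
}

eig_small <- w1_small %>%
  rowwise() %>%
  # One decomposition per row, stored in a list column
  mutate(eig = list(eig_values(A1, C12, C13, C14, A2, C23, C24, A3, C34, A4))) %>%
  ungroup() %>%
  select(eig) %>%
  # Spread the named list into columns eig1 ... eig4
  unnest_wider(eig)
```

    Swapping w1_small for w1_post should give the same four columns as the question's original code, with eigen called once per row.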
    
        2
  •  1
  •   Artem Sokolov    6 years ago

    Here is a solution that uses the purrr::map family of functions:

    eig_post <- w1_post %>%
    
        ## Collapse columns into a vector
        transmute( x = pmap( list(A1, C12, C13, C14, C12, A2, C23, C24, C13, C23,
                                  A3, C34, C14, C24, C34, A4), c ) ) %>%
    
        ## Compose the 4x4 matrices from each vector
        mutate( mtx = map( x, matrix, nrow=4 ) ) %>%
    
        ## Perform a single decomposition and retrieve all 4 eigenvalues
        mutate( eig = map( mtx, ~eigen(.x)$values ) ) %>%
    
        ## Annotate the vector of eigenvalues with the desired names
        mutate( eig = map( eig, set_names, str_c("eig", 1:4) ) ) %>%
    
        ## Reshape the data frame by effectively unnesting the vector
        with( invoke( bind_rows, eig ) )
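
    For readers on a recent purrr: as far as I know, invoke() has been deprecated since purrr 1.0.0, so the last step above may emit a warning. Below is a rough sketch of the same one-decomposition-per-row idea that avoids it by iterating over rows with pmap and stacking the results with bind_rows. The index vector idx and the stand-in tibble w1_small are assumptions made for illustration, and the sketch relies on the columns being in the order A1, C12, C13, C14, A2, C23, C24, A3, C34, A4, as in w1_post.

```r
library(dplyr)
library(purrr)

set.seed(1)
# Small stand-in for w1_post (5 draws instead of 1000), same column order
w1_small <- tibble(
  A1 = rnorm(5, 10, 1), C12 = rnorm(5), C13 = rnorm(5), C14 = rnorm(5),
  A2 = rnorm(5, 10, 1), C23 = rnorm(5), C24 = rnorm(5),
  A3 = rnorm(5, 10, 1), C34 = rnorm(5), A4 = rnorm(5, 10, 1)
)

# Positions of the 10 stored entries, laid out (column-major) as a
# symmetric 4x4 matrix; depends on the column order stated above
idx <- c(1, 2, 3, 4,
         2, 5, 6, 7,
         3, 6, 8, 9,
         4, 7, 9, 10)

eig_small <- w1_small %>%
  # pmap passes each row's values as arguments; collect them into a vector
  pmap(function(...) {
    v <- c(...)
    vals <- eigen(matrix(v[idx], nrow = 4))$values  # single decomposition
    as.list(set_names(vals, paste0("eig", 1:4)))
  }) %>%
  # Stack the per-row named lists into a tibble with columns eig1 ... eig4
  bind_rows()
```

    Replacing w1_small with w1_post should reproduce the eig1 to eig4 columns from the question.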