
How do I compute a rolling sum across columns in R?

bioinformatics linear-algebra matrix r

monotonic · 4 years ago

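`roll_sum` and many other methods (e.g. https://vandomed.github.io/moving_averages.html) only sum over rows. I have a very large matrix and do not have enough memory to transpose it. Is there a way to run `roll_sum` across the columns directly?

For example: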

library(roll)

A=matrix(rnorm(10000),100)
roll_sum(A,3)

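But I want to do this across the columns.

Also, every method I have seen so far runs without multi-core processing. Can anyone provide a solution that supports it?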


Cole 4 years ago

Here is an `Rcpp` approach.

Rcpp::cppFunction("
NumericMatrix rcpp_column_roll(const NumericMatrix mat, const int n) {

  const int ncol = mat.ncol();
  const int nrow = mat.nrow();
  NumericMatrix out(nrow, ncol);
  std::fill( out.begin(), out.end(), NumericVector::get_na() ) ;

  
  for (int i = 0; i < nrow; i++) {
    NumericVector window(n);
    double roll = 0;
    int oldest_ind = 0;
    
    for (int j = 0; j < n ; j++) {
      double mat_ij = mat(i, j); 
      window(j) = mat_ij;
      roll += mat_ij;
    }
    
    out(i, n - 1) = roll;

    for (int j = n; j < ncol; j ++) {
      double mat_ij = mat(i, j); 
      
      roll += mat_ij;
      roll -= window(oldest_ind);
      
      out(i, j) = roll;
      
      window(oldest_ind) = mat_ij;
      
      if (oldest_ind == n-1) oldest_ind = 0; else oldest_ind++;
    }
  }
  return(out);
}
")

This is about 10 times more memory-efficient than transposing the result of `apply(A, 1L, roll::roll_sum, 3L)` and, for the sample dataset, about 50 times faster.

bench::mark(rcpp_column_roll(A, 3),
            t(apply(A, 1, roll::roll_sum, 3)))

## # A tibble: 2 x 13
##   expression                             min   median `itr/sec` mem_alloc
##   <bch:expr>                        <bch:tm> <bch:tm>     <dbl> <bch:byt>
## 1 rcpp_column_roll(A, 3)             134.4us  139.7us     6641.    80.7KB
## 2 t(apply(A, 1, roll::roll_sum, 3))   7.62ms   8.91ms      101.     773KB

## With an 80 MB dataset (`rnorm(1E7)`):

##   expression                          min median `itr/sec` mem_alloc
##   <bch:expr>                        <bch> <bch:>     <dbl> <bch:byt>
## 1 rcpp_column_roll(A, 3)            226ms  229ms      4.17    76.3MB
## 2 t(apply(A, 1, roll::roll_sum, 3)) 740ms  740ms      1.35   498.5MB

## 800 MB dataset (`rnorm(1E8)`):

## # A tibble: 2 x 13
##   expression                          min median `itr/sec` mem_alloc
##   <bch:expr>                        <bch> <bch:>     <dbl> <bch:byt>
## 1 rcpp_column_roll(A, 3)            3.49s  3.49s     0.286  762.94MB
## 2 t(apply(A, 1, roll::roll_sum, 3)) 9.62s  9.62s     0.104    4.84GB

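The memory saving seems to level off at roughly a 5-fold reduction; what remains is more or less just the allocation of the result matrix itself.

Alternatively, we can stay closer to R and use an R loop to write a manual `apply` that does not need the transpose.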

out = matrix(NA_real_, nrow(A), ncol(A))
for (i in seq_len(nrow(A))) {
  out[i, ] = roll::roll_sum(A[i, ], 3L)
}

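The loop is a little better than the regular `apply` followed by a transpose. @Moody_Mudskipper's approach is the fastest, although the `Rcpp` one is the most memory-efficient.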

##rnorm(1e8); ncols = 1000;
# A tibble: 6 x 13
  expression               min median `itr/sec` mem_alloc `gc/sec` n_itr
  <bch:expr>             <bch> <bch:>     <dbl> <bch:byt>    <dbl> <int>
1 rcpp_column_roll(A, 3) 3.32s  3.32s     0.301  762.94MB    0         1
2 for_loop               6.12s  6.12s     0.163    2.98GB    0.327     1
3 dww_sappy                 7s     7s     0.143    4.86GB    0.572     1
4 matStat_Moody          1.81s  1.81s     0.552    2.24GB    0.552     1
5 roll_sum_Ronak         8.34s  8.34s     0.120    4.84GB    0.360     1
6 froll_Oliver           7.75s  7.75s     0.129    4.86GB    0.516     1

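Note that if you are really short on memory, you could change the Rcpp function to modify the input in place, which means you would not have to allocate another matrix. Otherwise, you would be best off implementing Moody's clever solution in Rcpp, since it would be faster and would only need to allocate the output matrix.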
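The running-sum update used by the Rcpp function (add the value entering the window, subtract the one leaving it) can also be sketched in plain R by looping over columns and updating every row at once. This is only an illustrative sketch: the name `col_window_roll` is hypothetical, it has not been benchmarked against the methods above, and it assumes `1 <= n <= ncol(A)`.

```r
# Hypothetical helper: rolling sum across columns without transposing.
# Assumes 1 <= n <= ncol(A).
col_window_roll <- function(A, n) {
  stopifnot(n >= 1, n <= ncol(A))
  out <- matrix(NA_real_, nrow(A), ncol(A))
  # sum of the first window (columns 1..n) for every row
  roll <- rowSums(A[, seq_len(n), drop = FALSE])
  out[, n] <- roll
  # slide the window: add the entering column, drop the leaving one
  for (j in seq_len(ncol(A) - n) + n) {
    roll <- roll + A[, j] - A[, j - n]
    out[, j] <- roll
  }
  out
}

A <- matrix(1:25, 5)
col_window_roll(A, 3)
```

Because the sum is updated incrementally, rounding error can accumulate over many columns of floating-point data, which may or may not matter for a given use.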

moodymudskipper 4 years ago

Since a rolling sum can be seen as a difference of cumulative sums, we can use the package {matrixStats}, which does these tedious tasks quickly.
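Spelled out (the symbols $C$, $S$ and $w$ are introduced here only for illustration), let $C_{i,j}$ be the cumulative sum along row $i$ and $w$ the window width:

```latex
C_{i,j} = \sum_{k=1}^{j} A_{i,k},
\qquad
S_{i,j} = \sum_{k=j-w+1}^{j} A_{i,k} =
\begin{cases}
  C_{i,w}, & j = w,\\
  C_{i,j} - C_{i,j-w}, & j > w.
\end{cases}
```

For a row with values 1, 6, 11, 16, 21 and $w = 3$, the cumulative sums are 1, 7, 18, 34, 55, so the rolling sums are 18, 34 - 1 = 33 and 55 - 7 = 48.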

A <- matrix(1:25,5)
A
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    6   11   16   21
#> [2,]    2    7   12   17   22
#> [3,]    3    8   13   18   23
#> [4,]    4    9   14   19   24
#> [5,]    5   10   15   20   25

What you cannot do, because transposing is too expensive:

library(roll)
t(roll_sum(t(A),3))
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]   NA   NA   18   33   48
#> [2,]   NA   NA   21   36   51
#> [3,]   NA   NA   24   39   54
#> [4,]   NA   NA   27   42   57
#> [5,]   NA   NA   30   45   60

With {matrixStats}:

library(matrixStats)
#> Warning: le package 'matrixStats' a été compilé avec la version R 4.0.3
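row_roll_sum <- function(x, width) {
  out <- rowCumsums(x)
  nc <- ncol(out)
  if (width < nc) {
    # window sum ending at column j = cumsum[j] - cumsum[j - width]
    out[, (width + 1):nc] <- out[, (width + 1):nc] - out[, seq_len(nc - width)]
  }
  # the first width - 1 columns have no complete window
  out[, seq_len(width - 1)] <- NA
  out
}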
row_roll_sum(A, 3)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]   NA   NA   18   33   48
#> [2,]   NA   NA   21   36   51
#> [3,]   NA   NA   24   39   54
#> [4,]   NA   NA   27   42   57
#> [5,]   NA   NA   30   45   60

dww Jarretinha 4 years ago

Using base R matrix indexing, we can do

n = 3
sapply(seq_len(NCOL(A)-n+1), function(j) rowSums(A[, j:(j+n-1)]))

No transpose is needed, and `rowSums` should be very well optimized for speed.
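One thing to keep in mind is that the `sapply` call returns only the complete windows, so the result has `ncol(A) - n + 1` columns rather than the NA-padded shape the other answers produce. A minimal sketch of padding it to the same shape, using a small example matrix chosen here for illustration:

```r
A <- matrix(1:25, 5)
n <- 3

# one column of row sums per complete window
sums <- sapply(seq_len(ncol(A) - n + 1),
               function(j) rowSums(A[, j:(j + n - 1), drop = FALSE]))

# prepend n - 1 columns of NA so the result lines up with A
out <- cbind(matrix(NA_real_, nrow(A), n - 1), sums)
out
```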

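Dewey Brooke 4 years ago

Rolling sum by column or row

`Rcpp` function for rolling sums by column or row

Because it is useful to be able to do this by row or by column, I included a `margin` argument that works the same way as in `base::apply` (i.e. 1 = rows, 2 = columns).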

#include <Rcpp.h>
using namespace Rcpp;
using namespace std;

// [[Rcpp::export]]
Rcpp::NumericMatrix matrix_rollsum(SEXP x, int n, int margin) {
  Rcpp::NumericMatrix y(x);
  int NR = y.nrow();
  int NC = y.ncol();
  NumericMatrix result(NR,NC);
  std::fill( result.begin(), result.end(), NumericVector::get_na() ) ;

  if(margin==1){
    for(int i = 0; i < NR; ++i){
      NumericVector tmpvec = y(i,_);
      for(int j = 0; j < NC-n+1;++j){
        double s=0.0;
        for(int q=j; q<j+n;q++){
          s+=tmpvec[q];
        }
        result(i,j+n-1) = s;
        s = 0.0;
      }}}

  if(margin==2){

    for(int i = 0; i < NC; ++i){
      NumericVector tmpvec = y(_,i);  // copy of column i
      for(int j = 0; j < NR-n+1;++j){
        double s=0.0;
        for(int q=j; q<j+n;q++){
          s+=tmpvec[q];
        }
        result(j+n-1,i) = s;
        s = 0.0;
      }}}

  return result;
}

Benchmark

mat_lg <- matrix(runif(1e6,1,1000),1e3,1e3)
res1 <- microbenchmark::microbenchmark(
  matrix_rollsum = matrix_rollsum(mat_lg, 3,1),
  rcpp_colum_roll = rcpp_column_roll(mat_lg,3), 
  apply_rollsum = apply_rollsum(mat_lg,3),
  for_loop = for_loop(mat_lg,3),
  row_roll_sum = row_roll_sum(mat_lg,width = 3),
  times = 1000
)

knitr::kable(summary(res1))

expr	min	lq	mean	median	uq	max	neval	cld
matrix_rollsum	9.128677	10.38814	15.78466	13.43251	17.54006	71.10719	1000	a
rcpp_column_roll	23.195918	26.54276	33.65227	30.43353	38.11125	113.20687	1000	b
apply_rollsum	58.027111	72.66437	87.12061	80.50741	94.53146	255.69353	1000	c
for_loop	56.408078	71.78122	85.21565	79.10471	89.47916	269.55304	1000	c
row_roll_sum	8.309067	10.40819	15.62686	12.93160	17.21942	81.76514	1000	a

Memory allocation benchmark

res2 <- bench::mark(
  matrix_rollsum = matrix_rollsum(mat_lg, 3,1),
  rcpp_colum_roll = rcpp_column_roll(mat_lg,3), 
  apply_rollsum = apply_rollsum(mat_lg,3),
  for_loop = for_loop(mat_lg,3),
  row_roll_sum = row_roll_sum(mat_lg,width = 3),
  iterations = 1000
)

summary(res2)[,1:9]

# A tibble: 5 x 6
  expression           min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>      <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 matrix_rollsum    9.11ms   11.1ms      79.7   15.31MB    29.0 
2 rcpp_colum_roll   23.2ms   28.6ms      32.2    7.63MB     3.74
3 apply_rollsum    53.94ms   67.1ms      13.7   52.18MB   188.  
4 for_loop         55.18ms     69ms      13.2   33.13MB    17.8 
5 row_roll_sum      8.28ms   10.5ms      78.3   22.87MB    51.5

Benchmark plots

p1 <- ggplot2::autoplot(res1)
p2 <- ggplot2::autoplot(res2)

library(patchwork)
p1/p2

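Edit

Cole makes a good point. Why copy the large matrix? Wouldn't working on the original object use less memory? So I rewrote the `Rcpp` function to use the original object.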

#include <Rcpp.h>
using namespace Rcpp;
using namespace std;

// [[Rcpp::export]]
Rcpp::NumericMatrix test(NumericMatrix x, int n, int margin) {

  Rcpp::NumericMatrix result(x.nrow(),x.ncol());
  std::fill( result.begin(), result.end(), NumericVector::get_na() ) ;
  double s=0.0;

  if(margin==1){
    for(int i = 0; i < x.nrow(); ++i){
      for(int j = 0; j < x.ncol()-n+1;++j){
        for(int q=j; q<j+n;q++){
          s+=x(i,q);
        }
        result(i,j+n-1) = s;
        s = 0.0;
      }}}

  if(margin==2){

    for(int i = 0; i < x.ncol(); ++i){
      for(int j = 0; j < x.nrow()-n+1;++j){
        for(int q=j; q<j+n;q++){
          s+=x(q,i);  // q indexes rows, i is the column
        }
        result(j+n-1,i) = s;
        s = 0.0;
      }}}

  return result;
}

Benchmark

As Cole suspected, the new function allocates half the memory of the original one but, surprisingly, it is about 3 times slower.

expr	min	lq	mean	median	uq	max	neval	cld
matrix_rollsum	9.317332	10.84904	15.47414	13.75330	16.36336	101.6147	1000	a
test	34.498511	40.08057	47.49839	43.26564	48.34093	211.3246	1000	b

# A tibble: 2 x 6
  expression          min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 matrix_rollsum   9.15ms   10.1ms      93.7   15.31MB    33.4 
2 test             34.1ms   35.4ms      27.5    7.63MB     3.93

Ronak Shah 4 years ago

Perhaps you can try using `apply` over the rows of the matrix:

apply(A, 1, zoo::rollsumr, 3, fill = NA)
#Or
#apply(A, 1, roll::roll_sum, 3)

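Note, however, that this gives you the output in column-major layout, i.e. each row of `A` becomes a column of the result. For example,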

A <- matrix(1:10, ncol = 5)
apply(A, 1, zoo::rollsumr, 3, fill = NA)

#     [,1] [,2]
#[1,]   NA   NA
#[2,]   NA   NA
#[3,]    9   12
#[4,]   15   18
#[5,]   21   24

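Oliver 4 years ago

Both of the answers provided here are equally good. The question seems a little unclear about whether you are looking for a rolling sum over columns or over rows, and whether your output is meant to be transposed by design. If it is the latter, I suggest looking at Cole's answer and swapping the dimensions and indices of the output matrix.

That said, if what you are looking for is a column-wise operation and output, you can simply use the `froll*` functions from `data.table`, which are designed for speed and memory efficiency.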

library(data.table)
mat <- matrix(rnorm(1e8), ncol = 10)
res <- frollsum(mat, 3)

I believe the `roll` library has somewhat similar performance, though.

ThomasIsCoding 4 years ago

Here is a base R option that uses `embed` for the rolling sum

out <- NA * A
out[, -(1:2)] <- t(sapply(1:nrow(A), function(k) rowSums(embed(A[k, ], 3))))

or

out <- NA * A
u <- embed(t(A), 3)
out[, -(1:2)] <- sapply(rev(split(1:ncol(u), ceiling(seq(ncol(u)) / nrow(A)))), function(k) colSums(u[, k]))
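To make the role of `embed` concrete, here is a small illustration on a single vector (the values are chosen for illustration and correspond to the first row of a 5 x 5 matrix filled with 1:25). Each row of the embedded matrix holds one window, newest value first, so `rowSums` yields the rolling sums.

```r
x <- c(1, 6, 11, 16, 21)

# 3 x 3 matrix; row 1 is c(11, 6, 1), i.e. the first window reversed
w <- embed(x, 3)
w

# one rolling sum per complete window
rowSums(w)
```

Note that `embed` materializes every window, so its result has roughly `dimension` times as many elements as the input.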
