Speeding up time-series bootstrapping

simulation time-series performance pandas

msh855 · Tech Community · 1 year ago

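I want to bootstrap time-series data and then apply a function to each resampled path to compute various metrics.

The bootstrap itself, i.e. simulating the different samples, is fast enough. The problem starts when I apply a function to every sample. I typically work with more than 10,000 samples.

Here is an MWE: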

import pandas as pd
import quantstats as qs
import yfinance
from arch.bootstrap import IIDBootstrap


def BootstrapIDD(series: pd.Series, n_samples: int = 1000) -> pd.DataFrame:
    """Return a DataFrame with one IID-bootstrapped path per column."""
    nobs = len(series)
    bsidd = IIDBootstrap(series)
    ps_list = []
    row_range = range(0, nobs)
    for pos_data in bsidd.bootstrap(reps=n_samples):
        # pos_data is (positional data, keyword data); take the resampled series
        new_sample = pos_data[0][0]
        new_sample.index = row_range
        ps_list.append(new_sample)

    df = pd.concat(ps_list, axis=1)

    # cleaning: name each path and restore the original index
    string_name = series.name
    cols = [string_name + '_path_' + str(x) for x in range(1, n_samples + 1)]
    df.columns = cols
    df.index = series.index  # use the argument, not the global ret_bench

    return df


data = yfinance.download('^GSPC')
# pct_change() leaves a NaN in the first row; drop it before resampling
ret_bench = data['Adj Close'].pct_change().dropna()
ret_bench.name = 'SP500'

func = qs.stats.adjusted_sortino
df_results = BootstrapIDD(ret_bench, n_samples=10000)

df_func = df_results.apply(func)

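The slow part is this line: df_func = df_results.apply(func)

Can anyone help me speed up this last step, where the function is applied to every column of the DataFrame?

Update

The func in this example accepts a whole DataFrame as input, and called that way it is very fast. In general, though, not every function I want to apply across columns can do that, so I still need a faster alternative to df.apply(func).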
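For functions that cannot take a whole DataFrame, one cheap thing to try before going parallel is `raw=True`, which makes `DataFrame.apply` pass each column as a plain NumPy array instead of building a Series per column. The sketch below is only an illustration under assumptions: `downside_ratio` is a hypothetical stand-in metric (not `qs.stats.adjusted_sortino`), and `df_demo` is random data rather than the bootstrapped returns. It only helps when the function works on arrays and does not need the index.

```python
import numpy as np
import pandas as pd


def downside_ratio(x):
    """Hypothetical per-column metric: mean return over downside deviation."""
    x = np.asarray(x, dtype=float)
    downside = np.minimum(x, 0.0)  # keep only the negative returns
    return x.mean() / np.sqrt(np.mean(downside ** 2))


# Random stand-in for the bootstrapped paths (500 observations, 200 paths)
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(rng.normal(0.0005, 0.01, size=(500, 200)))

# Default: the function receives a pandas Series for every column
slow = df_demo.apply(downside_ratio)

# raw=True: the function receives a NumPy array, skipping Series construction
fast = df_demo.apply(downside_ratio, raw=True)
```

Both calls return a Series with one value per column; how much time `raw=True` saves depends on how much of the cost is pandas overhead rather than the function itself.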


msh855 · 1 year ago

Based on the comments, one solution that runs fast is:

from parallel_pandas import ParallelPandas

ParallelPandas.initialize(n_cpu=7, disable_pr_bar=False)

df_results.p_apply(func, raw=False, executor='processes')
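If the metric can be written with NumPy axis operations, another option is to skip the per-column loop entirely. The following is a rough sketch of that idea, not the method used above: it draws the IID bootstrap indices with NumPy instead of `arch`, and `iid_bootstrap_paths` and `downside_ratio_columns` are hypothetical helper names with a simplified downside-risk ratio standing in for the real metric.

```python
import numpy as np


def iid_bootstrap_paths(values, n_samples, seed=None):
    """Resample with replacement; returns an array of shape (nobs, n_samples)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    nobs = values.shape[0]
    # One column of random row positions per bootstrap path
    idx = rng.integers(0, nobs, size=(nobs, n_samples))
    return values[idx]


def downside_ratio_columns(paths):
    """Hypothetical metric computed for all columns at once (axis=0)."""
    downside = np.minimum(paths, 0.0)
    return paths.mean(axis=0) / np.sqrt(np.mean(downside ** 2, axis=0))


# Random stand-in for the return series
returns = np.random.default_rng(1).normal(0.0005, 0.01, size=1000)

paths = iid_bootstrap_paths(returns, n_samples=2000, seed=42)
metrics = downside_ratio_columns(paths)  # one value per bootstrap path
```

This only applies to metrics that vectorize cleanly; for arbitrary Python functions the parallel apply shown above remains the more general route.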
