Calculating week-over-week change in pandas (with groupby)?

time-series pandas python

chitown88 · Tech community · 6 years ago

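I've been able to successfully calculate the week-over-week change with my data. However, my data includes thousands of groups that I need to sort through, so I'm looking for a faster/more efficient way to calculate the weekly change than my current implementation.

The way it currently runs is that I have a for loop that calculates the week-to-week change for each subset/store_ID. The calculation works fine, but doing this for 10,000+ different items takes quite a while. Is there a way to do this by grouping on the 'store_ID' column? I've been playing around with .groupby, but I'm not sure how to work with it, since it returns a GroupBy object.

Here is my code and how it works:

I have a dataframe called df that holds all my information. It has already been cleaned and sorted, so each store_ID is in ascending order by week. To keep things simple, let's assume I only have these columns: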

df[['store_ID', 'Week', 'Sales']]
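For illustration only, a dataframe with this layout might look like the sketch below. The store labels, dates, and sales figures are hypothetical toy values, not data from the post, and 'Week' is assumed to be a datetime column.

```python
import pandas as pd

# Hypothetical toy data (not from the original post): two stores, with
# store 'B' missing the week of 2019-01-08
df = pd.DataFrame({
    'store_ID': ['A', 'A', 'A', 'B', 'B'],
    'Week': pd.to_datetime(['2019-01-01', '2019-01-08', '2019-01-15',
                            '2019-01-01', '2019-01-15']),
    'Sales': [100.0, 120.0, 90.0, 50.0, 70.0],
})

# Sorted by store, then by week, matching the description above
df = df.sort_values(['store_ID', 'Week']).reset_index(drop=True)
```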

So...

import pandas as pd
from pandas.tseries.offsets import Week

# Create list of each store
list_of_stores = list(df['store_ID'].unique())

# Collect the per-store results here (DataFrame.append was removed in pandas 2.0)
results = []

# Iterate store-by-store to calculate the week to week values
for store in list_of_stores:

    # Create a temporary dataframe to do the calculation for the store_ID
    temp_df = df[df['store_ID'] == store].copy()
    index_list = list(temp_df.index)
    temp_df.index = temp_df['Week']
    # Subtract the value dated exactly one week earlier (aligned on the Week index)
    temp_df['Sales_change_1_week'] = (temp_df['Sales']
                                      - temp_df['Sales'].shift(1, freq=Week()))
    temp_df.index = index_list

    # Keep the temporary dataframe for the final results dataframe
    results.append(temp_df)

# Combine all stores into one results dataframe
results_df = pd.concat(results)

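So at the end I have the results for every store_ID, week by week. I should note that there are some missing weeks, so in those cases I get null values for the weeks where the change from the previous week can't be calculated, which I'm fine with.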
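The null values for missing weeks come from index alignment: shifting with a frequency moves the dates rather than the rows, so a week with no predecessor has nothing to subtract. A minimal sketch with a hypothetical single-store series (the dates and values are made up for illustration):

```python
import pandas as pd
from pandas.tseries.offsets import Week

# Hypothetical single-store sales; the week of 2019-01-15 is missing
weeks = pd.to_datetime(['2019-01-01', '2019-01-08', '2019-01-22'])
sales = pd.Series([100.0, 120.0, 90.0], index=weeks)

# Move every date forward by one week, so each value lines up with
# the week that follows it
previous_week = sales.shift(1, freq=Week())

# Subtraction aligns on dates; keep only the dates present in the data
change = (sales - previous_week).reindex(sales.index)
```

Here only 2019-01-08 gets a value (120 - 100), because it is the only week whose previous week exists in the series.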

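So, for each store_ID, I:

Create a temporary dataframe, which is sorted by 'Week'.
Store the original index.
Re-index by week (so that it can be shifted by week).
Calculate the week-over-week change in sales and put it in a new column.
Re-index back to the original index.
Append it to the results dataframe.
Repeat with the next store_ID.

I feel like there's a way to do this all in one go instead of handling each store_ID individually, but I can't seem to find it.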

1 answer | up to 6 years ago

Yuca 6 years ago

Here is similar code that I use:

# asfreq needs a DatetimeIndex; 'W-TUE' means weekly dates anchored on Tuesdays
week_freq = 'W-TUE'
temp_df['Sales_change_1_week'] = temp_df['Sales'].asfreq(week_freq).diff()
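One possible way to apply the same diff() idea to every store at once, without the loop, is a grouped diff that is masked wherever the gap to the previous row is not exactly one week. This is only a sketch and swaps asfreq for a gap check; it assumes 'Week' is a datetime column and that rows are already sorted by week within each store, as the question states. The toy data is hypothetical.

```python
import pandas as pd

# Hypothetical toy data: store 'B' is missing the week of 2019-01-08
df = pd.DataFrame({
    'store_ID': ['A', 'A', 'A', 'B', 'B'],
    'Week': pd.to_datetime(['2019-01-01', '2019-01-08', '2019-01-15',
                            '2019-01-01', '2019-01-15']),
    'Sales': [100.0, 120.0, 90.0, 50.0, 70.0],
})

grouped = df.groupby('store_ID')

# Row-to-row differences, computed within each store
sales_diff = grouped['Sales'].diff()
week_gap = grouped['Week'].diff()

# Keep the difference only where the previous row is exactly one week earlier;
# first rows and rows after a missing week become NaN
df['Sales_change_1_week'] = sales_diff.where(week_gap == pd.Timedelta(weeks=1))
```

The result keeps the original index and row order, so no re-indexing step is needed. If a store could have duplicate weeks, the gap check would need rethinking.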
