I am trying to write a function that normalizes a given DataFrame at the requested frequency.
Code:
import numpy as np
import pandas as pd

def timeseries_dataframe_normalized(df, normalization_freq='complete'):
    """
    Input:
        df : dataframe
            input dataframe
        normalization_freq : string
            'daily', 'weekly', 'monthly', 'quarterly', 'yearly', 'complete' (default)
    Return: normalized dataframe
    """
    # auxiliary dataframe
    adf = df.copy()
    # convert columns to float
    # Ref: https://stackoverflow.com/questions/15891038/change-column-type-in-pandas
    adf = adf.astype(float)
    # normalized columns
    nor_cols = adf.columns
    # add suffix to columns and create new names for maximum columns
    max_cols = adf.add_suffix('_max').columns
    # initialize maximum columns
    adf.loc[:, max_cols] = np.nan
    # check the requested frequency
    if normalization_freq == 'complete':
        adf[max_cols] = adf[nor_cols].max()
    # compute and return the normalized dataframe
    print(adf[nor_cols])
    print(adf[max_cols])
    adf[nor_cols] = adf[nor_cols] / adf[max_cols]
    # return the normalized dataframe
    return adf[nor_cols]

# Example
df2 = pd.DataFrame(data={'A': [20, 10, 30], 'B': [1, 2, 3]})
timeseries_dataframe_normalized(df2)
Expected output:
df2 =
          A         B
0  0.666667  0.333333
1  0.333333  0.666667
2  1.000000  1.000000
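For reference, here is a minimal sketch of why a column-wise maximum produces the table above. `df2.max()` returns a Series indexed by the column labels `A` and `B`. Dividing a DataFrame by a Series matches that index against the DataFrame's columns, so each column is divided by its own maximum.

```python
import pandas as pd

df2 = pd.DataFrame(data={'A': [20, 10, 30], 'B': [1, 2, 3]})

# Series of column maxima, indexed by column label: A -> 30, B -> 3
col_max = df2.max()

# DataFrame / Series aligns the Series index with the DataFrame columns,
# so column A is divided by 30 and column B by 3
expected = df2 / col_max
print(expected)
```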
Current output:
This error surprised me, because when I compute
df2/df2.max()
directly, I get the expected output. The function, however, fails with:
ValueError: Columns must be same length as key
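The error most likely comes from the line `adf[nor_cols] = adf[nor_cols] / adf[max_cols]`. Arithmetic between two DataFrames aligns on column labels. `A` and `B` share no labels with `A_max` and `B_max`, so the quotient is a four-column frame (`A`, `A_max`, `B`, `B_max`) filled with NaN. Assigning four columns to the two-column key `nor_cols` then raises `ValueError: Columns must be same length as key`.

A minimal sketch of a fix is below. It drops the renamed `_max` columns and divides by maxima that carry the original labels. For the other frequencies, the sketch assumes the index is a `DatetimeIndex` and groups rows by `index.to_period(...)`. The mapping from `'daily'`, `'weekly'`, ... to the period aliases `'D'`, `'W'`, `'M'`, `'Q'`, `'Y'` (the `_PERIOD_ALIASES` dict) is this sketch's own assumption and not part of the original code.

```python
import pandas as pd

# Hypothetical mapping from the frequency names in the question to pandas
# period aliases (an assumption of this sketch, not from the original code)
_PERIOD_ALIASES = {
    'daily': 'D',
    'weekly': 'W',
    'monthly': 'M',
    'quarterly': 'Q',
    'yearly': 'Y',
}


def timeseries_dataframe_normalized(df, normalization_freq='complete'):
    """Divide every column by its maximum over the requested period.

    'complete' uses the maximum over the whole frame; the other options
    require a DatetimeIndex and use the maximum within each calendar period.
    """
    adf = df.astype(float)  # astype returns a new frame, so df is untouched
    if normalization_freq == 'complete':
        # Series indexed by the original column labels, so it aligns with adf
        return adf / adf.max()
    if normalization_freq not in _PERIOD_ALIASES:
        raise ValueError(f'unknown normalization_freq: {normalization_freq!r}')
    if not isinstance(adf.index, pd.DatetimeIndex):
        raise TypeError('period normalization needs a DatetimeIndex')
    # Label every row with the calendar period it falls in
    periods = adf.index.to_period(_PERIOD_ALIASES[normalization_freq])
    # transform('max') keeps adf's index and columns, filling each row with
    # its period's maximum, so the division lines up label by label
    period_max = adf.groupby(periods).transform('max')
    return adf / period_max


# Usage: whole-frame normalization on the example from the question
df2 = pd.DataFrame(data={'A': [20, 10, 30], 'B': [1, 2, 3]})
print(timeseries_dataframe_normalized(df2))

# Usage: monthly normalization on a small made-up time series
ts = pd.DataFrame(
    {'A': [1.0, 2.0, 4.0, 8.0]},
    index=pd.to_datetime(['2024-01-01', '2024-01-15', '2024-02-01', '2024-02-10']),
)
print(timeseries_dataframe_normalized(ts, 'monthly'))
```

Using `transform` rather than `agg` keeps the per-period maxima the same shape as the input. Because of that, the same plain division works for every frequency.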