import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'a': ['one', 'one', 'one', 'one', 'two', 'two', 'two', 'three', 'four'],
     'b': ['x', 'y', 'x', 'y', 'x', 'y', 'x', 'x', 'x'],
     'c': np.random.randn(9)}
)
df['sum_c_3'] = 99.99  # placeholder value, meant to be overwritten later
>>> df
       a  b         c  sum_c_3
0    one  x  1.296379    99.99
1    one  y  0.201266    99.99
2    one  x  0.953963    99.99
3    one  y  0.322922    99.99
4    two  x  0.887728    99.99
5    two  y -0.154389    99.99
6    two  x -2.390790    99.99
7  three  x -1.218706    99.99
8   four  x -0.043964    99.99
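Now I need to perform a lot of operations on this data. As an example, I will compute the sum of the next 3 records (the current row plus the two that follow it within the same group of 'a') and save the result to a new column, like this: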
for w in ['one', 'two', 'three', 'four']:
    x = df.loc[df['a'] == w]                  # rows belonging to group w
    size = x.iloc[:]['a'].count()
    print("Records %s: %s" % (w, size))
    target_column = x.columns.get_loc('c')
    for i in range(0, size):
        idx = x.index
        # sum of row i and up to two following rows of the group
        acum = x.iloc[i:i+3, target_column].sum()
        x.loc[x.loc[idx, 'sum_c_3'].index[i], 'sum_c_3'] = acum
    print(x)
Output:
Records one: 4
     a  b         c   sum_c_3
0  one  x  1.296379  2.451607
1  one  y  0.201266  1.478151
2  one  x  0.953963  1.276885
3  one  y  0.322922  0.322922
Records two: 3
     a  b         c   sum_c_3
4  two  x  0.887728 -1.657452
5  two  y -0.154389 -2.545180
6  two  x -2.390790 -2.390790
Records three: 1
       a  b         c   sum_c_3
7  three  x -1.218706 -1.218706
Records four: 1
      a  b         c   sum_c_3
8  four  x -0.043964 -0.043964
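For comparison, the same per-group totals (each record plus the two that follow it) could probably be computed without explicit loops. The following is only a sketch: the helper name `forward_window_sum` and the fixed values in `demo` are illustrative assumptions, not part of my real data, and it relies on `groupby(...).shift()` with negative periods to look ahead inside each group.

```python
import pandas as pd

def forward_window_sum(frame, key, col, window=3):
    """Hypothetical helper: for every row, add its value in `col` to the
    values of the next (window - 1) rows that share the same `key`."""
    grouped = frame.groupby(key)[col]
    total = frame[col].fillna(0.0)
    for step in range(1, window):
        # shift(-step) pulls the value from `step` rows ahead within the group;
        # rows near the end of a group get NaN, which is counted as 0 here
        total = total + grouped.shift(-step).fillna(0.0)
    return total

# Illustrative data with fixed values (assumed, so the result is easy to check)
demo = pd.DataFrame({
    'a': ['one', 'one', 'one', 'one', 'two', 'two', 'two', 'three', 'four'],
    'c': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
})
demo['sum_c_3'] = forward_window_sum(demo, 'a', 'c')
print(demo)
```

Because the result is a Series aligned with the index of the frame it was computed from, assigning it as a column writes directly into that frame, so no intermediate slice is involved.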
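Finally, my question is: how do I update the original DataFrame?
Can the slice save the totals back automatically, or should I use a Series (the slice) to update the original by index?
The original stays unchanged, with no updates at all, as you can see here: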
>>> df
       a  b         c  sum_c_3
0    one  x  1.296379    99.99
1    one  y  0.201266    99.99
2    one  x  0.953963    99.99
3    one  y  0.322922    99.99
4    two  x  0.887728    99.99
5    two  y -0.154389    99.99
6    two  x -2.390790    99.99
7  three  x -1.218706    99.99
8   four  x -0.043964    99.99
>>>
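One direction that might work for the "update by index" idea is to keep the loop but assign the computed totals to the original frame through the slice's row labels, since selecting rows with a boolean mask produces a separate object whose modifications do not reach the original. This is a minimal sketch under assumptions: `demo` and its fixed values are illustrative stand-ins for the real data.

```python
import pandas as pd

# Illustrative stand-in for the original frame (fixed values are assumed)
demo = pd.DataFrame({
    'a': ['one', 'one', 'one', 'one', 'two', 'two', 'two', 'three', 'four'],
    'c': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
})
demo['sum_c_3'] = 99.99

for w in demo['a'].unique():
    x = demo.loc[demo['a'] == w]      # separate object; editing x leaves demo untouched
    values = x['c'].to_numpy()
    # total of row i and up to two following rows of the same group
    sums = [values[i:i + 3].sum() for i in range(len(values))]
    # write the totals into the original frame, addressed by the slice's row labels
    demo.loc[x.index, 'sum_c_3'] = sums

print(demo)
```

The key line is the final assignment: it targets `demo` itself with `.loc` and the labels in `x.index`, rather than assigning into `x`.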