
How do I update the original DataFrame after computing on a subset (slice)?

  • Andre Araujo  · 6 years ago

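    import numpy as np
    import pandas as pd

    df = pd.DataFrame(
        {'a': ['one', 'one', 'one', 'one', 'two', 'two', 'two', 'three', 'four'],
         'b': ['x', 'y', 'x', 'y', 'x', 'y', 'x', 'x', 'x'],
         'c': np.random.randn(9)}
    )

    # placeholder value, to be overwritten with the computed sums
    df['sum_c_3'] = 99.99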
    

    >>> df
           a  b         c  sum_c_3
    0    one  x  1.296379    99.99
    1    one  y  0.201266    99.99
    2    one  x  0.953963    99.99
    3    one  y  0.322922    99.99
    4    two  x  0.887728    99.99
    5    two  y -0.154389    99.99
    6    two  x -2.390790    99.99
    7  three  x -1.218706    99.99
    8   four  x -0.043964    99.99
    

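    I need to run many operations on this data. As an example, for each value of a I compute the sum of c over the current record and the next two (a forward window of 3 rows) and store the result in the sum_c_3 column, like this: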

    for w in ['one','two','three','four']:
        x = df.loc[df['a']==w]
        size = x.iloc[:]['a'].count()
        print("Records %s: %s" %(w,size))
        target_column = x.columns.get_loc('c')
        for i in range(0,size):
            idx = x.index
            acum = x.iloc[i:i+3,target_column].sum()
            x.loc[x.loc[idx,'sum_c_3'].index[i],'sum_c_3'] = acum
        print (x) 
    

    Output:

    Records one: 4
         a  b         c   sum_c_3
    0  one  x  1.296379  2.451607
    1  one  y  0.201266  1.478151
    2  one  x  0.953963  1.276885
    3  one  y  0.322922  0.322922
    Records two: 3
         a  b         c   sum_c_3
    4  two  x  0.887728 -1.657452
    5  two  y -0.154389 -2.545180
    6  two  x -2.390790 -2.390790
    Records three: 1
           a  b         c   sum_c_3
    7  three  x -1.218706 -1.218706
    Records four: 1
          a  b         c   sum_c_3
    8  four  x -0.043964 -0.043964
    

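    Finally, my question is: how do I update the original DataFrame?

    Can the sums be saved through the slice automatically? Or should I use the slice's series to update the original by index?

    The original stays as it was, with no updates, as shown here: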

    >>> df
           a  b         c  sum_c_3
    0    one  x  1.296379    99.99
    1    one  y  0.201266    99.99
    2    one  x  0.953963    99.99
    3    one  y  0.322922    99.99
    4    two  x  0.887728    99.99
    5    two  y -0.154389    99.99
    6    two  x -2.390790    99.99
    7  three  x -1.218706    99.99
    8   four  x -0.043964    99.99
    >>> 
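    The original is untouched because selecting rows with a boolean mask, as in df.loc[df['a']==w], returns a new object, so assignments to x never reach df. As a minimal sketch of the "update by index" idea raised above, using made-up values rather than the random data in the question, the results can be written back through df.loc with the subset's index labels:

```python
import pandas as pd

# Made-up illustrative values (not the question's random data)
df = pd.DataFrame({'a': ['one', 'one', 'two'], 'c': [1.0, 2.0, 3.0]})
df['sum_c_3'] = 99.99

# Boolean indexing already yields a copy; .copy() just makes that explicit
x = df.loc[df['a'] == 'one'].copy()
x['sum_c_3'] = x['c'] * 10           # any computation on the subset

unchanged = df['sum_c_3'].tolist()   # df has not been modified yet

# Write back into the original, aligned on the subset's index labels
df.loc[x.index, 'sum_c_3'] = x['sum_c_3']
updated = df['sum_c_3'].tolist()

print(unchanged)
print(updated)
```

    Only the rows whose labels appear in x.index are overwritten; the row for 'two' keeps its placeholder value.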
    
    1 answer  |  last activity 5 years ago

  • BENY  · 6 years ago

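    The subset x is a copy of the matching rows, so assigning to it never changes df. Add a df.update(x) call at the end of each iteration of the outer for loop; it writes the values of x back into df in place, aligned on the index: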

    for w in ['one', 'two', 'three', 'four']:
        # boolean indexing returns a copy; .copy() makes that explicit
        x = df.loc[df['a'] == w].copy()
        size = len(x)
        print("Records %s: %s" % (w, size))
        target_column = x.columns.get_loc('c')
        for i in range(size):
            # sum of row i and the next two rows within this group
            acum = x.iloc[i:i+3, target_column].sum()
            x.loc[x.index[i], 'sum_c_3'] = acum
        print(x)
        df.update(x)  # this is the line to add: copy the results back into df
    
    df
    Out[979]: 
           a  b         c   sum_c_3
    0    one  x  0.127171  0.210872
    1    one  y -0.576157  1.212010
    2    one  x  0.659859  1.788168
    3    one  y  1.128309  1.128309
    4    two  x  0.333521 -0.846657
    5    two  y  0.753613 -1.180178
    6    two  x -1.933791 -1.933791
    7  three  x  0.549009  0.549009
    8   four  x  0.895742  0.895742
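
    If the per-group loop is only there to produce this forward-looking sum, the same column can probably be filled without slicing at all. The sketch below is an alternative, not the approach of the answer above: it assumes fixed illustrative values for c and uses groupby with transform, reversing each group so that an ordinary rolling window covers the current row and the next two.

```python
import numpy as np
import pandas as pd

# Fixed illustrative values for 'c' (an assumption, chosen so the sums are easy to check)
df = pd.DataFrame({
    'a': ['one', 'one', 'one', 'one', 'two', 'two', 'two', 'three', 'four'],
    'b': ['x', 'y', 'x', 'y', 'x', 'y', 'x', 'x', 'x'],
    'c': np.arange(1.0, 10.0),
})

def forward_sum(s, window=3):
    # Reverse the group, take a trailing rolling sum, then restore the order,
    # which yields the sum of each row and the rows that follow it.
    # min_periods=1 keeps shorter sums at the end of a group instead of NaN.
    return s.iloc[::-1].rolling(window, min_periods=1).sum().iloc[::-1]

# transform returns a result aligned with df's index, so it assigns directly
df['sum_c_3'] = df.groupby('a')['c'].transform(forward_sum)
print(df)
```

    Because the result is assigned straight into df, there is no copy to write back afterwards.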