代码之家  ›  专栏  ›  技术社区  ›  Stacey

对来自不同数据帧的数据帧列求和

  •  0
  • Stacey  · 技术社区  · 5 年前

    数据帧的示例如下。示例df 1:

             date BBG.XASX.ABP.S_price BBG.XASX.ABP.S_pos BBG.XASX.ABP.S_trade \ 
    0  2017-09-11            2.8303586                0.0                  0.0   
    1  2017-09-12            2.8135189                0.0                  0.0   
    2  2017-09-13            2.7829274            86614.0              86614.0   
    3  2017-09-14            2.7928042            86614.0                  0.0   
    4  2017-09-15            2.8120383            86614.0                  0.0   
    
      BBG.XASX.ABP.S_cost BBG.XASX.ABP.S_pnl_pre_cost 
    0                -0.0                         0.0   
    1                -0.0                         0.0    
    2    -32.540463966186                         0.0   
    3                -0.0           855.4691551999713             
    4                -0.0           1665.942337400047  
    

            date BBG.XASX.AHG.S_price BBG.XASX.AHG.S_pos BBG.XASX.AHG.S_trade  \
    0  2017-09-11            2.6068676                0.0                  0.0   
    1  2017-09-12            2.6044785            76439.0              76439.0   
    2  2017-09-13   2.6024171000000003            76439.0                  0.0   
    3  2017-09-14            2.6139929            76439.0                  0.0   
    4  2017-09-15            2.6602836            76439.0                  0.0   
    
       BBG.XASX.AHG.S_cost BBG.XASX.AHG.S_pnl_pre_cost 
    0                 -0.0                         0.0   
    1  -26.876303828302497                         0.0   
    2                 -0.0          -157.5713545999606   
    3                 -0.0           884.8425761999679   
    4                 -0.0           3538.414817300014  
    

    示例df 3:

      date BBG.XASX.AGL.S_price BBG.XASX.AGL.S_pos BBG.XASX.AGL.S_trade  \
    0  2017-09-18           18.8195983                0.0                  0.0   
    1  2017-09-19           18.5104704            40613.0              40613.0   
    2  2017-09-20           18.2010515            40613.0                  0.0   
    3  2017-09-21           18.2217768            40613.0                  0.0   
    4  2017-09-22            17.840112            40613.0                  0.0   
    
      BBG.XASX.AGL.S_cost BBG.XASX.AGL.S_pnl_pre_cost 
    0                -0.0                         0.0                          
    1   -101.488374137952                         0.0    
    2                -0.0          -12566.42978570005   
    3                -0.0           841.7166089001112    
    4                -0.0         -15500.552522399928
    

    将示例数据帧相加,代码将返回以下输出:

    输出:

    date                 1       2      3              4               5               6
    11/09/2017   5.4372262       0      0              0               0               0
    12/09/2017   5.4179974   76439  76439              2    -26.87630383               0
    13/09/2017   5.3853445  163053  86614              4    -32.54046397    -157.5713546
    14/09/2017   5.4067971  163053      0              6               0     1740.311731
    15/09/2017   5.4723219  163053      0              8               0     5204.357155
    18/09/2017  18.8195983       0      0              0               0               0
    19/09/2017  18.5104704   40613  40613   -101.4883741               0               0
    20/09/2017  18.2010515   40613      0              0    -12566.42979               0
    21/09/2017  18.2217768   40613      0              0     841.7166089               0
    22/09/2017   17.840112   40613      0              0    -15500.55252               0
    

    所有数据帧具有相同顺序的相同列数。请注意,在输出中,各个df中的日期可能不同,我希望看到各个日期的总数。

    我正在生成的代码 df 数据帧是:

    for subdirname in glob.iglob('C:/Users/stacey/WorkDocs/tradeopt/'+filename+'//BBG*/tradeopt.is-pnl*.lzma', recursive=True):
        df = pd.DataFrame(numpy.zeros((0,27)))
    
        out = []
        with lzma.open(subdirname, mode='rt') as file:
            print(subdirname)
            for line in file:
                items = line.split(",")
                out.append(items)
                if len(out) > 0:
                    a = pd.DataFrame(out[1:], columns=out[0])    
    

    如何将单个df添加到sumdf中?

    0 回复  |  直到 4 年前
        1
  •  2
  •   jezrael    5 年前

    想法是转换列 date DatetimeIndex split 列名称依据 . MultiIndex :

    dfs = [] 
    for subdirname in glob.iglob('C:/Users/stacey/WorkDocs/tradeopt/'+filename+'//BBG*/tradeopt.is-pnl*.lzma', recursive=True): 
        out = []
        with lzma.open(subdirname, mode='rt') as file:
            print(subdirname)
            for line in file:
                items = line.strip().split(",")
                out.append(items)
        if len(out) > 0:
            a = pd.DataFrame(out[1:], columns=out[0]).set_index('date')
            a.index = pd.to_datetime(a.index)  
            dfs.append(a)
    

    然后使用 concat sum 按列名称:

    df = pd.concat(dfs, axis=1).sum(level=0, axis=1)