
Creating additional header levels in Python (pandas)

  •  -1
  • BEAst  · Tech Community  · 7 years ago

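    I am new to programming and am currently working with DataFrames. I am trying to stack my current DataFrame into a specific layout. The real files are larger and contain a lot of data. However, I cannot get stack() to arrange the data the way I want, and the resulting shape is a complete mess. I need help defining a MultiIndex so that the columns have more than one header level.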

    I hope you can help. Here is an example of the data:

        Exports      NaN      NaN      NaN      Net Exports       NaN      NaN  
    0      Total   Sweden   Norway  Germany        Total   Sweden   Norway    
    1     1032.8      358    239.7    435.1        636.8    274.1      9.7   
    2     1198.8    556.4    211.8    430.6        846.3    522.6     -1.1
    

    Using stack():

         Exports            Total
         NaN               Sweden
         NaN               Norway
         NaN              Germany
         Net Exports        Total
         NaN               Sweden
         NaN               Norway
         NaN              Germany
         NaN                  GWh
    1    Exports           1032.8
         NaN                  358
         NaN                239.7
         NaN                435.1
         Net Exports        636.8
         NaN                274.1
         NaN                  9.7
         NaN                  353
    

    1 reply  |  up to 4 years ago
  •  1
  •   jezrael    7 years ago

    I think you need:

    print (r.head())
        Unnamed: 18 Unnamed: 19 Unnamed: 20 Unnamed: 21   Unnamed: 22 Unnamed: 23  \
    0       Exports         NaN         NaN         NaN  Net Exports          NaN   
    2         Total      Sweden      Norway     Germany         Total      Sweden   
    189      1032.8         358       239.7       435.1         636.8       274.1   
    190      1198.8       556.4       211.8       430.6         846.3       522.6   
    191       982.7       159.3       166.2       657.2         276.3      -156.8   
    
        Unnamed: 24 Unnamed: 25     Unit:  
    0           NaN         NaN       NaN  
    2        Norway     Germany       GWh  
    189         9.7         353   January  
    190        -1.1       324.8  February  
    191      -105.9         539     March  
    

    import pandas as pd

    # use the 'Unit:' column as the index
    r = r.set_index('Unit:')
    # build a MultiIndex from the first and second rows;
    # NaNs in the first row are forward-filled with ffill()
    r.columns = pd.MultiIndex.from_arrays([r.iloc[0].ffill(), r.iloc[1]], names=(None, None))
    # drop the first and second rows, which now serve as the column header
    r = r.iloc[2:]
    
    print (r.head())
             Exports                       Net Exports                       
               Total Sweden Norway Germany        Total Sweden Norway Germany
    Unit:                                                                    
    January   1032.8    358  239.7   435.1        636.8  274.1    9.7     353
    February  1198.8  556.4  211.8   430.6        846.3  522.6   -1.1   324.8
    March      982.7  159.3  166.2   657.2        276.3 -156.8 -105.9     539
    April      962.3   22.1     62   878.2       -268.6 -741.3 -352.9   825.6
    May        951.2   13.5   15.9   921.8       -511.5 -885.2 -496.4   870.1
    
    print (r.stack().head(10))
                     Exports Net Exports 
    Unit:                                
    January  Germany   435.1          353
             Norway    239.7          9.7
             Sweden      358        274.1
             Total    1032.8        636.8
    February Germany   430.6        324.8
             Norway    211.8         -1.1
             Sweden    556.4        522.6
             Total    1198.8        846.3
    March    Germany   657.2          539
             Norway    166.2       -105.9
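
    The steps above can be tried without the original file. The following is only a minimal sketch: the frame `raw` and its column names `c0` to `c5` are made up to mimic the layout shown in the question, and the final `astype(float)` is an extra step added here on the assumption that numeric values are wanted (rows sliced out of a frame that also held header text keep the object dtype).

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the two header rows are stored as ordinary data rows,
# as in the question. Column names c0..c5 are placeholders.
raw = pd.DataFrame(
    [
        ["Exports", np.nan, np.nan, "Net Exports", np.nan, np.nan, np.nan],
        ["Total", "Sweden", "Norway", "Total", "Sweden", "Norway", "GWh"],
        [1032.8, 358, 239.7, 636.8, 274.1, 9.7, "January"],
        [1198.8, 556.4, 211.8, 846.3, 522.6, -1.1, "February"],
    ],
    columns=["c0", "c1", "c2", "c3", "c4", "c5", "Unit:"],
)

# Move the month labels into the index.
df = raw.set_index("Unit:")

# Level 0: first row with NaNs forward-filled; level 1: second row.
df.columns = pd.MultiIndex.from_arrays(
    [df.iloc[0].ffill(), df.iloc[1]], names=(None, None)
)

# Drop the two rows that became the header, then convert to numbers
# (assumption: every remaining cell is numeric).
df = df.iloc[2:].astype(float)

# stack() moves the innermost column level (the country) into the row index.
stacked = df.stack()
print(stacked)
```

    The row order of the stacked result may differ between pandas versions, so it is safer to select rows by label than by position.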