
Reshaping a DataFrame fails

  •  2
  • Mohamed Thasin ah  · 6 years ago

    I want to reshape my DataFrame, which contains only key/value pairs.

    For example,

                 key                                              value
    0     Message-ID       <5525962.1075855679785.JavaMail.evans@thyme>
    1           Date              Wed, 13 Dec 2000 07:04:00 -0800 (PST)
    2           From                            phillip.allen@enron.com
    3             To  christi.nicolay@enron.com, james.steffes@enron...
    4         X-From                                    Phillip K Allen
    5           X-To  Christi L Nicolay, James D Steffes, Jeff Dasov...
    6          X-cc:                                               None
    7         X-bcc:                                               None
    8       X-Origin                                            Allen-P
    9     Message-ID       <4650921.1075855679981.JavaMail.evans@thyme>
    10          Date               Tue, 5 Dec 2000 07:31:00 -0800 (PST)
    11          From                               ina.rangel@enron.com
    12            To                             amanda.huble@enron.com
    13        X-From                                         Ina Rangel
    14          X-To                                       Amanda Huble
    15         X-cc:                                               None
    16        X-bcc:                                               None
    17      X-Origin                                            Allen-P
    

    I want to turn it into:

    Message-ID       Date                  From             To        X-From                 X-To                            X-cc:  X-bcc:  X-Origin
    <5525962.10...   Wed, 13 Dec 2000...   phillip.allen... christi.nicolay.. Phillip K Allen..     Christi L Nicolay, Ja... NaN    NaN     Allen-P
    <4650921.10...   Tue, 5 Dec 2000 ...   ina.rangel...    amanda.huble@...  Ina Rangel            Amanda Huble             NaN    NaN     Allen-P
    

    If you find an existing question that covers this, feel free to mark this one as a duplicate.

    1 answer  |  last active 6 years ago
  •  3
  •   jezrael    6 years ago

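    If every group has exactly 9 values, you can use numpy.reshape to turn the value column into a 2D array, pass it to the DataFrame constructor, and take the first 9 values of the key column as the column names: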

    print (df['value'].values.reshape(-1, 9))
    [['<5525962.1075855679785.JavaMail.evans@thyme>'
      'Wed, 13 Dec 2000 07:04:00 -0800 (PST)' 'phillip.allen@enron.com'
      'christi.nicolay@enron.com, james.steffes@enron...' 'Phillip K Allen'
      'Christi L Nicolay, James D Steffes, Jeff Dasov...' 'None' 'None'
      'Allen-P']
     ['<4650921.1075855679981.JavaMail.evans@thyme>'
      'Tue, 5 Dec 2000 07:31:00 -0800 (PST)' 'ina.rangel@enron.com'
      'amanda.huble@enron.com' 'Ina Rangel' 'Amanda Huble' 'None' 'None'
      'Allen-P']]
    
    
    df = pd.DataFrame(df['value'].values.reshape(-1, 9), columns=df['key'].iloc[:9])
    print (df)
    key                                    Message-ID  \
    0    <5525962.1075855679785.JavaMail.evans@thyme>   
    1    <4650921.1075855679981.JavaMail.evans@thyme>   
    
    key                                   Date                     From  \
    0    Wed, 13 Dec 2000 07:04:00 -0800 (PST)  phillip.allen@enron.com   
    1     Tue, 5 Dec 2000 07:31:00 -0800 (PST)     ina.rangel@enron.com   
    
    key                                                 To           X-From  \
    0    christi.nicolay@enron.com, james.steffes@enron...  Phillip K Allen   
    1                               amanda.huble@enron.com       Ina Rangel   
    
    key                                               X-To X-cc: X-bcc: X-Origin  
    0    Christi L Nicolay, James D Steffes, Jeff Dasov...  None   None  Allen-P  
    1                                         Amanda Huble  None   None  Allen-P 
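
The reshape approach above silently assumes that every record has the same keys in the same order. A minimal sketch of how that assumption could be checked before reshaping follows; the shortened sample data (3 keys per record instead of 9) and the names `n_keys` and `wide` are hypothetical and not part of the original answer.

```python
import pandas as pd

# Hypothetical shortened sample: 3 keys per record instead of the 9 in the question.
df = pd.DataFrame({
    "key": ["Message-ID", "Date", "From", "Message-ID", "Date", "From"],
    "value": ["<id-1>", "Wed, 13 Dec 2000", "phillip.allen@enron.com",
              "<id-2>", "Tue, 5 Dec 2000", "ina.rangel@enron.com"],
})

n_keys = 3  # rows per record (9 in the question's data)

# reshape raises ValueError if the row count is not a multiple of n_keys.
keys = df["key"].to_numpy().reshape(-1, n_keys)

# Every record must list the same keys in the same order,
# otherwise values would land under the wrong column.
assert (keys == keys[0]).all(), "records do not share the same key layout"

wide = pd.DataFrame(
    df["value"].to_numpy().reshape(-1, n_keys),
    columns=keys[0],
)
print(wide)
```

If the check fails, the records are not uniform and the reshape would misalign values, so a grouping-based approach is the safer choice.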
    

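    If each group always starts with a Message-ID row, you can instead start from the original key/value DataFrame and use set_index with a helper Series, created by applying cumsum to a boolean mask (the key column compared with eq, i.e. ==), followed by unstack: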

    df = df.set_index([df['key'].eq('Message-ID').cumsum(), 'key'])['value'].unstack()
    print (df)
    key                                   Date                     From  \
    key                                                                   
    1    Wed, 13 Dec 2000 07:04:00 -0800 (PST)  phillip.allen@enron.com   
    2     Tue, 5 Dec 2000 07:31:00 -0800 (PST)     ina.rangel@enron.com   
    
    key                                    Message-ID  \
    key                                                 
    1    <5525962.1075855679785.JavaMail.evans@thyme>   
    2    <4650921.1075855679981.JavaMail.evans@thyme>   
    
    key                                                 To           X-From  \
    key                                                                       
    1    christi.nicolay@enron.com, james.steffes@enron...  Phillip K Allen   
    2                               amanda.huble@enron.com       Ina Rangel   
    
    key X-Origin                                               X-To X-bcc: X-cc:  
    key                                                                           
    1    Allen-P  Christi L Nicolay, James D Steffes, Jeff Dasov...   None  None  
    2    Allen-P                                       Amanda Huble   None  None
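
As the output above shows, unstack sorts the resulting columns alphabetically. The sketch below illustrates the same cumsum/unstack idea on hypothetical shortened data where the second record lacks a From row, and adds a reindex step to restore the original key order. The names `record` and `wide` are illustrative only, and the sketch assumes each key appears at most once per record (unstack raises a ValueError on duplicate entries).

```python
import pandas as pd

# Hypothetical shortened sample; the second record has no "From" row.
df = pd.DataFrame({
    "key": ["Message-ID", "Date", "From", "Message-ID", "Date"],
    "value": ["<id-1>", "Wed, 13 Dec 2000", "phillip.allen@enron.com",
              "<id-2>", "Tue, 5 Dec 2000"],
})

# Each "Message-ID" row starts a new record, so the running count of
# those rows labels every row with its record number (1, 2, ...).
record = df["key"].eq("Message-ID").cumsum().rename("record")

wide = (
    df.set_index([record, "key"])["value"]
      .unstack()                            # keys become columns, sorted alphabetically
      .reindex(columns=df["key"].unique())  # restore first-seen key order
)
print(wide)
```

Unlike the fixed-width reshape, this tolerates records of different lengths: a key missing from a record simply becomes NaN in that row.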