
Pandas: create a df from a list of dicts

  •  1
  • swifty  · Tech Community  · 6 years ago

    I have data in the following format (a list of dicts, each containing 3 lists):

    [{40258: [['2018-07-03T14:13:41'], ['Open'], ['Closed']]},
     {40257: [['2018-07-03T13:47:55',
        '2018-07-03T14:21:52',
        '2018-07-04T11:56:44'],
       ['Open', 'In Progress', 'Waiting on 3rd Party'],
       ['In Progress', 'Waiting on 3rd Party', 'In Progress']]},
     {40255: [['2018-07-03T13:12:58'], ['Open'], ['Closed']]},
     {40250: [[], [], []]}]
    

    I would like to convert the above into the following df:

    key    List1-1              List1-2            List1-3               List2-1     List2-2          List2-3                 List3-1         List3-2                   List3-3
    40258  2018-07-03T14:13:41  nan                nan                   'Open'      nan              nan                     'Closed'        nan                       nan
    40257  2018-07-03T13:47:55 2018-07-03T14:21:52 2018-07-04T11:56:44   'Open'     'In Progress'    'Waiting on 3rd Party'   'In Progress'   'Waiting on 3rd Party'   'In Progress'
    40255  2018-07-03T13:12:58  nan                nan                   'Open'      nan              nan                     'Closed'        nan                       nan
    40250  nan                  nan                nan                    nan        nan              nan                      nan            nan                       nan
    
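    • Each key is a row, and each element of the lists is a column.
    • The outer list contains 50,000 dicts to be turned into rows.
    • There are always 3 inner lists.
    • The inner lists vary in length, from 0 up to a maximum of 25.

    I have tried plain pd.DataFrame and pd.DataFrame.from_dict, but I couldn't find a way to handle multiple lists inside a dict.

    Any help is much appreciated.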

    3 replies  |  latest 6 years ago
        1
  •  3
  •   Sunitha    6 years ago
    import numpy as np
    import pandas as pd

    data=[{40258: [['2018-07-03T14:13:41'], ['Open'], ['Closed']]},
     {40257: [['2018-07-03T13:47:55',
         '2018-07-03T14:21:52',
         '2018-07-04T11:56:44'],
        ['Open', 'In Progress', 'Waiting on 3rd Party'],
        ['In Progress', 'Waiting on 3rd Party', 'In Progress']]},
      {40255: [['2018-07-03T13:12:58'], ['Open'], ['Closed']]},
      {40250: [[], [], []]}]
    
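    # Pad each inner list with NaN up to 3 elements (the longest list in this sample)
    f = lambda x: x + [np.nan]*(3-len(x))
    # One row per key: the key followed by the three padded lists concatenated
    mod_data = [ [k]+ sum(list(map(f, v)), []) for d in data for k,v in d.items()]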
    
    cols = ['key', 'List1-1', 'List1-2', 'List1-3', 'List2-1', 'List2-2', 'List2-3', 'List3-1', 'List3-2', 'List3-3']
    df = pd.DataFrame(mod_data, columns=cols).set_index('key')
    print(df)
    

                       List1-1              List1-2              List1-3 List2-1      List2-2               List2-3      List3-1               List3-2      List3-3
    key                                                                                                                                                            
    40258  2018-07-03T14:13:41                  NaN                  NaN    Open          NaN                   NaN       Closed                   NaN          NaN
    40257  2018-07-03T13:47:55  2018-07-03T14:21:52  2018-07-04T11:56:44    Open  In Progress  Waiting on 3rd Party  In Progress  Waiting on 3rd Party  In Progress
    40255  2018-07-03T13:12:58                  NaN                  NaN    Open          NaN                   NaN       Closed                   NaN          NaN
    40250                  NaN                  NaN                  NaN     NaN          NaN                   NaN          NaN                   NaN          NaN
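
    The padding above hard-codes a width of 3, which fits the sample but not the inner lists of up to 25 elements mentioned in the question. A minimal sketch of the same padding idea with the width computed from the data; the helper name `lists_to_frame` and its `prefix` parameter are illustrative and not part of the original answer, and it assumes every dict maps a single key to the same number of inner lists.

```python
import numpy as np
import pandas as pd


def lists_to_frame(records, prefix="List"):
    """Pad every inner list to the longest length and build one row per key."""
    # Flatten the list of single-key dicts into (key, inner_lists) pairs.
    rows = [(k, lists) for d in records for k, lists in d.items()]
    n_lists = len(rows[0][1])
    # Width of each column group = length of the longest inner list.
    width = max((len(lst) for _, lists in rows for lst in lists), default=0)

    padded = []
    for key, lists in rows:
        row = [key]
        for lst in lists:
            row.extend(list(lst) + [np.nan] * (width - len(lst)))
        padded.append(row)

    cols = ["key"] + [
        f"{prefix}{j + 1}-{i + 1}" for j in range(n_lists) for i in range(width)
    ]
    return pd.DataFrame(padded, columns=cols).set_index("key")


# Made-up sample with uneven lengths (longest inner list has 4 elements).
sample = [
    {40258: [["2018-07-03T14:13:41"], ["Open"], ["Closed"]]},
    {40257: [["a", "b", "c", "d"], ["Open", "In Progress"], []]},
    {40250: [[], [], []]},
]
df = lists_to_frame(sample)
print(df.shape)  # (3, 12)
```

    Building plain Python rows first and calling `pd.DataFrame` once tends to be cheaper than growing a frame row by row, which matters with 50,000 dicts.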
    
        2
  •  2
  •   Shreyas Pimpalgaonkar    6 years ago

    import numpy as np
    import pandas as pd

    # `records` is the list of dicts from the question.
    # First calculate the length of the longest inner list; call it lmax.
    lmax = max(len(lst) for elem in records for lists in elem.values() for lst in lists)

    data = []
    for elem in records:
        for key in elem:  # Note that each dict holds only one key
            lst = elem[key]  # lst is the list of the three inner lists
            data_curr = [np.nan] * (3 * lmax + 1)
            data_curr[0] = key
            for j in range(3):
                for i, value in enumerate(lst[j]):
                    # Layout: key, List1-1..List1-lmax, List2-1..., List3-1...
                    data_curr[j * lmax + i + 1] = value
            data.append(data_curr)

    columns = ['key'] + ['List{}-{}'.format(j + 1, i + 1) for j in range(3) for i in range(lmax)]
    df = pd.DataFrame(data, columns=columns)
    

        3
  •  1
  •   cosmic_inquiry    6 years ago

    from numpy import nan
    import pandas as pd
    mess = [{40258: [['2018-07-03T14:13:41'], ['Open'], ['Closed']]},
     {40257: [['2018-07-03T13:47:55',
        '2018-07-03T14:21:52',
        '2018-07-04T11:56:44'],
       ['Open', 'In Progress', 'Waiting on 3rd Party'],
       ['In Progress', 'Waiting on 3rd Party', 'In Progress']]},
     {40255: [['2018-07-03T13:12:58'], ['Open'], ['Closed']]},
     {40250: [[], [], []]}]
    
    master = dict()
    for dicto in mess:
        key = list(dicto.keys())[0]
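        # j indexes the inner list, i the position within it; iterating j first keeps the columns grouped by list
        master[key] = {('List{}-{}'.format(j+1,i+1)): (dicto[key][j][i] if i < len(dicto[key][j]) else nan ) for j in range(3) for i in range(3)}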
    output = pd.DataFrame.from_records(master, columns=list(master.keys())).T
    print(output.to_string())
    

                       List1-1              List1-2              List1-3 List2-1      List2-2               List2-3      List3-1               List3-2      List3-3
    40258  2018-07-03T14:13:41                  NaN                  NaN    Open          NaN                   NaN       Closed                   NaN          NaN
    40257  2018-07-03T13:47:55  2018-07-03T14:21:52  2018-07-04T11:56:44    Open  In Progress  Waiting on 3rd Party  In Progress  Waiting on 3rd Party  In Progress
    40255  2018-07-03T13:12:58                  NaN                  NaN    Open          NaN                   NaN       Closed                   NaN          NaN
    40250                  NaN                  NaN                  NaN     NaN          NaN                   NaN          NaN                   NaN          NaN
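
    As a variation on the dict-based approach above, here is a rough sketch that only records the values that exist and lets pandas fill the gaps with NaN, so no maximum length has to be hard-coded. The generator name `rows_from_records` is made up for illustration, and the column names are assumed to follow the `List<j>-<i>` pattern from the question.

```python
import pandas as pd


def rows_from_records(records):
    """Yield one flat dict per key, e.g. {'key': k, 'List1-1': ..., ...}."""
    for d in records:
        for key, lists in d.items():
            row = {"key": key}
            for j, lst in enumerate(lists, start=1):
                for i, value in enumerate(lst, start=1):
                    row[f"List{j}-{i}"] = value
            yield row


sample = [
    {40258: [["2018-07-03T14:13:41"], ["Open"], ["Closed"]]},
    {40257: [["2018-07-03T13:47:55", "2018-07-03T14:21:52", "2018-07-04T11:56:44"],
             ["Open", "In Progress", "Waiting on 3rd Party"],
             ["In Progress", "Waiting on 3rd Party", "In Progress"]]},
    {40255: [["2018-07-03T13:12:58"], ["Open"], ["Closed"]]},
    {40250: [[], [], []]},
]

# Keys missing from a row dict become NaN in the resulting frame.
wide = pd.DataFrame(list(rows_from_records(sample))).set_index("key")

# Columns appear in first-seen order, so sort them by (list number, position).
ordered = sorted(wide.columns, key=lambda c: tuple(int(p) for p in c[4:].split("-")))
wide = wide[ordered]
print(wide.to_string())
```

    Sorting on the numeric parts of the column name avoids the lexicographic trap where `List1-10` would sort before `List1-2` once inner lists grow past nine elements.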