代码之家 › 专栏 › 技术社区 › blackraven ril3y

将“字典中的字典”列表转换为数据帧

dictionary dataframe list pandas python

blackraven ril3y · 技术社区 · 2 年前

我有一个“字典中的字典”列表,如下所示:

lis = [{'Health and Welfare Plan + Change Notification': {'evidence_capture': 'null',
   'test_result_justification': 'null',
   'latest_test_result_date': 'null',
   'last_updated_by': 'null',
   'test_execution_status': 'Not Started',
   'test_result': 'null'}},
 {'Health and Welfare Plan + Computations': {'evidence_capture': 'null',
   'test_result_justification': 'null',
   'latest_test_result_date': 'null',
   'last_updated_by': 'null',
   'test_execution_status': 'Not Started',
   'test_result': 'null'}},
 {'Health and Welfare Plan + Data Agreements': {'evidence_capture': 'null',
   'test_result_justification': 'Due to the Policy',
   'latest_test_result_date': '2019-10-02',
   'last_updated_by': 'null',
   'test_execution_status': 'In Progress',
   'test_result': 'null'}},
 {'Health and Welfare Plan + Data Elements': {'evidence_capture': 'null',
   'test_result_justification': 'xxx',
   'latest_test_result_date': '2019-10-02',
   'last_updated_by': 'null',
   'test_execution_status': 'In Progress',
   'test_result': 'null'}},
 {'Health and Welfare Plan + Data Quality Monitoring': {'evidence_capture': 'null',
   'test_result_justification': 'xxx',
   'latest_test_result_date': '2019-08-09',
   'last_updated_by': 'null',
   'test_execution_status': 'Completed',
   'test_result': 'xxx'}},
 {'Health and Welfare Plan + HPU Source Reliability': {'evidence_capture': 'null',
   'test_result_justification': 'xxx.',
   'latest_test_result_date': '2019-10-02',
   'last_updated_by': 'null',
   'test_execution_status': 'In Progress',
   'test_result': 'null'}},
 {'Health and Welfare Plan + Lineage': {'evidence_capture': 'null',
   'test_result_justification': 'null',
   'latest_test_result_date': 'null',
   'last_updated_by': 'null',
   'test_execution_status': 'Not Started',
   'test_result': 'null'}},
 {'Health and Welfare Plan + Metadata': {'evidence_capture': 'null',
   'test_result_justification': 'Valid',
   'latest_test_result_date': '2020-07-02',
   'last_updated_by': 'null',
   'test_execution_status': 'Completed',
   'test_result': 'xxx'}},
 {'Health and Welfare Plan + Usage Reconciliation': {'evidence_capture': 'null',
   'test_result_justification': 'Test out of scope',
   'latest_test_result_date': '2019-10-02',
   'last_updated_by': 'null',
   'test_execution_status': 'In Progress',
   'test_result': 'null'}}]

我想将列表转换为如下所示的数据帧:

                        evidence_capture last_updated_by latest_test_result_date test_execution_status test_result test_result_justification            test_category
Change Notification                 null            null                    null           Not Started        null                      null  Health and Welfare Plan
Computations                        null            null                    null           Not Started        null                      null  Health and Welfare Plan
Data Agreements                     null            null              2019-10-02           In Progress        null         Due to the Policy  Health and Welfare Plan
Data Elements                       null            null              2019-10-02           In Progress        null                       xxx  Health and Welfare Plan
Data Quality Monitoring             null            null              2019-08-09             Completed         xxx                       xxx  Health and Welfare Plan
HPU Source Reliability              null            null              2019-10-02           In Progress        null                      xxx.  Health and Welfare Plan
Lineage                             null            null                    null           Not Started        null                      null  Health and Welfare Plan
Metadata                            null            null              2020-07-02             Completed         xxx                     Valid  Health and Welfare Plan
Usage Reconciliation                null            null              2019-10-02           In Progress        null         Test out of scope  Health and Welfare Plan

我构建数据帧的代码是使用for循环逐列连接记录。之后处理列名,然后将其转置。最终输出将从每个行索引中删除重复的字符串“Health and Welfare Plan”,但将其附加为新列。

df3 = pd.DataFrame(lis[0])
for i in range(1, len(lis)):
    df3 = pd.concat([df3, pd.DataFrame(lis[i])], axis=1)
df3.columns = [col.split(' + ')[1] for col in df3.columns]
df3 = df3.T
df3['test_category'] = 'Health and Welfare Plan'
print(df3)

该代码能够产生最终输出,但同时使用for循环和数据帧concat的“昂贵”函数。所以我想知道是否有更好的方法来输出相同的结果?

0 回复 | 直到 2 年前

Shubham Sharma mkln 2 年前

让我们做dictcomp来压平字典列表

pd.DataFrame({k.split(' + ')[1]: v for d in lis for k, v in d.items()}).T

                        evidence_capture test_result_justification latest_test_result_date last_updated_by test_execution_status test_result
Change Notification                 null                      null                    null            null           Not Started        null
Computations                        null                      null                    null            null           Not Started        null
Data Agreements                     null         Due to the Policy              2019-10-02            null           In Progress        null
Data Elements                       null                       xxx              2019-10-02            null           In Progress        null
Data Quality Monitoring             null                       xxx              2019-08-09            null             Completed         xxx
HPU Source Reliability              null                      xxx.              2019-10-02            null           In Progress        null
Lineage                             null                      null                    null            null           Not Started        null
Metadata                            null                     Valid              2020-07-02            null             Completed         xxx
Usage Reconciliation                null         Test out of scope              2019-10-02            null           In Progress        null

Tranbi 2 年前

您可以使用字典值构建数据帧,并通过“+”拆分手动设置索引:

df = pd.DataFrame([list(e.values())[0] for e in lis])
df.index = map(lambda x: x.split('+')[-1].strip(), [list(e.keys())[0] for e in lis])
print(df)

输出

                        evidence_capture test_result_justification  ... test_execution_status test_result
Change Notification                 null                      null  ...           Not Started        null
Computations                        null                      null  ...           Not Started        null
Data Agreements                     null         Due to the Policy  ...           In Progress        null
Data Elements                       null                       xxx  ...           In Progress        null
Data Quality Monitoring             null                       xxx  ...             Completed         xxx
HPU Source Reliability              null                      xxx.  ...           In Progress        null
Lineage                             null                      null  ...           Not Started        null
Metadata                            null                     Valid  ...             Completed         xxx
Usage Reconciliation                null         Test out of scope  ...           In Progress        null

[9 rows x 6 columns]

编辑:在一个循环中完成所有操作可能会更快:

data = []
ind = []
for e in lis:
    data.append(list(e.values())[0])
    ind.append(list(e.keys())[0].split('+')[1].strip())
df = pd.DataFrame(data, index=ind)

您仍然可以使用在末尾添加新列 df['test_category'] = 'Health and Welfare Plan'

Adrian Ang 2 年前

我修改了您的部分代码以获得相同的结果。使用字典操作应该更快

import pandas as pd

dic = {}
for e in lis:
    dic[list(e.keys())[0]] = list(e.values())[0]
df3 = pd.DataFrame(dic)
df3.columns = [col.split('+')[-1].strip() for col in df3.columns]
df3 = df3.T
df3['test_category'] = 'Health and Welfare Plan'
print(df3)