代码之家  ›  专栏  ›  技术社区  ›  William

读取Pandas数据帧中的嵌套JSON文件

  •  0
  • William  · 技术社区  · 3 年前

    背景: 我正在尝试规范一个json文件,并将其保存到pandas数据帧中,但是我在导航json结构时遇到了问题,并且我的代码无法按预期工作。

    预期的数据帧输出: 给定以下示例 json 文件(使用随机化的数据,但格式与实际数据完全相同),这是我试图产生的输出-

    新建实体组 实体ID 调整后的价值
    (2022年1月31日,无折扣,美元)
    调整后的TWR
    (本季度无分区,美元))
    调整后的TWR
    (本年迄今,无分区,美元)
    年度调整后TWR
    (自《盗梦空间》以来,无潜水,美元)
    起始日期 风险目标
    投资组合_1 260786美元 (44.55%) (44.55%) (44.55%)* 2021年4月7日 不适用
    FW Irrev家族Tr 9552252 260786美元 0.00% 0.00% 0.00%* 2022年1月11日 不适用
    投资组合_2 18396664美元 (5.78%) (5.78%) (5.47%)* 2021年9月3日 发育
    FW-DAF 10946585 18396664美元 (5.78%) (5.78%) (5.47%)* 2021年9月3日 发育
    投资组合_3 60143818美元 (4.42%) (4.42%) 7.75%* 2020年12月17日 -
    FW家族信托 13014080 475356美元 (6.10%) (6.10%) (3.97%)* 2021年4月9日 侵略性的
    FW流动基金LP 13396796 52899527美元 (4.15%) (4.15%) (4.15%)* 2021年12月30日 侵略性的
    FW控股第二有限责任公司 8413655 6768937美元 (0.77%) (0.77%) 11.84%* 2021年3月5日 不适用
    FW和FR接头 9957007 (1美元) - - -* 2021年12月21日 不适用

    实际数据帧输出: 尽管我尽了最大的努力,我还是只能得到粗体的行来映射到数据帧中:

    新建实体组 实体ID 调整后的价值
    (2022年1月31日,无折扣,美元)
    调整后的TWR
    (本季度无分区,美元))
    调整后的TWR
    (本年迄今,无分区,美元)
    年度调整后TWR
    (自《盗梦空间》以来,无潜水,美元)
    起始日期 风险目标
    投资组合_1 260786美元 (44.55%) (44.55%) (44.55%)* 2021年4月7日 不适用
    投资组合_2 18396664美元 (5.78%) (5.78%) (5.47%)* 2021年9月3日 发育
    投资组合_3 60143818美元 (4.42%) (4.42%) 7.75%* 2020年12月17日 -

    JSON文件: 这是我试图规范化并映射到数据帧中的文件:

    {
        "meta": {
            "columns": [
                {
                    "key": "node_id",
                    "display_name": "Entity ID",
                    "output_type": "Word"
                },
                {
                    "key": "value",
                    "display_name": "Adjusted Value (1/31/2022, No Div, USD)",
                    "output_type": "Number",
                    "currency": "USD"
                },
                {
                    "key": "time_weighted_return",
                    "display_name": "Adjusted TWR (Current Quarter, No Div, USD)",
                    "output_type": "Percent",
                    "currency": "USD"
                },
                {
                    "key": "time_weighted_return_2",
                    "display_name": "Adjusted TWR (YTD, No Div, USD)",
                    "output_type": "Percent",
                    "currency": "USD"
                },
                {
                    "key": "time_weighted_return_3",
                    "display_name": "Annualized Adjusted TWR (Since Inception, No Div, USD)",
                    "output_type": "Percent",
                    "currency": "USD"
                },
                {
                    "key": "inception_event_date",
                    "display_name": "Inception Date",
                    "output_type": "Date"
                },
                {
                    "key": "_custom_portfolio_target_347209",
                    "display_name": "Risk Target",
                    "output_type": "Word"
                }
            ],
            "groupings": [
                {
                    "key": "_custom_new_entity_group_453577",
                    "display_name": "NEW Entity Group"
                },
                {
                    "key": "top_level_legal_entity",
                    "display_name": "Top Level Legal Entity"
                }
            ]
        },
        "data": {
            "type": "portfolio_views",
            "attributes": {
                "total": {
                    "name": "Total",
                    "columns": {
                        "time_weighted_return": -0.05001974888806926,
                        "inception_event_date": "2020-12-17",
                        "_custom_portfolio_target_347209": null,
                        "time_weighted_return_3": 0.0678647066340392,
                        "time_weighted_return_2": -0.05001974888806926,
                        "value": 7.880126780581851E7,
                        "node_id": null
                    },
                    "children": [
                        {
                            "name": "Portfolio_3",
                            "grouping": "_custom_new_entity_group_453577",
                            "columns": {
                                "time_weighted_return": -0.04420061615233983,
                                "inception_event_date": "2020-12-17",
                                "_custom_portfolio_target_347209": null,
                                "time_weighted_return_3": 0.07748325432684622,
                                "time_weighted_return_2": -0.04420061615233983,
                                "value": 6.014381761929752E7,
                                "node_id": null
                            },
                            "children": [
                                {
                                    "entity_id": 9957007,
                                    "name": "FW and FR Joint",
                                    "grouping": "top_level_legal_entity",
                                    "columns": {
                                        "time_weighted_return": null,
                                        "inception_event_date": "2021-12-21",
                                        "_custom_portfolio_target_347209": "N/A",
                                        "time_weighted_return_3": null,
                                        "time_weighted_return_2": null,
                                        "value": -1.44,
                                        "node_id": "9957007"
                                    },
                                    "children": []
                                },
                                {
                                    "entity_id": 8413655,
                                    "name": "FW Holdings No. 2 LLC",
                                    "grouping": "top_level_legal_entity",
                                    "columns": {
                                        "time_weighted_return": -0.0077309266066708515,
                                        "inception_event_date": "2021-03-05",
                                        "_custom_portfolio_target_347209": "N/A",
                                        "time_weighted_return_3": 0.11844843557716445,
                                        "time_weighted_return_2": -0.0077309266066708515,
                                        "value": 6768936.74,
                                        "node_id": "8413655"
                                    },
                                    "children": []
                                },
                                {
                                    "entity_id": 13396796,
                                    "name": "FW Liquid Fund LP",
                                    "grouping": "top_level_legal_entity",
                                    "columns": {
                                        "time_weighted_return": -0.04149769229150746,
                                        "inception_event_date": "2021-12-30",
                                        "_custom_portfolio_target_347209": "Aggressive",
                                        "time_weighted_return_3": -0.041497430478377395,
                                        "time_weighted_return_2": -0.04149769229150746,
                                        "value": 5.289952672686747E7,
                                        "node_id": "13396796"
                                    },
                                    "children": []
                                },
                                {
                                    "entity_id": 13014080,
                                    "name": "The FW Family Trust",
                                    "grouping": "top_level_legal_entity",
                                    "columns": {
                                        "time_weighted_return": -0.06102013456998856,
                                        "inception_event_date": "2021-04-09",
                                        "_custom_portfolio_target_347209": "Aggressive",
                                        "time_weighted_return_3": -0.039685671858585514,
                                        "time_weighted_return_2": -0.06102013456998856,
                                        "value": 475355.59242999996,
                                        "node_id": "13014080"
                                    },
                                    "children": []
                                }
                            ]
                        },
                        {
                            "name": "Portfolio_1",
                            "grouping": "_custom_new_entity_group_453577",
                            "columns": {
                                "time_weighted_return": -0.44554958179309,
                                "inception_event_date": "2021-04-07",
                                "_custom_portfolio_target_347209": "N/A",
                                "time_weighted_return_3": -0.44554958179309,
                                "time_weighted_return_2": -0.44554958179309,
                                "value": 260786.03,
                                "node_id": null
                            },
                            "children": [
                                {
                                    "entity_id": 9552252,
                                    "name": "The FW Irrev Family Tr",
                                    "grouping": "top_level_legal_entity",
                                    "columns": {
                                        "time_weighted_return": 0.0,
                                        "inception_event_date": "2022-01-11",
                                        "_custom_portfolio_target_347209": "N/A",
                                        "time_weighted_return_3": 0.0,
                                        "time_weighted_return_2": 0.0,
                                        "value": 260786.03,
                                        "node_id": "9552252"
                                    },
                                    "children": []
                                }
                            ]
                        },
                        {
                            "name": "Portfolio_2",
                            "grouping": "_custom_new_entity_group_453577",
                            "columns": {
                                "time_weighted_return": -0.05780354507057972,
                                "inception_event_date": "2021-09-03",
                                "_custom_portfolio_target_347209": "Growth",
                                "time_weighted_return_3": -0.05470214863844658,
                                "time_weighted_return_2": -0.05780354507057972,
                                "value": 1.8396664156520825E7,
                                "node_id": null
                            },
                            "children": [
                                {
                                    "entity_id": 10946585,
                                    "name": "FW DAF",
                                    "grouping": "top_level_legal_entity",
                                    "columns": {
                                        "time_weighted_return": -0.05780354507057972,
                                        "inception_event_date": "2021-09-03",
                                        "_custom_portfolio_target_347209": "Growth",
                                        "time_weighted_return_3": -0.05470214863844658,
                                        "time_weighted_return_2": -0.05780354507057972,
                                        "value": 1.8396664156520832E7,
                                        "node_id": "10946585"
                                    },
                                    "children": []
                                }
                            ]
                        }
                    ]
                }
            }
        },
        "included": []
    }
    

    我的代码: 这是我构建的函数,用于尝试规范JSON响应并保存在pandas数据帧中-

    def unpack_response():
        while True:
            try:    
                api_response = response_writer()
                df = pd.json_normalize(api_response['data']['attributes']['total']['children'])
                df.columns = df.columns.str.replace(r'columns.', '', regex=False)
                column_name_mapper = {column['key']: column['display_name'] for column in api_response['meta']['columns']}
                df.rename(columns=column_name_mapper, inplace=True)
                break
            except KeyError:
                print("-----------------------------------\n","API TIMEOUT ERROR: TRYING AGAIN...", "\n-----------------------------------\n")
        
        df.rename(columns={'name': 'New Entity Group'}, inplace=True)
    
        column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date"]
        df = df.reindex(columns=column_names)
        
        return df
    unpack_response()
    

    关于我的代码的评论:

    • 投资组合_1、投资组合_2、投资组合_3 -这些粗体行是 children 属于 data 并且似乎是唯一保存到 df 。我想这是因为我的代码引用 df = pd.json_normalize(api_response['data']['attributes']['total']['children']) 只看这些列表也是如此。我试着只是附加 ['children']['children'] 到代码片段的末尾(给定 儿童 ,但收到 TypeError: list indices must be integers or slices, not str

    如果有任何关于如何改进或添加我的功能的建议,我将不胜感激,这样我就可以利用关键:对值,这是 儿童 水平。

    0 回复  |  直到 3 年前
        1
  •  2
  •   ti7    3 年前

    这看起来像是你试图创建并堆叠三个数据帧,你可能并不想这样做,或者通过映射每个数据帧可以更好地实现 Porpfolio_ 到每条相关线路,然后

    import itertools
    ...
    portfolio_views_children = response['data']['attributes']['total']['children']
    
    portfolios = []
    for portfolio in portfolio_views_children:
        entity_columns = []
        # include portfolio itself within an iterable so the total is the header
        for entity in itertools.chain([portfolio], portfolio["children"]):
            entity_data = entity["columns"].copy()  # don't mutate original response
            entity_data["portfolio"] = portfolio["name"]   # from outer
            entity_data["name"]      = entity["name"]
            entity_columns.append(entity_data)
    
        df = pd.DataFrame(entity_columns)
        portfolios.append(df)
    
    # combine dataframes
    df = pd.concat(portfolios)
    # reorder and rename
    column_ordering = {"portfolio": "portfolio", "name": "name"}
    column_ordering.update({c["key"]: c["display_name"] for c in response["meta"]["columns"]})
    df = df[column_ordering.keys()]   # beware: un-named cols will be dropped
    df = df.rename(columns=column_ordering)
    
    print(df.to_markdown(index=False))  # create output below (requires tabulate)
    
    文件夹 名称 实体ID 调整后的价值(2022年1月31日,无分割,美元) 调整后的TWR(本季度,无分部,美元) 调整后的TWR(本年迄今,无分部,美元) 年化调整后TWR(自成立以来,无拆分,美元) 起始日期 风险目标
    投资组合_3 投资组合_3 6.01438e+07 -0.0442006 -0.0442006 0.0774833 2020-12-17
    投资组合_3 FW和FR接头 9957007 -1.44 nan nan nan 2021-12-21 不适用
    投资组合_3 FW控股第二有限责任公司 8413655 6.76894e+06 -0.00773093 -0.00773093 0.118448 2021-03-05 不适用
    投资组合_3 FW流动基金LP 13396796 5.28995e+07 -0.0414977 -0.0414977 -0.0414974 2021-12-30 侵略性的
    投资组合_3 FW家族信托 13014080 475356 -0.0610201 -0.0610201 -0.0396857 2021-04-09 侵略性的
    投资组合_1 投资组合_1 260786 -0.44555 -0.44555 -0.44555 2021-04-07 不适用
    投资组合_1 FW Irrev家族Tr 9552252 260786 0 0 0 2022-01-11 不适用
    投资组合_2 投资组合_2 1.83967e+07 -0.0578035 -0.0578035 -0.0547021 2021-09-03 发育
    投资组合_2 FW-DAF 10946585 1.83967e+07 -0.0578035 -0.0578035 -0.0547021 2021-09-03 发育
        2
  •  2
  •   Emma    3 年前

    自从你 children 儿童 具有与相同的结构 儿童 ,您可以尝试使用 json_normalize 分别两次并将其附加在一起。

    # For first layer that includes Portfolio_1, Portfolio_2, Portfolio_3
    df = pd.json_normalize(s, record_path=['data', 'attributes', 'total', 'children'])
    
    # For second layer that includes The FW Irrev Family Tr, etc
    # Use explode to convert list into rows
    df_child = pd.json_normalize(df.explode('children').children)
    
    # Combine both
    df = pd.concat([df, df_child])
    
    # You can use your column renaming and filtering 
    
        3
  •  1
  •   user17242583 user17242583    3 年前

    就我个人而言,我不会使用 pd.json_normalize 对于这种情况。您的JSON非常复杂,除非您对 json_normalize ,对于普通开发人员来说,下面的代码可能需要更少的时间来理解。事实上,你甚至不需要看到JSON就可以确切地理解这段代码的作用(尽管它肯定会有所帮助;)。

    首先,我们可以将JSON中的对象(公文包及其子对象)提取到一个列表中,并使用一系列步骤使它们具有正确的形式和顺序:

    def prep_obj(o):
        """Prepares an object (portfolio/child) from the JSON to be inserted into a dataframe."""
        return {
            'New Entity Group': o['name'],
        } | o['columns']
    
    
    # Get a list of lists, where each sub-list contains the portfolio object at index 0 and then the portfolio object's children:
    groups = [[prep_obj(o), *[prep_obj(child) for child in o['children']]] for o in api_response['data']['attributes']['total']['children']]
    
    # Sort the portfolio groups by their number:
    groups.sort(key=lambda g: int(g[0]['New Entity Group'].split('_')[1]))
    
    # Reverse the children of each portfolio group:
    groups = [[g[0]] + g[1:][::-1] for g in groups]
    
    # Flatten out the groups into one large list of objects:
    objects = [obj for group in groups for obj in group]
    # The above is exactly equivalent to the following:
    #   objects = []
    #   for group in groups:
    #       for obj in group:
    #           objects.append(obj)
    

    接下来,创建数据帧:

    # Create a mapping for column names so that their display names can be used:
    mapping = {col['key']: col['display_name'] for col in api_response['meta']['columns']}
    
    # Create a dataframe from the list of objects:
    df = pd.DataFrame(objects)
    
    # Correct column names:
    df = df.rename(mapping, axis=1)
    # Reorder columns:
    column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date", "Risk Target"]
    df = df[column_names]
    

    以及格式:

    def format_twr_col(col):
        return (
            col
            .abs()
            .mul(100)
            .round(2)
            .pipe(lambda s: s.where(s.eq(0) | s.isna(), '(' + s.astype(str) + '%)'))
            .pipe(lambda s: s.where(s.ne(0) | s.isna(), s.astype(str) + '%'))
            .fillna('-')
        )
    
    def format_value_col(col):
        positive_mask = col.ge(0)
    
        col[positive_mask] = (
            col[positive_mask]
            .round()
            .astype(int)
            .map('${:,}'.format)
        )
    
        col[~positive_mask] = (
            col[~positive_mask]
            .astype(float)
            .round()
            .astype(int)
            .abs()
            .map('(${:,})'.format)
        )
        
        return col
    
    df['Adjusted TWR (Current Quarter, No Div, USD)'] = format_twr_col(df['Adjusted TWR (Current Quarter, No Div, USD)'])
    df['Annualized Adjusted TWR (Since Inception, No Div, USD)'] = format_twr_col(df['Annualized Adjusted TWR (Since Inception, No Div, USD)'])
    df['Adjusted TWR (YTD, No Div, USD)'] = format_twr_col(df['Adjusted TWR (YTD, No Div, USD)'])
    
    df['Adjusted Value (1/31/2022, No Div, USD)'] = format_value_col(df['Adjusted Value (1/31/2022, No Div, USD)'].copy())
    
    df['Inception Date'] = pd.to_datetime(df['Inception Date']).dt.strftime('%b %d, %Y')
    
    df['Entity ID'] = df['Entity ID'].fillna('')
    

    和voil:

    >>> pd.options.display.max_columns = None
    >>> df
             New Entity Group Entity ID Adjusted Value (1/31/2022, No Div, USD)  Adjusted TWR (Current Quarter, No Div, USD) Adjusted TWR (YTD, No Div, USD)  Annualized Adjusted TWR (Since Inception, No Div, USD) Inception Date  Risk Target
    0             Portfolio_1                                          $260,786                                     (44.55%)                        (44.55%)                                            (44.55%)       Apr 07, 2021          N/A
    1  The FW Irrev Family Tr   9552252                                $260,786                                         0.0%                            0.0%                                                0.0%       Jan 11, 2022          N/A
    2             Portfolio_2                                       $18,396,664                                      (5.78%)                         (5.78%)                                             (5.47%)       Sep 03, 2021       Growth
    3                  FW DAF  10946585                             $18,396,664                                      (5.78%)                         (5.78%)                                             (5.47%)       Sep 03, 2021       Growth
    4             Portfolio_3                                       $60,143,818                                      (4.42%)                         (4.42%)                                             (7.75%)       Dec 17, 2020          NaN
    5     The FW Family Trust  13014080                                $475,356                                       (6.1%)                          (6.1%)                                             (3.97%)       Apr 09, 2021   Aggressive
    6       FW Liquid Fund LP  13396796                             $52,899,527                                      (4.15%)                         (4.15%)                                             (4.15%)       Dec 30, 2021   Aggressive
    7   FW Holdings No. 2 LLC   8413655                              $6,768,937                                      (0.77%)                         (0.77%)                                            (11.84%)       Mar 05, 2021          N/A
    8         FW and FR Joint   9957007                                    ($1)                                            -                               -                                                   -       Dec 21, 2021          N/A
    
        4
  •  1
  •   quasi-human    3 年前

    我更喜欢使用 json_normalize 以下代码不涉及错误处理、详细格式等,而我认为其中包含了您最想做的事情的本质。

    代码:

    import json
    import pandas as pd
    
    # You have to change this path according to the actual json file location.
    with open('./api_response.json', 'r') as f:
        api_response = json.load(f)
    
    def unpack_response(r):
        df = pd.DataFrame()
    
        df_src = pd.json_normalize(r, record_path=['data', 'attributes', 'total', 'children'])
        for _, row in df_src.sort_values('name').iterrows(): 
            df_p = pd.DataFrame(row).T
            df_c = pd.json_normalize(row.children)
    
            # I'm not sure what your expected sorting order is. Perhaps you might want to delete the next line.
            df_c = df_c.sort_values(['columns._custom_portfolio_target_347209', 'columns.inception_event_date'])
    
            df = pd.concat([df, df_p, df_c], axis=0, ignore_index=True)
    
        column_name_mapper = {'columns.' + column['key']: column['display_name'] for column in api_response['meta']['columns']}
        column_name_mapper.update({'name': 'New Entity Group'})
        column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date", "Risk Target"]
        df = df.rename(columns=column_name_mapper).reindex(columns=column_names)
    
        return df
    
    df = unpack_response(api_response)
    

    输出:

    新建实体组 实体ID 调整后的价值(2022年1月31日,无分割,美元) 调整后的TWR(本季度,无分部,美元) 调整后的TWR(本年迄今,无分部,美元) 年化调整后TWR(自成立以来,无拆分,美元) 起始日期 风险目标
    投资组合_1 260786 -0.44555 -0.44555 -0.44555 2021-04-07 不适用
    FW Irrev家族Tr 9552252 260786 0 0 0 2022-01-11 不适用
    投资组合_2 1.83967e+07 -0.0578035 -0.0578035 -0.0547021 2021-09-03 发育
    FW-DAF 10946585 1.83967e+07 -0.0578035 -0.0578035 -0.0547021 2021-09-03 发育
    投资组合_3 6.01438e+07 -0.0442006 -0.0442006 0.0774833 2020-12-17
    FW家族信托 13014080 475356 -0.0610201 -0.0610201 -0.0396857 2021-04-09 侵略性的
    FW流动基金LP 13396796 5.28995e+07 -0.0414977 -0.0414977 -0.0414974 2021-12-30 侵略性的
    FW控股第二有限责任公司 8413655 6.76894e+06 -0.00773093 -0.00773093 0.118448 2021-03-05 不适用
    FW和FR接头 9957007 -1.44 nan nan nan 2021-12-21 不适用