
Python itertools groupby, but only include groups that are in the new list. Then how do I filter the list?

  •  0
  • AlexW  · Tech Community  · 6 years ago

    I have two lists of dictionaries containing the following sample data:

    List 1:

    list_1 = [
        {
            "route": "10.10.4.0",
            "mask": "255.255.255.0",
            "next_hop": "172.18.1.5"
        },
        {
            "route": "10.10.5.0",
            "mask": "255.255.255.0",
            "next_hop": "172.18.1.5"
        },
        {
            "route": "10.10.8.0",
            "mask": "255.255.255.0",
            "next_hop": "172.16.66.34"
        },
        {
            "route": "10.10.58.0",
            "mask": "255.255.255.0",
            "next_hop": "172.18.1.5"
        },
        {
            "route": "172.18.12.4",
            "mask": "255.255.255.252",
            "next_hop": "172.18.1.5"
        }
    ]
    

    List 2:

    list_2 = [
        {
            "route": "10.10.4.0",
            "site": "Edinburgh"
        },
        {
            "route": "10.10.8.0",
            "site": "Manchester"
        },
        {
            "route": "10.10.5.0",
            "site": "London"
        },
    ]
    

    I merge these lists with itertools as follows:

    import itertools

    temp_merged_data = sorted(itertools.chain(list_1, list_2), key=lambda x:x['route'])
    route_data = []
    for k,v in itertools.groupby(temp_merged_data, key=lambda x:x['route']):
        d = {}
        for dct in v:
            d.update(dct)
        route_data.append(d) 
    

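    It returns the result below, but I don't want any routes without a site in there. How can I achieve that? And once I have the final list of dictionaries/JSON, how can I filter it efficiently, for example if I only want to know the next hop to London?

    Thanks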

    [
        {
            "route": "10.10.4.0",
            "mask": "255.255.255.0",
            "next_hop": "172.18.1.5",
            "site": "Edinburgh"
        },
        {
            "route": "10.10.5.0",
            "mask": "255.255.255.0",
            "next_hop": "172.18.1.5",
            "site": "London"
        },
        {
            "route": "10.10.58.0",
            "mask": "255.255.255.0",
            "next_hop": "172.18.1.5"
        },
        {
            "route": "10.10.8.0",
            "mask": "255.255.255.0",
            "next_hop": "172.16.66.34",
            "site": "Manchester"
        },
        {
            "route": "172.18.12.4",
            "mask": "255.255.255.252",
            "next_hop": "172.18.1.5"
        }
    ]
    
    6 replies  |  up to 6 years ago
        1
  •  2
  •   Ashish Acharya    6 years ago

    A pandas solution would be:

    In [17]: import pandas as pd
    
    In [18]: df1=pd.DataFrame(list_1)
    
    In [19]: df2=pd.DataFrame(list_2)    
    
    In [22]: df1.merge(df2, on='route', how='left')
    Out[22]: 
                  mask      next_hop        route        site
    0    255.255.255.0    172.18.1.5    10.10.4.0   Edinburgh
    1    255.255.255.0    172.18.1.5    10.10.5.0      London
    2    255.255.255.0  172.16.66.34    10.10.8.0  Manchester
    3    255.255.255.0    172.18.1.5   10.10.58.0         NaN
    4  255.255.255.252    172.18.1.5  172.18.12.4         NaN
    

    Filter out the routes without a site as follows:

    In [29]: merged=df1.merge(df2, on='route', how='left')
    In [31]: df=merged[~merged.site.isna()]
    Out[31]: 
                mask      next_hop      route        site
    0  255.255.255.0    172.18.1.5  10.10.4.0   Edinburgh
    1  255.255.255.0    172.18.1.5  10.10.5.0      London
    2  255.255.255.0  172.16.66.34  10.10.8.0  Manchester
    

    Filter for Edinburgh only:

    df[df['site']=='Edinburgh']
    

    To get your format:

    [v for k, v in df.T.to_dict().items()]
    

    Output:

    [{'mask': '255.255.255.0',
      'next_hop': '172.18.1.5',
      'route': '10.10.4.0',
      'site': 'Edinburgh'},
     {'mask': '255.255.255.0',
      'next_hop': '172.18.1.5',
      'route': '10.10.5.0',
      'site': 'London'},
     {'mask': '255.255.255.0',
      'next_hop': '172.16.66.34',
      'route': '10.10.8.0',
      'site': 'Manchester'}]
    
        2
  •  0
  •   Rakesh    6 years ago
    import itertools
    temp_merged_data = sorted(itertools.chain(list_1, list_2), key=lambda x:x['route'])
    route_data = []
    for k,v in itertools.groupby(temp_merged_data, key=lambda x:x['route']):
        d = {}
        for dct in v:
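            d.update(dct)          # merge every dict in the group
        if "site" in d:            # keep only routes that have a site
            route_data.append(d)
    print(route_data)
    

    Output:

    [{'route': '10.10.4.0', 'mask': '255.255.255.0', 'next_hop': '172.18.1.5', 'site': 'Edinburgh'}, {'route': '10.10.5.0', 'mask': '255.255.255.0', 'next_hop': '172.18.1.5', 'site': 'London'}, {'route': '10.10.8.0', 'mask': '255.255.255.0', 'next_hop': '172.16.66.34', 'site': 'Manchester'}]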
    
        3
  •  0
  •   Ajax1234    6 years ago

    You can filter the result:

    d = [{'route': '10.10.4.0', 'mask': '255.255.255.0', 'next_hop': '172.18.1.5', 'site': 'Edinburgh'}, {'route': '10.10.5.0', 'mask': '255.255.255.0', 'next_hop': '172.18.1.5', 'site': 'London'}, {'route': '10.10.58.0', 'mask': '255.255.255.0', 'next_hop': '172.18.1.5'}, {'route': '10.10.8.0', 'mask': '255.255.255.0', 'next_hop': '172.16.66.34', 'site': 'Manchester'}, {'route': '172.18.12.4', 'mask': '255.255.255.252', 'next_hop': '172.18.1.5'}]
    new_d = [i for i in d if i.get('site')]
    

    Output:

    [{'route': '10.10.4.0', 'mask': '255.255.255.0', 'next_hop': '172.18.1.5', 'site': 'Edinburgh'}, {'route': '10.10.5.0', 'mask': '255.255.255.0', 'next_hop': '172.18.1.5', 'site': 'London'}, {'route': '10.10.8.0', 'mask': '255.255.255.0', 'next_hop': '172.16.66.34', 'site': 'Manchester'}]
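
The same comprehension pattern can also answer the second half of the question (the next hop for a given site). The following is only a minimal sketch: `next_hops_for` is a hypothetical helper name, and `route_data` is a shortened copy of the merged list from the question. Note that `i.get('site')` above also drops entries whose site is an empty string, whereas the comparison below matches one exact site name.

```python
# Shortened sample of the merged route data from the question.
route_data = [
    {"route": "10.10.4.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5", "site": "Edinburgh"},
    {"route": "10.10.5.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5", "site": "London"},
    {"route": "10.10.58.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5"},
]


def next_hops_for(site, routes):
    """Return the next hop of every route tagged with the given site.

    Hypothetical helper; routes without a "site" key are skipped
    because dict.get returns None for them.
    """
    return [r["next_hop"] for r in routes if r.get("site") == site]


london_hops = next_hops_for("London", route_data)
print(london_hops)  # ['172.18.1.5']
```

A list is returned rather than a single value because nothing in the data guarantees that a site has exactly one route.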
    
        4
  •  0
  •   Graipher    6 years ago

    Use an actual data analysis tool, such as pandas:

    import pandas as pd
    
    df1 = pd.DataFrame(list_1)
    df2 = pd.DataFrame(list_2)
    
    print(df1.merge(df2))
    #             mask      next_hop      route        site
    # 0  255.255.255.0    172.18.1.5  10.10.4.0   Edinburgh
    # 1  255.255.255.0    172.18.1.5  10.10.5.0      London
    # 2  255.255.255.0  172.16.66.34  10.10.8.0  Manchester
    
        5
  •  0
  •   Sunitha    6 years ago
    >>> from itertools import groupby, chain
    >>> from pprint import pprint
    >>> temp_merged_data = sorted(chain(list_1, list_2), key=lambda x: x['route'])
    >>> route_data = [dict(chain(*map(dict.items, v))) for k, v in groupby(temp_merged_data, key=lambda x: x['route'])]
    >>> route_data = [d for d in route_data if 'site' in d]
    >>> pprint(route_data)
    [{'mask': '255.255.255.0',
      'next_hop': '172.18.1.5',
      'route': '10.10.4.0',
      'site': 'Edinburgh'},
     {'mask': '255.255.255.0',
      'next_hop': '172.18.1.5',
      'route': '10.10.5.0',
      'site': 'London'},
     {'mask': '255.255.255.0',
      'next_hop': '172.16.66.34',
      'route': '10.10.8.0',
      'site': 'Manchester'}]
    

    Now, if you convert the route data into a dict, it is easier to access each site's parameters:

    >>> route_dict = {d['site']:d for d in route_data}
    >>> route_dict['London']['next_hop']
    '172.18.1.5'
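
One caveat with keying by site: if two routes share a site, the dict comprehension keeps only the last one. A possible variant, sketched here under that assumption, groups routes per site with `collections.defaultdict`. The `10.10.6.0` entry is invented purely to show the collision case and is not part of the original data.

```python
from collections import defaultdict

route_data = [
    {"route": "10.10.4.0", "next_hop": "172.18.1.5", "site": "Edinburgh"},
    {"route": "10.10.5.0", "next_hop": "172.18.1.5", "site": "London"},
    # Hypothetical extra London route, added only to show the collision case.
    {"route": "10.10.6.0", "next_hop": "172.16.66.34", "site": "London"},
]

# Map each site name to the list of all routes that belong to it.
routes_by_site = defaultdict(list)
for d in route_data:
    routes_by_site[d["site"]].append(d)

london_next_hops = [d["next_hop"] for d in routes_by_site["London"]]
print(london_next_hops)  # ['172.18.1.5', '172.16.66.34']
```

If every site is known to have exactly one route, the plain `{d['site']: d ...}` mapping is simpler and sufficient.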
    
        6
  •  0
  •   VPfB    6 years ago

    Given the structure of these lists (route info and route-to-site mapping), I don't think merging and grouping are needed.

    routes_to_sites = {rs['route']: rs['site'] for rs in list_2}
    route_data = []
    for ri in list_1:
        site = routes_to_sites.get(ri['route'])
        if site is not None:
            route_data.append({**ri, 'site': site})
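
For reference, the lookup approach can be wrapped in a small function and run end to end. This is a sketch only: `attach_sites` is a hypothetical name, the sample lists are shortened versions of the question's data, and the `{**r, ...}` unpacking assumes Python 3.5 or later.

```python
def attach_sites(routes, sites):
    """Return copies of the routes that have a site, with the site attached.

    Builds a route -> site lookup once, then makes a single pass over
    the routes, so no sorting or grouping is required.
    """
    site_by_route = {s["route"]: s["site"] for s in sites}
    return [
        {**r, "site": site_by_route[r["route"]]}
        for r in routes
        if r["route"] in site_by_route
    ]


# Shortened sample data based on the question.
list_1 = [
    {"route": "10.10.4.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5"},
    {"route": "10.10.58.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5"},
    {"route": "10.10.8.0", "mask": "255.255.255.0", "next_hop": "172.16.66.34"},
]
list_2 = [
    {"route": "10.10.4.0", "site": "Edinburgh"},
    {"route": "10.10.8.0", "site": "Manchester"},
]

route_data = attach_sites(list_1, list_2)
print([d["site"] for d in route_data])  # ['Edinburgh', 'Manchester']
```

The result keeps the order of `list_1` and leaves the input dictionaries unmodified, since each output entry is a new dict.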