
Python itertools groupby, but only include groups that appear in the second list. And how do I then filter the list?

itertools python

AlexW · Tech Community · 6 years ago

I have two lists of dictionaries containing the following sample data:

List 1:

list_1 = [
    {
        "route": "10.10.4.0",
        "mask": "255.255.255.0",
        "next_hop": "172.18.1.5"
    },
    {
        "route": "10.10.5.0",
        "mask": "255.255.255.0",
        "next_hop": "172.18.1.5"
    },
    {
        "route": "10.10.8.0",
        "mask": "255.255.255.0",
        "next_hop": "172.16.66.34"
    },
    {
        "route": "10.10.58.0",
        "mask": "255.255.255.0",
        "next_hop": "172.18.1.5"
    },
    {
        "route": "172.18.12.4",
        "mask": "255.255.255.252",
        "next_hop": "172.18.1.5"
    }
]

List 2:

list_2 = [
    {
        "route": "10.10.4.0",
        "site": "Edinburgh"
    },
    {
        "route": "10.10.8.0",
        "site": "Manchester"
    },
    {
        "route": "10.10.5.0",
        "site": "London"
    },
]

I merge these lists with itertools as follows:

import itertools

temp_merged_data = sorted(itertools.chain(list_1, list_2), key=lambda x:x['route'])
route_data = []
for k,v in itertools.groupby(temp_merged_data, key=lambda x:x['route']):
    d = {}
    for dct in v:
        d.update(dct)
    route_data.append(d)

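It returns the result below, but I don't want any routes without a site in there. How can I achieve this? And once I have the final list of dictionaries/JSON, how can I filter it efficiently, for example if I only want the next hop for London?

Thanks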

[
    {
        "route": "10.10.4.0",
        "mask": "255.255.255.0",
        "next_hop": "172.18.1.5",
        "site": "Edinburgh"
    },
    {
        "route": "10.10.5.0",
        "mask": "255.255.255.0",
        "next_hop": "172.18.1.5",
        "site": "London"
    },
    {
        "route": "10.10.58.0",
        "mask": "255.255.255.0",
        "next_hop": "172.18.1.5"
    },
    {
        "route": "10.10.8.0",
        "mask": "255.255.255.0",
        "next_hop": "172.16.66.34",
        "site": "Manchester"
    },
    {
        "route": "172.18.12.4",
        "mask": "255.255.255.252",
        "next_hop": "172.18.1.5"
    }
]

6 replies | latest 6 years ago

Ashish Acharya 6 years ago

A pandas solution would be:

In [18]: df1=pd.DataFrame(list_1)

In [19]: df2=pd.DataFrame(list_2)    

In [22]: df1.merge(df2, on='route', how='left')
Out[22]: 
              mask      next_hop        route        site
0    255.255.255.0    172.18.1.5    10.10.4.0   Edinburgh
1    255.255.255.0    172.18.1.5    10.10.5.0      London
2    255.255.255.0  172.16.66.34    10.10.8.0  Manchester
3    255.255.255.0    172.18.1.5   10.10.58.0         NaN
4  255.255.255.252    172.18.1.5  172.18.12.4         NaN

Filter out the routes without a site like this:

In [29]: merged=df1.merge(df2, on='route', how='left')
In [31]: df=merged[~merged.site.isna()]

In [32]: df
Out[32]: 
            mask      next_hop      route        site
0  255.255.255.0    172.18.1.5  10.10.4.0   Edinburgh
1  255.255.255.0    172.18.1.5  10.10.5.0      London
2  255.255.255.0  172.16.66.34  10.10.8.0  Manchester

To filter for Edinburgh only:

df[df['site']=='Edinburgh']

To get your format:

[v for k, v in df.T.to_dict().items()]

Output:

[{'mask': '255.255.255.0',
  'next_hop': '172.18.1.5',
  'route': '10.10.4.0',
  'site': 'Edinburgh'},
 {'mask': '255.255.255.0',
  'next_hop': '172.18.1.5',
  'route': '10.10.5.0',
  'site': 'London'},
 {'mask': '255.255.255.0',
  'next_hop': '172.16.66.34',
  'route': '10.10.8.0',
  'site': 'Manchester'}]

Rakesh 6 years ago

import itertools
temp_merged_data = sorted(itertools.chain(list_1, list_2), key=lambda x:x['route'])
route_data = []
for k,v in itertools.groupby(temp_merged_data, key=lambda x:x['route']):
    d = {}
    for dct in v:
        if "site" in dct.keys():   #Check if site is in keys
            d.update(dct)
    if d:
        route_data.append(d)
print(route_data)

Output:

[{'route': '10.10.4.0', 'site': 'Edinburgh'}, {'route': '10.10.5.0', 'site': 'London'}, {'route': '10.10.8.0', 'site': 'Manchester'}]
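
Note that the check above copies only the dictionaries that already contain a site key, so mask and next_hop do not appear in the result. If the fully merged record is wanted, one possible variant is to merge every dictionary in the group and test for the site afterwards. This is only a sketch: the lists are cut-down copies of the question's data, and by_route and merged are names chosen here for illustration.

```python
import itertools
from operator import itemgetter

# Cut-down copies of the question's data, for illustration only.
list_1 = [
    {"route": "10.10.4.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5"},
    {"route": "10.10.58.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5"},
]
list_2 = [{"route": "10.10.4.0", "site": "Edinburgh"}]

by_route = itemgetter("route")  # same key for sorting and grouping
temp_merged_data = sorted(itertools.chain(list_1, list_2), key=by_route)

route_data = []
for _, group in itertools.groupby(temp_merged_data, key=by_route):
    merged = {}
    for dct in group:
        merged.update(dct)   # merge every dict that shares this route
    if "site" in merged:     # keep the route only if a site was found
        route_data.append(merged)

print(route_data)
```

The route 10.10.58.0 has no entry in list_2, so it is dropped, while the Edinburgh route keeps its mask and next_hop.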

Ajax1234 6 years ago

You can filter the result:

d = [{'route': '10.10.4.0', 'mask': '255.255.255.0', 'next_hop': '172.18.1.5', 'site': 'Edinburgh'}, {'route': '10.10.5.0', 'mask': '255.255.255.0', 'next_hop': '172.18.1.5', 'site': 'London'}, {'route': '10.10.58.0', 'mask': '255.255.255.0', 'next_hop': '172.18.1.5'}, {'route': '10.10.8.0', 'mask': '255.255.255.0', 'next_hop': '172.16.66.34', 'site': 'Manchester'}, {'route': '172.18.12.4', 'mask': '255.255.255.252', 'next_hop': '172.18.1.5'}]
new_d = [i for i in d if i.get('site')]

Output:

[{'route': '10.10.4.0', 'mask': '255.255.255.0', 'next_hop': '172.18.1.5', 'site': 'Edinburgh'}, {'route': '10.10.5.0', 'mask': '255.255.255.0', 'next_hop': '172.18.1.5', 'site': 'London'}, {'route': '10.10.8.0', 'mask': '255.255.255.0', 'next_hop': '172.16.66.34', 'site': 'Manchester'}]
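
The same pattern may also cover the second part of the question, the next hop for a single site. A small sketch, assuming the merged list has the shape of d above (shortened here) and that a site name could match more than one route; london_hops is a name made up for this example.

```python
# Shortened version of the merged list; the last entry has no site.
d = [
    {"route": "10.10.4.0", "next_hop": "172.18.1.5", "site": "Edinburgh"},
    {"route": "10.10.5.0", "next_hop": "172.18.1.5", "site": "London"},
    {"route": "10.10.58.0", "next_hop": "172.18.1.5"},
]

# .get() returns None for entries without a "site" key, so they never match.
london_hops = [i["next_hop"] for i in d if i.get("site") == "London"]
print(london_hops)
```

This is a linear scan of the list, which should be fine for a one-off lookup; repeated lookups by site are usually better served by building a dict once.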

Graipher 6 years ago

Use an actual data analysis tool, such as pandas:

import pandas as pd

df1 = pd.DataFrame(list_1)
df2 = pd.DataFrame(list_2)

print(df1.merge(df2))
#             mask      next_hop      route        site
# 0  255.255.255.0    172.18.1.5  10.10.4.0   Edinburgh
# 1  255.255.255.0    172.18.1.5  10.10.5.0      London
# 2  255.255.255.0  172.16.66.34  10.10.8.0  Manchester

Sunitha 6 years ago

>>> from itertools import groupby, chain
>>> from pprint import pprint
>>> temp_merged_data = sorted(chain(list_1, list_2), key=lambda x: x['route'])
>>> route_data = [dict(chain(*map(dict.items, v))) for k, v in groupby(temp_merged_data, key=lambda x: x['route'])]
>>> route_data = [d for d in route_data if 'site' in d]
>>> pprint(route_data)
[{'mask': '255.255.255.0',
  'next_hop': '172.18.1.5',
  'route': '10.10.4.0',
  'site': 'Edinburgh'},
 {'mask': '255.255.255.0',
  'next_hop': '172.18.1.5',
  'route': '10.10.5.0',
  'site': 'London'},
 {'mask': '255.255.255.0',
  'next_hop': '172.16.66.34',
  'route': '10.10.8.0',
  'site': 'Manchester'}]

Now, if you convert route_data to a dict keyed by site, it is easier to access each site's parameters:

>>> route_dict = {d['site']:d for d in route_data}
>>> route_dict['London']['next_hop']
'172.18.1.5'
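
One caveat: a dict keyed by site keeps only the last entry when two routes share a site. The sample data has one route per site, so that does not matter here. If that assumption does not hold, a sketch along these lines collects the routes into a list per site instead. The name routes_by_site is hypothetical, and the second London route is made up purely for illustration.

```python
from collections import defaultdict

route_data = [
    {"route": "10.10.5.0", "next_hop": "172.18.1.5", "site": "London"},
    # Made-up second London route, not part of the question's data.
    {"route": "10.10.6.0", "next_hop": "172.18.1.9", "site": "London"},
    {"route": "10.10.8.0", "next_hop": "172.16.66.34", "site": "Manchester"},
]

# Map each site name to the list of route dicts that belong to it.
routes_by_site = defaultdict(list)
for d in route_data:
    routes_by_site[d["site"]].append(d)

london_next_hops = [d["next_hop"] for d in routes_by_site["London"]]
print(london_next_hops)
```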

VPfB 6 years ago

Given the structure of these lists (route info and a route-to-site mapping), I don't think merging and grouping are needed.

routes_to_sites = {rs['route']: rs['site'] for rs in list_2}
route_data = []
for ri in list_1:
    site = routes_to_sites.get(ri['route'])
    if site is not None:
        route_data.append({**ri, 'site': site})
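
For reference, the same lookup can be written as a single comprehension. This is a self-contained sketch using cut-down copies of the question's data; the {**ri, 'site': ...} unpacking syntax needs Python 3.5 or later.

```python
# Cut-down copies of the question's data, for illustration only.
list_1 = [
    {"route": "10.10.4.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5"},
    {"route": "10.10.8.0", "mask": "255.255.255.0", "next_hop": "172.16.66.34"},
    {"route": "10.10.58.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5"},
]
list_2 = [
    {"route": "10.10.8.0", "site": "Manchester"},
    {"route": "10.10.4.0", "site": "Edinburgh"},
]

# Build the route -> site lookup once, then extend matching list_1 entries.
routes_to_sites = {rs["route"]: rs["site"] for rs in list_2}
route_data = [
    {**ri, "site": routes_to_sites[ri["route"]]}
    for ri in list_1
    if ri["route"] in routes_to_sites  # skip routes that have no site
]

print(route_data)
```

Because nothing is sorted, the result follows the order of list_1, whereas the sorted-and-grouped versions order the routes by their string value.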