
Flattening a dict of dicts of lists (2 levels deep) in Python

dictionary mapreduce data-structures python

LarsH · 14 years ago

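I'm trying to wrap my brain around this, but it isn't flexing enough.

In my Python script I have a dictionary of dictionaries of lists. (It actually goes a little deeper, but that level is not relevant to this question.) I want to flatten all of this into one long list, throwing away all the dictionary keys.

So I want to transform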

{1: {'a': [1, 2, 3], 'b': [0]},
 2: {'c': [4, 5, 1], 'd': [3, 8]}}

to

[1, 2, 3, 0, 4, 5, 1, 3, 8]

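I could set up a map-reduce to iterate over the items of the outer dictionary, build a sublist from each subdictionary, and then concatenate all the sublists together.

But that seems inefficient for large data sets, because the intermediate data structures (the sublists) get thrown away. Is there a way to do it in one pass?

Failing that, I'd be happy to accept a two-level implementation that works... my map-reduce is rusty!

Update: For those interested, below is the code I ended up using.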

import operator

def genSessions(d):
    """Given the ipDict, return an iterator that provides all the sessions,
    one by one, converted to tuples."""
    for uaDict in d.itervalues():
        for sessions in uaDict.itervalues():
            for session in sessions:
                yield tuple(session)

# Flatten dict of dicts of lists of sessions into a list of sessions.
# Sort that list by start time
sessionsByStartTime = sorted(genSessions(ipDict), key=operator.itemgetter(0))
# Then make another copy sorted by end time.
sessionsByEndTime = sorted(sessionsByStartTime, key=operator.itemgetter(1))
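For anyone on Python 3, where `itervalues()` is gone, here is a rough sketch of the same approach. The sample data is made up for illustration: I am assuming the outer keys are IP addresses, the inner keys are user agents, and each session is a `[start, end]` pair, which the original post does not spell out. The snake_case names are mine as well.

```python
import operator

def gen_sessions(d):
    """Yield every session in a dict of dicts of lists, as a tuple."""
    for ua_dict in d.values():
        for sessions in ua_dict.values():
            for session in sessions:
                yield tuple(session)

# Hypothetical sample data: IP -> user agent -> list of [start, end] sessions.
ip_dict = {
    '10.0.0.1': {'ua1': [[5, 9], [1, 4]]},
    '10.0.0.2': {'ua2': [[3, 3]], 'ua3': [[2, 8]]},
}

# Flatten and sort by start time, then make a copy sorted by end time.
by_start = sorted(gen_sessions(ip_dict), key=operator.itemgetter(0))
by_end = sorted(by_start, key=operator.itemgetter(1))

print(by_start)  # [(1, 4), (2, 8), (3, 3), (5, 9)]
print(by_end)    # [(3, 3), (1, 4), (2, 8), (5, 9)]
```

The generator is consumed once by the first `sorted`, so no intermediate sublists are built before sorting.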

3 answers | up to 13 years ago

Community, rohancragg · 7 years ago


def flatten(d):
    """Recursively flatten dictionary values in `d`.

    >>> hat = {'cat': ['images/cat-in-the-hat.png'],
    ...        'fish': {'colours': {'red': [0xFF0000], 'blue': [0x0000FF]},
    ...                 'numbers': {'one': [1], 'two': [2]}},
    ...        'food': {'eggs': {'green': [0x00FF00]},
    ...                 'ham': ['lean', 'medium', 'fat']}}
    >>> set_of_values = set(flatten(hat))
    >>> sorted(set_of_values)
    [1, 2, 255, 65280, 16711680, 'fat', 'images/cat-in-the-hat.png', 'lean', 'medium']
    """
    try:
        for v in d.itervalues():
            for nested_v in flatten(v):
                yield nested_v
    except AttributeError:
        for list_v in d:
            yield list_v

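The doctest passes the result to `set`. If you want to keep track of how often each value occurs, just pass `flatten(hat)` to some other function instead of `set`. In Python 2.7, that other function could be `collections.Counter`. For compatibility with less-evolved Pythons, you can write your own function or (with some loss of efficiency) combine `sorted` with `itertools.groupby`.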
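As a rough Python 3 sketch of the `collections.Counter` idea: the `flatten` below is my own variant, not the code above. It swaps the `try`/`except AttributeError` for an `isinstance` check and uses `yield from`, and it assumes every non-dict value is a list of leaf items. The `data` dict is the example from the question.

```python
from collections import Counter

def flatten(d):
    """Yield every leaf item found under the (possibly nested) dict `d`."""
    if isinstance(d, dict):
        for v in d.values():
            yield from flatten(v)
    else:
        # Assumption: anything that is not a dict is a list of leaf values.
        yield from d

data = {1: {'a': [1, 2, 3], 'b': [0]},
        2: {'c': [4, 5, 1], 'd': [3, 8]}}

# Counter consumes the generator directly and keeps the occurrence counts
# that a set would discard.
counts = Counter(flatten(data))

print(counts[1])  # 2, since 1 appears under both 'a' and 'c'
print(counts[8])  # 1
```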

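Alex Martelli · 14 years ago

I hope you realize that any order you see in a dict is accidental: it's there only because, when shown on screen, some order has to be picked, but there's absolutely no guarantee.

Setting aside ordering issues among the various sublists being concatenated,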

[x for d in thedict.itervalues()
   for alist in d.itervalues()
   for x in alist]

does what you want, without any inefficiency or intermediate lists.

Escualo · 14 years ago

A recursive function may work:
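On Python 3 the same comprehension works with `.values()` in place of `.itervalues()`. The sketch below also shows a lazy variant built on `itertools.chain.from_iterable`, for cases where you only need to iterate once and never need the full list. The names `flat_list` and `lazy` are just illustrative.

```python
from itertools import chain

thedict = {1: {'a': [1, 2, 3], 'b': [0]},
           2: {'c': [4, 5, 1], 'd': [3, 8]}}

# Python 3 spelling of the nested comprehension: .values() returns a view,
# so no intermediate lists are built along the way.
flat_list = [x for d in thedict.values()
               for alist in d.values()
               for x in alist]

# Lazy alternative: yields items one at a time instead of building a list.
lazy = chain.from_iterable(
    alist for d in thedict.values() for alist in d.values())

print(sorted(flat_list))                  # [0, 1, 1, 2, 3, 3, 4, 5, 8]
print(sorted(lazy) == sorted(flat_list))  # True
```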

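def flat(d, out):
    """Append the items of every list found under the nested dict `d` to `out`."""
    for val in d.values():
        if isinstance(val, dict):
            flat(val, out)  # recurse into the nested dict, not into d itself
        else:
            out += val

If you try it with: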

>>> d = {1: {'a': [1, 2, 3], 'b': [0]}, 2: {'c': [4, 5, 6], 'd': [3, 8]}}
>>> out = []
>>> flat(d, out)
>>> print out
[1, 2, 3, 0, 4, 5, 6, 3, 8]

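Note that dictionaries have no order, so the order of the resulting list is arbitrary.

You can also `return out` (after the loop finishes) so that you don't have to call the function with a list argument.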

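def flat(d, out=None):
    if out is None:  # create a fresh list for each top-level call
        out = []
    for val in d.values():
        if isinstance(val, dict):
            flat(val, out)  # recurse into the nested dict, not into d itself
        else:
            out += val
    return out

Call it with: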

my_list = flat(d)
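A side note on list-valued default arguments such as `out=[]`: Python evaluates a default once, when the function is defined, so the same list is reused by every call that omits the argument. The sketch below contrasts that with a `None` sentinel. The names `flat_shared` and `flat_fresh` are made up for this comparison.

```python
def flat_shared(d, out=[]):
    # Pitfall: this default list is created once and shared across calls.
    for val in d.values():
        if isinstance(val, dict):
            flat_shared(val, out)
        else:
            out += val
    return out

def flat_fresh(d, out=None):
    # A new list is created for each top-level call.
    if out is None:
        out = []
    for val in d.values():
        if isinstance(val, dict):
            flat_fresh(val, out)
        else:
            out += val
    return out

d = {1: {'a': [1, 2], 'b': [0]}}

flat_shared(d)
second_shared = flat_shared(d)  # also contains the first call's items
flat_fresh(d)
second_fresh = flat_fresh(d)    # contains only this call's items

print(len(second_shared))  # 6
print(len(second_fresh))   # 3
```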