
How do I reshape and aggregate a list of tuples in Python?

python

radek · 14 years ago

From a psycopg2 query I get the result as a list of tuples, like this:

[(1, 0), (1, 0), (1, 1), (2, 1), (2, 2), (2, 2), (2, 2)]

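Each tuple holds the id of the location where an event happened and the hour at which it took place. I'd like to reshape this list into: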

[(1, 0, 2), (1, 1, 1), (1, 2, 0), (2, 0, 0), (2, 1, 1), (2, 2, 3)]

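Here each tuple tells me, for example: at location 1, at hour 0, there were 2 events; at location 1, at hour 1, there was 1 event.

If there were 0 events at some hour, I'd still like to see it, e.g. at location 2 at hour 0 there were 0 events: (2, 0, 0)

How can I do this in Python?

Edit: Thanks for your help!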

2 answers

Kylotan 14 years ago

If you're getting this from a database, why not have the query do the aggregation in the first place? Something like: SELECT hour, location, COUNT(*) FROM events GROUP BY hour, location ORDER BY hour, location .
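To try that query without a PostgreSQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for psycopg2. The events table and its location and hour columns are assumptions taken from the query above, and the rows are the sample data from the question.

```python
import sqlite3

# In-memory SQLite database standing in for the real PostgreSQL one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (location INTEGER, hour INTEGER)")
conn.executemany(
    "INSERT INTO events (location, hour) VALUES (?, ?)",
    [(1, 0), (1, 0), (1, 1), (2, 1), (2, 2), (2, 2), (2, 2)],
)

# The grouped query from the answer; each row is (hour, location, count).
rows = conn.execute(
    "SELECT hour, location, COUNT(*) FROM events "
    "GROUP BY hour, location ORDER BY hour, location"
).fetchall()
conn.close()

print(rows)
```

Note that GROUP BY only returns combinations that actually occur, so rows with a count of 0, such as location 2 at hour 0, do not appear in this result and would still have to be filled in separately.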

In Python, it could be something like this:

# events_from_database is the list of (location, hour) tuples returned by the query
timed_events = {}
# Count how many times each (location, hour) tuple occurs
for event in events_from_database:
    timed_events[event] = timed_events.get(event, 0) + 1

# Form a new list with the original data plus the count
aggregate_list = [(evt[0], evt[1], count) for evt, count in timed_events.items()]

Alex Martelli 14 years ago

Something like this:

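import collections

raw_data = [(1, 0), (1, 0), (1, 1), (2, 1), (2, 2), (2, 2), (2, 2)]
# Count events per (location, hour); missing keys default to 0
aux = collections.defaultdict(int)
for x, y in raw_data:
    aux[x, y] += 1

# Every location and hour that appears at least once in the data
locations = sorted(set(x for x, y in raw_data))
hours = sorted(set(y for x, y in raw_data))
# One tuple per (location, hour) combination, including those with 0 events
result = [(x, y, aux[x, y]) for x in locations for y in hours]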

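This works if you want the locations and hours to reflect only what is present in raw_data. If you have independent information about the range that locations and hours are supposed to span, quite apart from which ones actually occur in raw_data, you may want to use range(some, thing) for each of locations and hours instead.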
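As a rough sketch of that range-based variant: the bounds below (locations 1 to 2, hours 0 to 3) are made up for illustration and are not taken from the question, and collections.Counter is swapped in for the defaultdict purely for brevity.

```python
import collections

raw_data = [(1, 0), (1, 0), (1, 1), (2, 1), (2, 2), (2, 2), (2, 2)]

# Hypothetical bounds, assumed to be known independently of raw_data.
locations = range(1, 3)  # locations 1 and 2
hours = range(0, 4)      # hours 0 through 3

# Counter returns 0 for (location, hour) pairs that never occurred.
counts = collections.Counter(raw_data)
result = [(loc, hour, counts[loc, hour]) for loc in locations for hour in hours]

print(result)
```

With these assumed bounds, hour 3 shows up for both locations with a count of 0 even though it never occurs in raw_data.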