GroupBy and flatten lists

pandas python

N08 · Tech Community · 6 years ago

I have a pandas DataFrame in the following form:

import pandas as pd
p = pd.DataFrame({"int" : [1,     1,     1,     1,     2,      2],
                  "cod" : [[1,1], [2,2], [1,2], [3,9], [2,2], [2,2]]})

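I want to group by int, which gives me a bunch of lists per group. I then want to flatten those lists, so that I end up with a DataFrame of the following form: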

p = pd.DataFrame({"int" :  [1,                2],
                  "cod" : [[1,1,2,2,1,2,3,9], [2,2,2,2]]})

Here is what I have so far:

p.groupby("int", as_index=False)["cod"]

I don't know how to flatten the lists once I have grouped by int.

1 answer | as of 6 years ago

jezrael · 6 years ago

Use sum:

df = p.groupby("int", as_index=False)["cod"].sum()
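
A minimal plain-Python sketch of why summing can flatten here: the + operator on two lists concatenates them, so adding up all the lists in one group yields a single flat list. The variable name group is illustrative and simply mirrors the "cod" values for int == 1; how pandas dispatches sum internally is not shown.

```python
from functools import reduce
import operator

# The "cod" lists belonging to the group int == 1 (illustrative copy).
group = [[1, 1], [2, 2], [1, 2], [3, 9]]

# Adding lists pairwise concatenates them into one flat list.
flat = reduce(operator.add, group)

# The built-in sum does the same when given an empty list as the start value.
flat_via_sum = sum(group, [])

print(flat)
print(flat_via_sum)
```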

Or a list comprehension:

df = p.groupby("int")["cod"].apply(lambda x: [z for y in x for z in y]).reset_index()

Or np.concatenate:

import numpy as np

df = p.groupby("int")["cod"].apply(lambda x: np.concatenate(x.values).tolist()).reset_index()

For performance, this should be the fastest option when the lists are large:

from itertools import chain

df = p.groupby("int")["cod"].apply(lambda x: list(chain.from_iterable(x))).reset_index()

For more details, see the material on flattening lists.
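
To get a rough feel for the performance remark, here is a hedged plain-Python timing sketch. It compares the built-in sum with a start value of [] against chain.from_iterable on made-up data. In plain Python, sum builds a new list on every addition, so its cost grows roughly quadratically with the number of lists, while chain walks each element once. The data size, the repeat count, and the helper names flatten_sum and flatten_chain are assumptions for illustration. Timings vary by machine, and the sketch does not measure the pandas calls themselves.

```python
import timeit
from itertools import chain

# Hypothetical test data: 2,000 short lists.
lists = [[i, i + 1] for i in range(2000)]


def flatten_sum(seq):
    # Each addition creates a new, longer list.
    return sum(seq, [])


def flatten_chain(seq):
    # Iterates over every inner element exactly once.
    return list(chain.from_iterable(seq))


t_sum = timeit.timeit(lambda: flatten_sum(lists), number=10)
t_chain = timeit.timeit(lambda: flatten_chain(lists), number=10)

print(f"sum:   {t_sum:.4f} s")
print(f"chain: {t_chain:.4f} s")
```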

print (df)
   int                       cod
0    1  [1, 1, 2, 2, 1, 2, 3, 9]
1    2              [2, 2, 2, 2]
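
As a sanity check, the following self-contained sketch rebuilds the DataFrame from the question and confirms that the three apply-based variants produce the same lists. The intermediate names grouped, by_comprehension, by_concatenate, and by_chain are illustrative and not part of the original answer.

```python
from itertools import chain

import numpy as np
import pandas as pd

p = pd.DataFrame({"int": [1, 1, 1, 1, 2, 2],
                  "cod": [[1, 1], [2, 2], [1, 2], [3, 9], [2, 2], [2, 2]]})

grouped = p.groupby("int")["cod"]

# Each variant returns a Series indexed by "int" whose values are flat lists.
by_comprehension = grouped.apply(lambda x: [z for y in x for z in y])
by_concatenate = grouped.apply(lambda x: np.concatenate(x.values).tolist())
by_chain = grouped.apply(lambda x: list(chain.from_iterable(x)))

# Compare the plain Python lists produced by each variant.
same = (by_comprehension.tolist()
        == by_concatenate.tolist()
        == by_chain.tolist())
print(same)

# reset_index() turns the result back into a two-column DataFrame.
df = by_chain.reset_index()
print(df)
```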
