
How to remove nested column names in Pandas from a groupby aggregation?

columnname pandas-groupby aggregate-functions pandas python

Jane Sully · 6 years ago

I am grouping by Employee_id and aggregating Customer_id:

Sales.groupby('Employee_id').agg({
    'Customer_id': [
        ('total_sales', 'count'),
        ('unique_sales', 'nunique')
]})

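It is important to know that I will also be aggregating other columns, but this is all I have written so far. So if you have a suggested solution, please take that into account in case it makes a difference.

While this does exactly what I want, counting the total and unique sales per employee and creating two columns, it creates nested column names. The column names look like [('Customer_id', 'total_sales'), ('Customer_id', 'unique_sales')], which I don't want. Is there an easy way to get rid of the nested part so that only ['total_sales', 'unique_sales'] remain, or is it easiest to just rename the columns once everything is done?

Thanks!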

1 answer | as of 6 years ago

unutbu · 6 years ago

You can simply rename the columns:

import numpy as np
import pandas as pd
np.random.seed(2018)

df = pd.DataFrame(np.random.randint(10, size=(100, 3)), columns=['A','B','C'])
result = df.groupby('A').agg({'B': [('D','count'),('E','nunique')],
                              'C': [('F','first'),('G','max')]})
result.columns = result.columns.get_level_values(1)
print(result)
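
To connect that line back to the question, here is a minimal sketch on a small made-up Sales frame (the data values are invented for illustration; only the column names come from the question). It shows the nested column names before flattening and the flat names afterwards. It uses `droplevel(0)`, which for a two-level column index should give the same labels as `get_level_values(1)`.

```python
import pandas as pd

# Hypothetical data; only the column names are taken from the question.
Sales = pd.DataFrame({
    'Employee_id': [1, 1, 1, 2, 2],
    'Customer_id': [10, 10, 11, 12, 13],
})

result = Sales.groupby('Employee_id').agg({
    'Customer_id': [('total_sales', 'count'), ('unique_sales', 'nunique')]
})
nested = list(result.columns)  # two-level (source column, new name) tuples

# Drop the outer level so only the names chosen in the tuples remain.
result.columns = result.columns.droplevel(0)
flat = list(result.columns)

print(nested)
print(flat)
```

Note that dropping the outer level only gives unique column names if the inner names are unique across all aggregated columns.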

Alternatively, you can save the groupby object, use grouped[col].agg(...) to produce sub-DataFrames, and then pd.concat them together:

import numpy as np
import pandas as pd
np.random.seed(2018)
df = pd.DataFrame(np.random.randint(10, size=(100, 3)), columns=['A','B','C'])
grouped = df.groupby('A')
result = pd.concat([grouped['B'].agg([('D','count'),('E','nunique')]),
                    grouped['C'].agg([('F','first'),('G','max')])], axis=1)
print(result)

Both code snippets produce the following result (though the order of the columns may differ):

    D  E  F  G
A             
0  18  8  8  9
1  12  8  6  6
2  14  8  0  8
3  10  9  8  9
4   7  6  3  5
5   8  5  6  7
6   9  7  9  9
7   8  6  4  7
8   8  7  2  9
9   6  5  7  9
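
As a further option that is not part of the answer above: on pandas 0.25 or newer, named aggregation avoids the nested column names in the first place. The sketch below reuses the same random frame and the same output names D, E, F, G; each keyword is the output column and each value is an (input column, function) pair.

```python
import numpy as np
import pandas as pd

np.random.seed(2018)
df = pd.DataFrame(np.random.randint(10, size=(100, 3)), columns=['A', 'B', 'C'])

# Named aggregation: keyword = output column, value = (input column, function).
named = df.groupby('A').agg(
    D=('B', 'count'),
    E=('B', 'nunique'),
    F=('C', 'first'),
    G=('C', 'max'),
)
print(named.columns.tolist())
```

The columns come out flat and in the order the keywords are written, so no renaming step is needed afterwards.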
