
Groupby multiple columns & sum: create a new column with an added if condition

pandas-groupby sum group-by pandas python

EugLP · 2 years ago

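I need to group by multiple columns and then sum into a new column, with an if condition applied. I tried the following code, which works well when grouping by a single column: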

df['new_column'] = (
    df['value'].where(df['value'] > 0).groupby(df['column1']).transform('sum')
)

However, when I try to group by multiple columns, I get an error.

df['new_column'] = (
        df['value'].where(df['value'] > 0).groupby(df['column1', 'column2']).transform('sum')
    )

Error:

->return self._engine.get_loc(casted_key) 
The above exception was the direct cause of the following exception: 
->indexer = self.columns.get_loc(key) 
->raise KeyError(key) from err 
->if is_scalar(key) and isna(key) and not self.hasnans: ('column1', 'column2')

Could you tell me how to change the code so that I get the same result, but grouped by multiple columns?

Thank you very much.

1 answer · 2 years ago

Shubham Sharma mkln · 2 years ago

Cause of the error

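The syntax df['column1', 'column2'] for selecting multiple columns is wrong. pandas treats the tuple ('column1', 'column2') as a single column label, which is why you get a KeyError. It should be df[['column1', 'column2']].
Even if you use df[['column1', 'column2']] for groupby, pandas will raise another error complaining that the grouper should be one-dimensional. This is because df[['column1', 'column2']] returns a DataFrame, which is a two-dimensional object.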
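To see both failures side by side, here is a minimal sketch on a small made-up DataFrame. The sample values are hypothetical and only the column names come from the question. The second exception is caught broadly because its exact type and message may differ between pandas versions.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'column1': ['a', 'a', 'a', 'b', 'b', 'b'],
    'column2': ['x', 'x', 'y', 'x', 'y', 'y'],
    'value':   [1, -2, 3, 4, -5, -6],
})

errors = {}

# 1) A tuple key is looked up as ONE column label, which does not exist.
try:
    df['column1', 'column2']
except KeyError as exc:
    errors['tuple_key'] = type(exc).__name__

# 2) A DataFrame is two-dimensional, so it cannot be used as a single grouper.
masked = df['value'].where(df['value'] > 0)
try:
    masked.groupby(df[['column1', 'column2']]).transform('sum')
except Exception as exc:  # caught broadly; type/message may vary by version
    errors['dataframe_grouper'] = type(exc).__name__

print(errors)
```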

How to fix the error?

The hard way:

Pass each grouping column as a one-dimensional Series to the groupby clause

df['new_column'] = (
        df['value']
          .where(df['value'] > 0)
          .groupby([df['column1'], df['column2']]) # Notice the change
          .transform('sum')
)
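As a quick check, the same pattern can be run on a small made-up DataFrame (the sample values are hypothetical). Note that a group with no positive values ends up with 0, because sum skips the NaN values produced by where.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'column1': ['a', 'a', 'a', 'b', 'b', 'b'],
    'column2': ['x', 'x', 'y', 'x', 'y', 'y'],
    'value':   [1, -2, 3, 4, -5, -6],
})

df['new_column'] = (
    df['value']
      .where(df['value'] > 0)                     # non-positive values become NaN
      .groupby([df['column1'], df['column2']])    # list of 1-D Series as groupers
      .transform('sum')                           # group sum broadcast to each row
)

print(df['new_column'].tolist())
# [1.0, 1.0, 3.0, 4.0, 0.0, 0.0]
```

The group ('b', 'y') contains only negative values, so its masked sum is 0.0 rather than NaN.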

The easy way:

First assign the masked values to the target column, then use groupby + transform as you normally would

df['new_column'] = df['value'].where(df['value'] > 0)
df['new_column'] = df.groupby(['column1', 'column2'])['new_column'].transform('sum')
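Both approaches should give the same numbers. The sketch below compares them on a small made-up DataFrame. The sample values and the variable names hard and same are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'column1': ['a', 'a', 'a', 'b', 'b', 'b'],
    'column2': ['x', 'x', 'y', 'x', 'y', 'y'],
    'value':   [1, -2, 3, 4, -5, -6],
})

# Hard way: group the masked Series by a list of 1-D Series.
hard = (
    df['value']
      .where(df['value'] > 0)
      .groupby([df['column1'], df['column2']])
      .transform('sum')
)

# Easy way: store the masked values first, then group by column names.
df['new_column'] = df['value'].where(df['value'] > 0)
df['new_column'] = df.groupby(['column1', 'column2'])['new_column'].transform('sum')

# Compare the values row by row (the Series names differ, so compare lists).
same = hard.tolist() == df['new_column'].tolist()
print(same)
```

The easy way works with plain column names because the masked values live in the DataFrame itself, so groupby can look the keys up by label.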
