
How to create a new dataframe that stores the mean of each bin of an original dataframe column?

binning grouping dataframe pandas python

Kristada673 · Tech Community · 6 years ago

Suppose I have a dataframe, df:

>>> df

Age    Score
19     1
20     2
24     3
19     2
24     3
24     1
24     3
20     1
19     1
20     3
22     2
22     1

I want to build a new dataframe that bins Age into ranges and stores the mean score of each range in Score:

Age       Score
19-21     1.6667
22-24     2.1667
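The expected values are plain averages over the rows in each age range: ages 19 and 20 cover six rows, ages 22 and 24 cover the other six. Written out in row order (the symbol \bar{s} is just a label introduced here for the mean score):

```latex
\bar{s}_{19\text{-}21} = \frac{1 + 2 + 2 + 1 + 1 + 3}{6} = \frac{10}{6} \approx 1.6667
\qquad
\bar{s}_{22\text{-}24} = \frac{3 + 3 + 1 + 3 + 2 + 1}{6} = \frac{13}{6} \approx 2.1667
```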

Here is how I did it, which feels rather convoluted:

import numpy as np
import pandas as pd

data = pd.DataFrame({
    'Age':   [19, 20, 24, 19, 24, 24, 24, 20, 19, 20, 22, 22],
    'Score': [1, 2, 3, 2, 3, 1, 3, 1, 1, 3, 2, 1],
})

# Two equal-width bins over the Age range: bins == [19.0, 21.5, 24.0]
_, bins = np.histogram(data['Age'], 2)

# Compare against the float middle edge so that no age is dropped
df1 = data[data['Age'] < bins[1]]
df2 = data[data['Age'] >= bins[1]]

# One hand-written label and one mean per bin
new_df = pd.DataFrame({
    'Age': [str(int(bins[0])) + '-' + str(int(bins[1])),
            str(int(np.ceil(bins[1]))) + '-' + str(int(bins[2]))],
    'Score': [df1['Score'].mean(), df2['Score'].mean()],
})
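With an integer bin count, np.histogram splits the range from the minimum to the maximum age into equal-width bins, so the middle edge here is 21.5 rather than a whole number; that is why the labels above need int() conversions. A quick check, using only the ages from the question:

```python
import numpy as np

ages = [19, 20, 24, 19, 24, 24, 24, 20, 19, 20, 22, 22]

# Integer bins -> equal-width edges from min(ages) to max(ages)
counts, edges = np.histogram(ages, 2)
print(edges.tolist())   # [19.0, 21.5, 24.0]
print(counts.tolist())  # [6, 6]  (the last bin is closed on the right, so 24 counts)
```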

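Besides being long, this approach does not scale well to more bins, because every extra bin needs its own hand-written label and mean in new_df.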

Is there a more efficient and cleaner way to do this?


jezrael · 6 years ago

Use cut to bin the values into discrete intervals, then aggregate with mean:

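import pandas as pd

# `data` is the DataFrame built in the question
bins = [19, 21, 24]
# Dynamically create labels; i + 1 because each interval (i, j] excludes its left edge (integer ages)
labels = ['{}-{}'.format(i + 1, j) for i, j in zip(bins[:-1], bins[1:])]
# include_lowest=True below keeps 19 in the first bin, so its label starts at bins[0]
labels[0] = '{}-{}'.format(bins[0], bins[1])
print(labels)
# ['19-21', '22-24']

binned = pd.cut(data['Age'], bins=bins, labels=labels, include_lowest=True)
# observed=False keeps empty bins (as NaN); passing it explicitly avoids a FutureWarning in recent pandas
df = data.groupby(binned, observed=False)['Score'].mean().reset_index()
print(df)
#      Age     Score
# 0  19-21  1.666667
# 1  22-24  2.166667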
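To address the scaling concern from the question without hard-coding edges, pd.cut also accepts an integer bin count (equal-width bins, like np.histogram) and can return the edges it chose via retbins=True; the labels can then be generated from those edges. A minimal sketch, where binned_mean is a hypothetical helper name and the labelling assumes integer ages and bins at least one integer wide:

```python
import numpy as np
import pandas as pd


def binned_mean(df, by, value, n_bins):
    """Mean of `value` within `n_bins` equal-width bins of the integer column `by`.

    Hypothetical helper. Assumes `by` holds integers and each bin is at
    least one integer wide, so every label is a valid "low-high" range.
    """
    # Equal-width, right-closed bins; pandas widens the range slightly
    # so the minimum falls inside the first bin
    binned, edges = pd.cut(df[by], bins=n_bins, retbins=True)
    # For an interval (a, b] the smallest integer inside is floor(a) + 1, the largest floor(b)
    labels = ['{}-{}'.format(int(np.floor(a)) + 1, int(np.floor(b)))
              for a, b in zip(edges[:-1], edges[1:])]
    binned = binned.cat.rename_categories(labels)
    return df.groupby(binned, observed=False)[value].mean().reset_index()


data = pd.DataFrame({
    'Age':   [19, 20, 24, 19, 24, 24, 24, 20, 19, 20, 22, 22],
    'Score': [1, 2, 3, 2, 3, 1, 3, 1, 1, 3, 2, 1],
})
print(binned_mean(data, 'Age', 'Score', 2))  # same two rows as above
print(binned_mean(data, 'Age', 'Score', 3))  # bins 19-20, 21-22, 23-24
```

Changing the number of bins is then a single argument rather than extra lines of hand-written labels and means.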
