
pandas, compute a value for each group?

pandas python

eugene · Tech Community · 4 years ago

I'm trying to get from df to df2.
I group by review_meta_id and age_bin, and compute ctr for each group as sum(click_count) / sum(impression_count).

In [69]: df
Out[69]:
   review_meta_id  age_month  impression_count  click_count age_bin
0               3          4                10            3       1
1               3         10                 5            2       2
2               3         20                 5            3       3
3               3          8                 9            2       2
4               4          9                 9            5       2

In [70]: df2
Out[70]:
   review_meta_id       ctr  age_bin
0               3  0.300000        1
1               3  0.285714        2
2               3  0.600000        3
3               4  0.555556        2
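To make the target concrete, take the group review_meta_id=3, age_bin=2. It contains rows 1 and 3 of df, because age_month 10 and 8 both fall in the (5, 15] bin. The sketch below rebuilds just those two rows by hand (an illustration, not code from the question) and shows that the expected 0.285714 is a ratio of sums, which is not the same as the mean of the per-row ratios.

```python
import pandas as pd

# The two rows of df in the (review_meta_id=3, age_bin=2) group
group = pd.DataFrame({"impression_count": [5, 9], "click_count": [2, 2]})

# ctr as asked: total clicks divided by total impressions -> 4 / 14
ctr = group["click_count"].sum() / group["impression_count"].sum()

# For contrast: mean of per-row ratios (2/5 and 2/9), which df2 does NOT use
mean_of_ratios = (group["click_count"] / group["impression_count"]).mean()

print(round(ctr, 6))
print(round(mean_of_ratios, 6))
```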



import pandas as pd

bins = [0, 5, 15, 30]
labels = [1,2,3]

l = [dict(review_meta_id=3, age_month=4, impression_count=10, click_count=3),
     dict(review_meta_id=3, age_month=10, impression_count=5, click_count=2),
     dict(review_meta_id=3, age_month=20, impression_count=5, click_count=3),
     dict(review_meta_id=3, age_month=8, impression_count=9, click_count=2),
     dict(review_meta_id=4, age_month=9, impression_count=9, click_count=5)]

df = pd.DataFrame(l)
df['age_bin'] = pd.cut(df['age_month'], bins=bins, labels=labels)


grouped = df.groupby(['review_meta_id', 'age_bin'])

data = []
for name, group in grouped:
    ctr = group['click_count'].sum() / group['impression_count'].sum()
    review_meta_id, age_bin = name
    data.append(dict(review_meta_id=review_meta_id, ctr=ctr, age_bin=age_bin))


df2 = pd.DataFrame(data)


jezrael · 4 years ago

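Aggregate with sum first, then divide the columns extracted with DataFrame.pop, convert the MultiIndex back to columns with reset_index, and finally remove rows with missing values using DataFrame.dropna: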

df2 = df.groupby(['review_meta_id', 'age_bin'])[['click_count','impression_count']].sum()
df2['ctr'] = df2.pop('click_count') / df2.pop('impression_count')
df2 = df2.reset_index().dropna()
print (df2)
   review_meta_id age_bin       ctr
0               3       1  0.300000
1               3       2  0.285714
2               3       3  0.600000
4               4       2  0.555556
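The gap in the index above (no row 3) comes from pd.cut returning a Categorical column. With observed=False, the default in the pandas versions of that era, groupby produces every review_meta_id × age_bin combination, so unobserved pairs such as (4, 1) sum to 0 / 0 = NaN and must be dropped. A sketch of a variant, assuming a pandas version whose groupby accepts observed=True, that keeps only the pairs present in df so dropna() is no longer needed:

```python
import pandas as pd

bins = [0, 5, 15, 30]
labels = [1, 2, 3]

df = pd.DataFrame({
    "review_meta_id": [3, 3, 3, 3, 4],
    "age_month": [4, 10, 20, 8, 9],
    "impression_count": [10, 5, 5, 9, 9],
    "click_count": [3, 2, 3, 2, 5],
})
# pd.cut with labels yields a Categorical column
df["age_bin"] = pd.cut(df["age_month"], bins=bins, labels=labels)

# observed=True: only (review_meta_id, age_bin) pairs that occur in df,
# so there are no empty groups and no 0/0 NaN rows to drop
df2 = df.groupby(["review_meta_id", "age_bin"], observed=True)[
    ["click_count", "impression_count"]
].sum()
df2["ctr"] = df2.pop("click_count") / df2.pop("impression_count")
df2 = df2.reset_index()
print(df2)
```

The index is now contiguous (0 to 3) because no rows were removed after the fact.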

Azzedine · 4 years ago

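You can group the DataFrame by 'review_meta_id', 'age_bin' and use apply with a function that computes 'ctr'. Passing name='ctr' to reset_index sets the name of the column that holds the resulting Series values.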

def divide_two_cols(df_sub):
    return df_sub['click_count'].sum() / float(df_sub['impression_count'].sum())

df2 = df.groupby(['review_meta_id', 'age_bin']).apply(divide_two_cols).reset_index(name='ctr')
print(df2)
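Both answers produce the same numbers; the practical difference is that apply calls a Python function once per group, while sum followed by a division stays vectorised. A small sketch comparing the two, assuming observed=True is passed to both groupby calls so that only observed groups are involved (divide_two_cols is the function from the answer above, without the Python 2 style float() cast):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "review_meta_id": [3, 3, 3, 3, 4],
    "age_month": [4, 10, 20, 8, 9],
    "impression_count": [10, 5, 5, 9, 9],
    "click_count": [3, 2, 3, 2, 5],
})
df["age_bin"] = pd.cut(df["age_month"], bins=[0, 5, 15, 30], labels=[1, 2, 3])

keys = ["review_meta_id", "age_bin"]
cols = ["click_count", "impression_count"]

def divide_two_cols(df_sub):
    # ctr for one group: total clicks / total impressions
    return df_sub["click_count"].sum() / df_sub["impression_count"].sum()

# apply: one Python-level call per group, returns a Series on a MultiIndex
via_apply = df.groupby(keys, observed=True)[cols].apply(divide_two_cols)

# vectorised: sum both columns per group once, then divide column-wise
sums = df.groupby(keys, observed=True)[cols].sum()
via_sum = sums["click_count"] / sums["impression_count"]

print(np.allclose(via_apply.to_numpy(), via_sum.to_numpy()))
```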
