
Create a boolean column based on multiple conditions on another column within each group

pandas-groupby dataframe pandas python-3.x

daiyue · 6 years ago

I have the following df:

cluster_id   inv_id    
1            A1
1            A1
2            A1111A
2            A1111A

I would like to groupby cluster_id and create a boolean column named invalid_inv_id, based on inv_id, such that:

1. in each cluster, if the length of inv_id (stripped of non numerics) < 100 set "invalid_inv_id" to true;

2. in each cluster, if the length of inv_id is < 3 set "invalid_inv_id" to true.

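df['inv_id_stp'] = df.inv_id.str.replace(r'\D+', '', regex=True)

grouped = df.groupby('cluster_id')

# condition 1: fewer than 100 digits after stripping non-numerics
df['invalid_inv_id'] = grouped['inv_id_stp'].transform(lambda x: x.str.len()) < 100

# condition 2: raw length below 3; OR-ed in so it does not overwrite condition 1
df['invalid_inv_id'] |= grouped['inv_id'].transform(lambda x: x.str.len()) < 3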

The expected result is:

cluster_id    inv_id    invalid_inv_id
1             A1         True
1             A1         True
2             A1111A     True
2             A1111A     True

1 answer

BENY · 6 years ago

IIUC, groupby is not needed here:

(df.inv_id.str.len()<3)|(df.inv_id.str.replace(r'\D+', '', regex=True).str.len()<100)
Out[472]: 
0    True
1    True
2    True
3    True
Name: inv_id, dtype: bool
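
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the row-wise version. It rebuilds the sample df from the question; the intermediate names raw_len and digit_len are my own, not from the original post, and regex=True is passed explicitly because newer pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "cluster_id": [1, 1, 2, 2],
    "inv_id": ["A1", "A1", "A1111A", "A1111A"],
})

# raw_len / digit_len are illustrative helper names (not in the original post).
# Length of the raw id, and of the id with every non-digit removed.
raw_len = df["inv_id"].str.len()
digit_len = df["inv_id"].str.replace(r"\D+", "", regex=True).str.len()

# A row is flagged when either condition from the question holds.
df["invalid_inv_id"] = (raw_len < 3) | (digit_len < 100)
print(df)
```

With the thresholds from the question, every sample row is flagged, since none of the ids contains anywhere near 100 digits.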

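Or, if the flag should apply to the whole cluster whenever any row in it matches, use transform('any'):

((df.inv_id.str.len()<3)|(df.inv_id.str.replace(r'\D+', '', regex=True).str.len()<100)).groupby(df['cluster_id']).transform('any')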
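
To see where the per-group version actually differs from the row-wise one, here is a small sketch on made-up data. The inv_id values and the digit threshold of 2 are hypothetical, chosen only so that the two results diverge; they are not from the question.

```python
import pandas as pd

# Hypothetical data: cluster 1 mixes a short id with a long one.
df = pd.DataFrame({
    "cluster_id": [1, 1, 2, 2],
    "inv_id": ["A1", "A12345", "A1111A", "B2222B"],
})

raw_len = df["inv_id"].str.len()
digit_len = df["inv_id"].str.replace(r"\D+", "", regex=True).str.len()

# Row-wise flag; the digit threshold 2 is an assumption for illustration.
row_flag = (raw_len < 3) | (digit_len < 2)

# Broadcast "any row in this cluster is flagged" back to every row.
df["invalid_inv_id"] = row_flag.groupby(df["cluster_id"]).transform("any")
print(df)
```

Here only "A1" is flagged row-wise, but transform('any') marks both rows of cluster 1, while cluster 2 stays unflagged.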
