I have the following df:
cluster_id inv_id
1 A1
1 A1
2 A1111A
2 A1111A
I want to groupby cluster_id and create a new column named invalid_inv_id, based on inv_id, such that:
1. in each cluster, if the length of inv_id (stripped of non numerics) < 100 set "invalid_inv_id" to true;
2. in each cluster, if the length of inv_id is < 3 set "invalid_inv_id" to true.
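# Strip every non-digit character; regex=True is required in pandas >= 2.0
df['inv_id_stp'] = df.inv_id.str.replace(r'\D+', '', regex=True)
grouped = df.groupby('cluster_id')
# Combine both rules with | so the second check does not overwrite the first
df['invalid_inv_id'] = (
    (grouped['inv_id_stp'].transform(lambda x: x.str.len()) < 100)
    | (grouped['inv_id'].transform(lambda x: x.str.len()) < 3)
)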
The desired result:
cluster_id inv_id invalid_inv_id
1 A1 True
1 A1 True
2 A1111A True
2 A1111A True
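
For reference, here is a minimal self-contained sketch of the two rules on the sample data above. It rests on one assumption that the rules do not state outright: that "in each cluster" means the whole cluster is flagged as soon as any of its rows fails a check. The helper names digits_len, full_len and row_invalid are only illustrative. Since both length checks are row-wise, the sketch computes them without groupby and only uses the groups to spread the flag across each cluster.

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    "cluster_id": [1, 1, 2, 2],
    "inv_id": ["A1", "A1", "A1111A", "A1111A"],
})

# Rule 1: length of inv_id once all non-digit characters are removed
digits_len = df["inv_id"].str.replace(r"\D+", "", regex=True).str.len()
# Rule 2: length of the raw inv_id
full_len = df["inv_id"].str.len()

# A row is invalid if either rule fires
row_invalid = (digits_len < 100) | (full_len < 3)

# Assumption: one invalid row marks its whole cluster as invalid
df["invalid_inv_id"] = row_invalid.groupby(df["cluster_id"]).transform("any")

print(df)
```

If the flag should instead be decided row by row, the last assignment can simply be df["invalid_inv_id"] = row_invalid, with no groupby at all.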