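The code works as expected for me. However, you can use `mode` to make it easier to read. You can also use `transform` on the groupby to assign the result directly to a column, which turns the whole operation into a one-liner: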
df['standardized_label'] = df.groupby('ID')['raw_label'].transform(lambda x: x.mode()[0])
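To try the one-liner in isolation, here is a minimal sketch on a toy frame. The sample values are made up for illustration; only the column names `ID` and `raw_label` come from the question.

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only).
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3],
    "raw_label": ["cat", "cat", "Cat", "dog", "dog", "bird"],
})

# transform broadcasts each group's most common label back onto
# every row of that group, so the result lines up with df's index.
df["standardized_label"] = (
    df.groupby("ID")["raw_label"].transform(lambda x: x.mode()[0])
)
print(df)
```

Every row with `ID` 1 should end up with `cat`, since it appears twice in that group and `Cat` only once.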
Alternatively, you can use `groupby.apply` and then map the result back onto the ID column. Either way, the function should look like this:
def standardize_labels(df, id_col, label_col):
    # Return the most common label in a group; on a tie, mode() returns the
    # tied values in sorted order, so [0] picks the smallest of them
    def most_common_label(group):
        return group.mode()[0]

    # Group by the ID column and apply the most_common_label function
    common_labels = df.groupby(id_col)[label_col].apply(most_common_label)
    # Map the IDs in the original DataFrame to their common labels
    df['standardized_label'] = df[id_col].map(common_labels)
    return df
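One thing to keep in mind about ties: `Series.mode` returns all tied values in sorted order, so `[0]` selects the smallest tied value, which is not necessarily the one that appears first in the data. A small sketch with made-up values:

```python
import pandas as pd

# Hypothetical labels where "a" and "b" both occur twice.
labels = pd.Series(["b", "a", "b", "a"])

# mode() returns every tied value, sorted.
modes = labels.mode()
print(modes.tolist())  # ['a', 'b']
print(modes[0])        # 'a', even though 'b' appears first in the data
```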
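Since `value_counts()` works on DataFrames, we can use it directly without groupby. The function can therefore be changed to the following. This is a refactoring of a function I wrote for a different question.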
def standardize_labels(df, id_col, label_col):
    # Count each (ID, label) pair; value_counts sorts by count, descending
    labels_counts = df.value_counts([id_col, label_col])
    # Keep only the first (most frequent) row for each ID
    dup_idx_msk = ~labels_counts.droplevel(label_col).index.duplicated()
    common_labels = labels_counts[dup_idx_msk]
    # Drop the counts, leaving a Series of labels indexed by ID
    common_labels = common_labels.reset_index(level=1)[label_col]
    # Map the IDs in the original DataFrame to their common labels
    df['standardized_label'] = df[id_col].map(common_labels)
    return df
df = standardize_labels(df, 'ID', 'raw_label')
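To see what the `value_counts` version does step by step, here is a rough walk-through on a toy frame. The sample values are assumptions for illustration, and no ID has tied labels, so the result does not depend on how ties are ordered.

```python
import pandas as pd

# Hypothetical sample data with no ties within any ID.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3],
    "raw_label": ["cat", "cat", "Cat", "dog", "dog", "bird"],
})

# Step 1: count each (ID, label) pair; rows come back sorted by count, descending.
labels_counts = df.value_counts(["ID", "raw_label"])

# Step 2: after dropping the label level, the first occurrence of each ID
# is its most frequent label, so mask out the later duplicates.
first_per_id = ~labels_counts.droplevel("raw_label").index.duplicated()

# Step 3: move the label out of the index, leaving labels indexed by ID.
common_labels = labels_counts[first_per_id].reset_index(level=1)["raw_label"]
print(common_labels.sort_index())
```

Here `common_labels` maps ID 1 to `cat`, 2 to `dog`, and 3 to `bird`, which is the Series the function passes to `map`.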