How to pass *args in df.apply()

partial pandas python

qshng · 6 years ago

I have a function that I want to apply to a variable number of columns, depending on the input.

def split_and_combine(row, *args, delimiter=';'):
    combined = []
    for a in args:
        if not row[a]:
            combined.extend(row[a].split(delimiter))

    combined = list(set(combined))
    return combined

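But I don't know how to apply this function to df because of the *args. I'm not very familiar with *args and **kwargs in Python. I tried using partial with axis set to 1, as shown below, but got the TypeError below.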

df['combined'] = df.apply(partial(split_and_combine, ['col1','col2']),
                          axis=1)

TypeError: ('list indices must be integers or slices, not Series', 'occurred at index 0')

Here is a dummy example of what the code above should do. I want to be able to pass a flexible number of columns to combine:

Index   col1        col2            combined
0      John;Mary    Sam;Bill;Eva    John;Mary;Sam;Bill;Eva
1      a;b;c        a;d;f           a;b;c;d;f

Thanks! If there is a better way to do this without df.apply, please feel free to comment!

1 answer | 6 years ago

Asish M. · 6 years ago

Use the args parameter of df.apply:

df.apply(split_and_combine, args=('col1', 'col2'), axis=1)

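def split_and_combine(row, *args, delimiter=';'):
    combined = []
    for a in args:
        # Skip empty cells; only non-empty strings are split.
        if row[a]:
            combined.extend(row[a].split(delimiter))
    # set() removes duplicates but does not preserve the original order.
    combined = list(set(combined))
    return delimiter.join(combined)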
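
A minimal end-to-end sketch of the approach above, using the dummy data from the question. Two things here are assumptions rather than part of the original answer: duplicates are removed with dict.fromkeys instead of set() so items keep their first-seen order (as in the expected output), and non-string cells such as NaN are skipped. The sketch also illustrates why the partial attempt raised the TypeError, and shows a lambda as an equivalent alternative.

```python
from functools import partial

import pandas as pd


def split_and_combine(row, *cols, delimiter=';'):
    """Join the unique delimiter-separated items found in the given columns."""
    seen = {}
    for col in cols:
        value = row[col]
        # Assumption: skip NaN/None and empty strings; only strings are split.
        if isinstance(value, str) and value:
            # dict keys keep insertion order, unlike set().
            seen.update(dict.fromkeys(value.split(delimiter)))
    return delimiter.join(seen)


df = pd.DataFrame({
    'col1': ['John;Mary', 'a;b;c'],
    'col2': ['Sam;Bill;Eva', 'a;d;f'],
})

# apply() calls func(row, *args, **kwargs), so the column names go in `args`;
# keyword arguments such as `delimiter` could be passed by name as well.
df['combined'] = df.apply(split_and_combine, args=('col1', 'col2'), axis=1)

# Why the original attempt failed: partial() binds its positional arguments
# first, so the list of names lands in `row` and the real row lands in *cols.
# bound(some_row) is therefore split_and_combine(['col1', 'col2'], some_row).
bound = partial(split_and_combine, ['col1', 'col2'])

# A lambda keeps the row in first position and is equivalent to the args= call.
via_lambda = df.apply(lambda row: split_and_combine(row, 'col1', 'col2'), axis=1)

print(df)
```

Passing the names through args keeps the function reusable for any number of columns, for example args=('col1', 'col2', 'col3') if such a column existed.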
