
Removing duplicate strings within a column in a Pandas DataFrame

  •  1
  • edyvedy13  · Tech Community  · 4 years ago

    I have a DataFrame like this:

    item     tags
    1        awesome, awesome, great
    2        cool, fun
    3        boring, boring, average
    4        ok, expensive
    

    How can I remove the duplicate tags within each row to get:

    item     tags
    1        awesome, great
    2        cool, fun
    3        boring, average
    4        ok, expensive
    
    2 replies  |  up to 4 years ago
        1
  •  0
  •   Seananigan Emma    4 years ago

    If I understand correctly, try:

    df['new_tags'] = df['tags'].apply(lambda x: ', '.join(set(x.split(', '))))
    

    Output (a set does not preserve order, so the tags within a row may come out reordered):

       item                     tags         new_tags
    0     1  awesome, awesome, great   awesome, great
    1     2                cool, fun        cool, fun
    2     3  boring, boring, average  average, boring
    3     4            ok, expensive    expensive, ok
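
If the original tag order matters, one possible variation on the answer above is to swap `set` for `dict.fromkeys`, which keeps the first occurrence of each key in insertion order (guaranteed for dicts since Python 3.7). This is only a sketch: the helper name `dedupe_tags` and the `sep` parameter are made up for illustration, and it assumes every value in `tags` is a string separated by exactly ", ".

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "item": [1, 2, 3, 4],
    "tags": [
        "awesome, awesome, great",
        "cool, fun",
        "boring, boring, average",
        "ok, expensive",
    ],
})


def dedupe_tags(text, sep=", "):
    """Hypothetical helper: drop repeated tags, keeping first-seen order."""
    # dict.fromkeys keeps one key per distinct tag, in insertion order.
    return sep.join(dict.fromkeys(text.split(sep)))


df["new_tags"] = df["tags"].apply(dedupe_tags)
print(df)
```

With this variant, row 2 stays "boring, average" instead of being reordered. Missing values (NaN) in `tags` would raise an error here and would need to be handled separately.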
    
        2
  •  1
  •   Andy L.    4 years ago

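    Use a list comprehension with str.split, pd.unique, and join. pd.unique keeps values in order of first appearance, so the original tag order is preserved: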

    df['unique_tags'] = [', '.join(pd.unique(x)) for x in df.tags.str.split(', ')]
    
    Out[145]:
       item                     tags      unique_tags
    0     1  awesome, awesome, great   awesome, great
    1     2                cool, fun        cool, fun
    2     3  boring, boring, average  boring, average
    3     4            ok, expensive    ok, expensive
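
The same split, de-duplicate, join idea can also be written without a Python-level loop, using `Series.explode` (pandas 0.25 or later). The sketch below is an alternative, not the answerer's method. It assumes the DataFrame has the default unnamed, unique index, so that `reset_index()` produces a column called "index"; with a named or duplicated index the grouping column would need adjusting.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "item": [1, 2, 3, 4],
    "tags": [
        "awesome, awesome, great",
        "cool, fun",
        "boring, boring, average",
        "ok, expensive",
    ],
})

# One row per tag; the original row label is repeated in the index.
exploded = df["tags"].str.split(", ").explode()

# Drop repeated (row label, tag) pairs, keeping the first occurrence.
deduped = exploded.reset_index().drop_duplicates()

# Re-join the tags of each original row; assignment aligns on the index.
df["unique_tags"] = deduped.groupby("index")["tags"].agg(", ".join)
print(df)
```

For small frames the list comprehension is likely simpler and just as fast; the exploded form is mainly useful when the individual tags are needed for further processing, such as counting.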