
Union of unique lists from two and three columns of a DataFrame

  • Renee  · Tech Community  · 2 years ago

    How can I get the union of the unique lists from two, and from three, columns of a DataFrame?

    This is the DataFrame I am working with:

    Col1 Extract              Col2 Extract           Col3 Extract      
    ------------              ------------           ------------
    ['unclassified']          ['sink', 'fridge']     ['unclassified']
    ['fridge', 'microwave']   ['fridge', 'stove']    ['sink']          
    ['unclassified']          ['unclassified']       ['unclassified']
    

    Using pandas, I want the union of the unique list items of ('Col1 Extract' + 'Col2 Extract') and of ('Col1 Extract' + 'Col2 Extract' + 'Col3 Extract'). This is the result I want:

    Col1+Col2                             Col1+Col2+Col3
    ------------                          ---------------             
    ['unclassified', 'sink', 'fridge']    ['unclassified', 'sink', 'fridge']      
    ['fridge', 'microwave', 'stove']      ['fridge', 'microwave', 'stove', 'sink']          
    ['unclassified']                      ['unclassified']  
    
    1 answer
  • jezrael  · 2 years ago

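    Concatenate the list columns, then remove duplicates with sets. Sets do not preserve order, so the element order in the output below may differ between runs: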

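    # Adding two columns of lists concatenates the lists row by row;
    # set() then drops duplicates (element order is not preserved).
    df['Col1+Col2'] = (df['Col1 Extract'] + df['Col2 Extract']).apply(lambda x: list(set(x)))
    df['Col1+Col2+Col3'] = (df['Col1 Extract'] + df['Col2 Extract'] + df['Col3 Extract']).apply(lambda x: list(set(x)))
    print(df)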
              Col1 Extract     Col2 Extract    Col3 Extract  \
    0       [unclassified]   [sink, fridge]  [unclassified]   
    1  [fridge, microwave]  [fridge, stove]          [sink]   
    2       [unclassified]   [unclassified]  [unclassified]   
    
                          Col1+Col2                    Col1+Col2+Col3  
    0  [fridge, unclassified, sink]      [fridge, unclassified, sink]  
    1    [stove, fridge, microwave]  [stove, fridge, microwave, sink]  
    2                [unclassified]                    [unclassified] 
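
Because a set has no defined order, the lists printed above can come out in a different order on another run. As a minimal sketch (the DataFrame is rebuilt here from the sample in the question, and sorting alphabetically is my assumption about what "reproducible" should mean, not something the answer prescribes), wrapping the set in sorted() gives a stable result:

```python
import pandas as pd

# Sample data copied from the question; each cell holds a Python list.
df = pd.DataFrame({
    "Col1 Extract": [["unclassified"], ["fridge", "microwave"], ["unclassified"]],
    "Col2 Extract": [["sink", "fridge"], ["fridge", "stove"], ["unclassified"]],
    "Col3 Extract": [["unclassified"], ["sink"], ["unclassified"]],
})

# Adding object columns of lists concatenates the lists row by row.
two = df["Col1 Extract"] + df["Col2 Extract"]
three = two + df["Col3 Extract"]

# set() removes duplicates; sorted() returns a list in alphabetical order,
# so the result no longer depends on set iteration order.
df["Col1+Col2"] = two.apply(lambda x: sorted(set(x)))
df["Col1+Col2+Col3"] = three.apply(lambda x: sorted(set(x)))

print(df[["Col1+Col2", "Col1+Col2+Col3"]])
```

Note that this gives alphabetical order, not the first-seen order shown in the question's desired output.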
    

    If the order matters, use the dict.fromkeys trick:

    # dict.fromkeys() keeps the first occurrence of each item, so the original order is preserved.
    df['Col1+Col2'] = (df['Col1 Extract'] + df['Col2 Extract']).apply(lambda x: list(dict.fromkeys(x)))
    df['Col1+Col2+Col3'] = (df['Col1 Extract'] + df['Col2 Extract'] + df['Col3 Extract']).apply(lambda x: list(dict.fromkeys(x)))
    print(df)
              Col1 Extract     Col2 Extract    Col3 Extract  \
    0       [unclassified]   [sink, fridge]  [unclassified]   
    1  [fridge, microwave]  [fridge, stove]          [sink]   
    2       [unclassified]   [unclassified]  [unclassified]   
    
                          Col1+Col2                    Col1+Col2+Col3  
    0  [unclassified, sink, fridge]      [unclassified, sink, fridge]  
    1    [fridge, microwave, stove]  [fridge, microwave, stove, sink]  
    2                [unclassified]                    [unclassified]
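
The same idea can be wrapped in a small helper that accepts any number of columns. This is only a sketch: the function name ordered_union is hypothetical, the DataFrame is rebuilt from the sample in the question, and it relies on dicts preserving insertion order (guaranteed from Python 3.7 on).

```python
from itertools import chain

import pandas as pd

# Sample data copied from the question; each cell holds a Python list.
df = pd.DataFrame({
    "Col1 Extract": [["unclassified"], ["fridge", "microwave"], ["unclassified"]],
    "Col2 Extract": [["sink", "fridge"], ["fridge", "stove"], ["unclassified"]],
    "Col3 Extract": [["unclassified"], ["sink"], ["unclassified"]],
})


def ordered_union(frame, columns):
    """Hypothetical helper: row-wise union of list columns, keeping first-seen order."""
    rows = zip(*(frame[c] for c in columns))
    # chain.from_iterable flattens the lists of one row;
    # dict.fromkeys drops duplicates while keeping the first occurrence.
    merged = [list(dict.fromkeys(chain.from_iterable(row))) for row in rows]
    return pd.Series(merged, index=frame.index)


df["Col1+Col2"] = ordered_union(df, ["Col1 Extract", "Col2 Extract"])
df["Col1+Col2+Col3"] = ordered_union(
    df, ["Col1 Extract", "Col2 Extract", "Col3 Extract"]
)

print(df[["Col1+Col2", "Col1+Col2+Col3"]])
```

Passing the column names as a list avoids repeating the long `+` expression when more columns are added later.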