代码之家  ›  专栏  ›  技术社区  ›  jovicbg

如果字符串包含来自大熊猫中其他列的值,则映射列

  •  0
  • jovicbg  · 技术社区  · 6 年前

    我有两个数据帧,第一个是:

    DF1

    col_one col_two
    ABBC1  (1, 2, 3)
    DFFG2  (3, 5, 1)
    JJKS3  (5, 2, 5)
    

    DF2

        col_1
    operate ABBC1 1 to 2, JJKS3 3 to 5
    operate JJKS3, FOM
    

    期望输出DF2:

      col_1                col_2
    operate ABBC1, to 2  (1, 2, 3)
    operate JJKS3, FOM   (5, 2, 5)
    

    我尝试过多种方法,但最接近的是:

    for values, map_col in df1[['col_one', 'col_two']].values:
        for val in df2['col_1']:
            if ("%s" %values) in df2['col_1'] :
                df2['col_2'] = "%s" %(map_col,)
    

    我认为这会很好,但所有行的值都完全相同。

    欢迎任何帮助。谢谢

    1 回复  |  直到 6 年前
        1
  •  1
  •   jezrael    6 年前

    extract

    pat = '|'.join(r"\b{}\b".format(x) for x in df1['col_one'].unique())
    df2['col_one'] = df2['col_1'].str.extract('(' + pat + ')')
    print (df2)
                     col_1 col_one
    0  operate ABBC1, to 2   ABBC1
    1   operate JJKS3, FOM   JJKS3
    
    df = df1.merge(df2, on='col_one')
    print (df)
      col_one    col_two                col_1
    0   ABBC1  (1, 2, 3)  operate ABBC1, to 2
    1   JJKS3  (5, 2, 5)   operate JJKS3, FOM
    

    如果可以匹配多个值,请使用 findall DataFrame 以下内容:

    pat = '|'.join(r"\b{}\b".format(x) for x in df1['col_one'].unique())
    s = df2['col_1'].str.findall('(' + pat + ')')
    print (s)
    0    [ABBC1, JJKS3]
    1           [JJKS3]
    Name: col_1, dtype: object
    
    lens = s.str.len()
    a = np.repeat(df2['col_1'], lens)
    b = np.concatenate(s)
    df2 = pd.DataFrame({'col_1':a, 'col_one':b})
    print (df2)
                                   col_1 col_one
    0  operate ABBC1, to 2  JJKS3 3 to 5   ABBC1
    0  operate ABBC1, to 2  JJKS3 3 to 5   JJKS3
    1                 operate JJKS3, FOM   JJKS3
    
    df = df1.merge(df2, on='col_one')
    print (df)
      col_one    col_two                              col_1
    0   ABBC1  (1, 2, 3)  operate ABBC1, to 2  JJKS3 3 to 5
    1   JJKS3  (5, 2, 5)  operate ABBC1, to 2  JJKS3 3 to 5
    2   JJKS3  (5, 2, 5)                 operate JJKS3, FOM