我有两个数据帧,我在其中找到基于列(TLD)的公共匹配,如果找到匹配(在源和目标中的列之间),我将列(UUID)的值从源复制到目标数据帧。
现在我还需要检查不同的列是否匹配。(公司名称)
数据帧1:源
uuid website company_name tld
0 11 www.facebook.com facebook facebook.com
1 22 www.yahoo.com yahoo inc yahoo.com
2 33 www.google.com Google google.com
3 44 www.cisco.com Cisco cisco.com
数据帧2:目的地
id website company_name tld match uuid
0 a www.facebook.com facebook facebook.com False NaN
1 b www.y.com Yahoo Inc y.com False NaN
2 c www.g.com Google g.com False NaN
3 d www.g.com Google Inc g.com False NaN
4 e www.facebook.com Facebook Inc facebook.com False NaN
查找
matches
以下内容:
destination.loc[destination.tld.isin(source.tld),'match'] = True
destination = destination.merge(source[['tld', 'uuid']], on='tld', how='left')
上面将uuid列从源复制到uuid列中
在目标数据帧中。
id website company_name tld match uuid
0 a www.facebook.com facebook facebook.com True 11
1 b www.y.com YahooInc y.com False NaN
2 c www.g.com Google g.com False NaN
3 d www.g.com GoogleInc g.com False NaN
4 e www.facebook.com FacebookInc facebook.com True 11
现在,我需要检查公司名称是否匹配,以及是否具有以下内容:
id website company_name tld match uuid
0 a www.facebook.com facebook facebook.com True 11
1 b www.y.com YahooInc y.com False NaN
2 c www.g.com Google g.com True 33
3 d www.g.com GoogleInc g.com False NaN
4 e www.facebook.com FacebookInc facebook.com True 11
当我试图添加:
destination.loc[destination.company_name.isin(source.company_name), 'match'] = True
destination = destination.merge(source[['company_name', 'uuid']], on='company_name', how='left')
我得到一个重复的uuid列:uuid和uuid
id website company_name tld match uuid_x uuid_y
0 a www.facebook.com facebook facebook.com True 11 11
1 b www.y.com Yahoo Inc y.com False NaN NaN
2 c www.g.com Google g.com True NaN 33
3 d www.g.com Google Inc g.com False NaN NaN
4 e www.facebook.com Facebook Inc facebook.com True 11 NaN
最终代码
destination.loc[destination.tld.isin(source.tld),'match'] = True
destination = destination.merge(source[['tld', 'uuid']], on='tld', how='left')
destination.loc[destination.company_name.isin(source.company_name), 'match'] = True
destination = destination.merge(source[['company_name', 'uuid']], on='company_name', how='left')