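I think you have duplicates in the key columns (first, second) of a. Each duplicated key in a matches the same row in b, so the merge returns one extra row: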
import pandas as pd

col = ['first', 'second', 'third']
a = pd.DataFrame.from_records([('a','b',True), ('a','c',True), ('a','c', True)], columns=col)
b = pd.DataFrame.from_records([('a','b',True), ('a','c',True)], columns=col)
c = pd.merge(a,b,how='outer',left_on=['first','second'],right_on=['first', 'second'])
print (a)
  first second  third
0     a      b   True
1     a      c   True  <- duplicate key (a, c)
2     a      c   True  <- duplicate key (a, c)
print (b)
  first second  third
0     a      b   True
1     a      c   True
print (c)
  first second  third_x  third_y
0     a      b     True     True
1     a      c     True     True
2     a      c     True     True
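A minimal sketch of why the row count grows, using the same a and b as above: for every key present in both frames, the merge emits (rows with that key in a) × (rows with that key in b) rows. It uses on=keys, which is equivalent here to passing the same list to left_on and right_on. The merge parameter validate='one_to_one' makes pandas raise pandas.errors.MergeError instead of silently multiplying rows; the name keys is just an illustrative variable.

```python
import pandas as pd

col = ['first', 'second', 'third']
a = pd.DataFrame.from_records([('a', 'b', True), ('a', 'c', True), ('a', 'c', True)], columns=col)
b = pd.DataFrame.from_records([('a', 'b', True), ('a', 'c', True)], columns=col)
keys = ['first', 'second']

# Rows per key in each frame; multiplying aligns on the (first, second) index.
counts_a = a.groupby(keys).size()
counts_b = b.groupby(keys).size()
# Valid here because every key appears in both frames: (a,b) -> 1*1, (a,c) -> 2*1.
expected_rows = int((counts_a * counts_b).sum())

c = pd.merge(a, b, how='outer', on=keys)
print(len(c), expected_rows)  # 3 3

# validate='one_to_one' rejects the merge because a has a repeated key.
try:
    pd.merge(a, b, how='outer', on=keys, validate='one_to_one')
except pd.errors.MergeError as exc:
    print('MergeError:', exc)
```

Passing validate is a cheap way to turn an unexpected many-to-one merge into an error at the point where it happens.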
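You can find the duplicates with duplicated; keep=False marks every occurrence of a repeated key, not only the later ones: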
print (a[a.duplicated(['first','second'], keep=False)])
  first second  third
1     a      c   True
2     a      c   True
print (b[b.duplicated(['first','second'], keep=False)])
Empty DataFrame
Columns: [first, second, third]
Index: []
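If you only need a quick summary rather than the rows themselves, a sketch like the following counts the repeats and lists each duplicated key with its frequency. It relies only on DataFrame.duplicated and groupby(...).size(); the variable names n_extra and dup_keys are illustrative.

```python
import pandas as pd

col = ['first', 'second', 'third']
a = pd.DataFrame.from_records([('a', 'b', True), ('a', 'c', True), ('a', 'c', True)], columns=col)
keys = ['first', 'second']

# Rows that repeat an earlier key (default keep='first' leaves the first one unmarked).
n_extra = int(a.duplicated(keys).sum())

# Keys that occur more than once, with their counts.
sizes = a.groupby(keys).size()
dup_keys = sizes[sizes > 1]

print(n_extra)  # 1
print(dup_keys)
```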
The solution is to remove the duplicates with drop_duplicates before merging:
a = a.drop_duplicates(['first','second'])
b = b.drop_duplicates(['first','second'])
c = pd.merge(a,b,how='outer',left_on=['first','second'],right_on=['first', 'second'])
print (a)
  first second  third
0     a      b   True
1     a      c   True
print (b)
  first second  third
0     a      b   True
1     a      c   True
print (c)
  first second  third_x  third_y
0     a      b     True     True
1     a      c     True     True
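A possible variant of the fix, sketched under the assumption that keeping the first row per key is acceptable: drop_duplicates(keys) keeps the first occurrence by default (keep='first'), and it discards later rows with the same key even if their third value differs, so check that before relying on it. Adding validate='one_to_one' to the merge documents the uniqueness assumption in code and fails loudly if it is ever violated again.

```python
import pandas as pd

col = ['first', 'second', 'third']
a = pd.DataFrame.from_records([('a', 'b', True), ('a', 'c', True), ('a', 'c', True)], columns=col)
b = pd.DataFrame.from_records([('a', 'b', True), ('a', 'c', True)], columns=col)
keys = ['first', 'second']

# Keep one row per key; later rows with the same key are dropped regardless of 'third'.
a = a.drop_duplicates(keys, keep='first')
b = b.drop_duplicates(keys, keep='first')

# Now both sides have unique keys, so the one-to-one check passes.
c = pd.merge(a, b, how='outer', on=keys, validate='one_to_one')
print(c)
```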