I have two pandas dataframes that look like this:
df_out:
Prediction count_human count_bot %_bot_tweets
username
666STEVEROGERS 8 131 0.942446
ADELE_BROCK 0 126 1.000000
ADRIANAMFTTT 99 0 0.000000
AHMADRADJAB 0 108 1.000000
ALBERTA_HAYNESS 101 0 0.000000
ALTMANBELINDA 0 139 1.000000
ALVA_MC_GHEE 29 104 0.781955
ANGELITHSS 0 113 1.000000
ANN1EMCCONNELL 0 125 1.000000
ANWARJAMIL22 0 112 1.000000
AN_N_GASTON 0 107 1.000000
ARONHOLDEN8 89 31 0.258333
ARTHCLAUDIA 0 103 1.000000
ASSUNCAOWALLAS 0 108 1.000000
BECCYWILL 0 132 1.000000
BELOZEROVNIKIT 132 8 0.057143
BEN_SAR_GENT 24 84 0.777778
BERT_HENLEY 105 0 0.000000
BISHOLORINE 0 117 1.000000
BLACKERTHEBERR5 4 100 0.961538
BLACKTIVISTSUS 49 68 0.581197
BLACK_ELEVATION 32 74 0.698113
BOGDANOVAO2 0 127 1.000000
BREMENBOTE 70 39 0.357798
B_stever96 0 171 1.000000
CALIFRONIAREP 60 72 0.545455
C_dos_94 0 121 1.000000
Cassidygirly 0 153 1.000000
ChuckSpeaks_ 0 185 1.000000
Cyabooty 111 0 0.000000
DurkinSays 0 131 1.000000
LSU_studyabroad 117 0 0.000000
MisMonWEXP 131 0 0.000000
NextLevel_Mel 0 185 1.000000
PeterDuca 108 0 0.000000
ShellMarcel 0 97 1.000000
Sir_Fried_Alott 0 144 1.000000
XavierRivera_ 197 0 0.000000
ZacharyFlair 213 0 0.000000
brentvarney44 0 126 1.000000
cbars68 225 0 0.000000
chloeschultz11 0 106 1.000000
hoang_le_96 0 104 1.000000
kdougherty178 0 127 1.000000
lasallephilo 138 0 0.000000
lovely_cunt_ 0 137 1.000000
megliebsch 0 217 1.000000
msimps_15 138 0 0.000000
okweightlossdna 105 0 0.000000
tankthe_hank 231 0 0.000000
and
knn_res:
following followers username Prediction is_bot
0 199 77 megliebsch 1 0
1 199 77 megliebsch 1 0
2 199 77 megliebsch 1 0
3 199 77 megliebsch 1 0
4 199 77 megliebsch 1 0
... ... ... ... ... ...
6643 67 57 ASSUNCAOWALLAS 1 1
6644 67 57 ASSUNCAOWALLAS 1 1
6645 67 57 ASSUNCAOWALLAS 1 1
6646 67 57 ASSUNCAOWALLAS 1 1
6647 67 57 ASSUNCAOWALLAS 1 1
What I'm trying to do is take username in df_out and left join it to knn_res to get the following and followers values.
In SQL, I could use:
SELECT a.*, b.following, b.followers FROM df_out a LEFT JOIN knn_res b ON a.username = b.username
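For reference, here is a minimal pandas sketch of that SQL query. It assumes df_out keeps username as its index, which the printout above suggests because username sits on its own header line. The stand-in data uses a few rows from the frames shown above, with only some columns kept. reset_index() turns the index back into a regular username column, so merge(on="username", how="left") can match on it.

```python
import pandas as pd

# Stand-ins built from rows shown above (only some columns kept).
# As in the question, df_out is indexed by username, not a column.
df_out = pd.DataFrame(
    {"count_human": [0, 0], "count_bot": [217, 108]},
    index=pd.Index(["megliebsch", "ASSUNCAOWALLAS"], name="username"),
)
knn_res = pd.DataFrame(
    {
        "following": [199, 199, 67, 67],
        "followers": [77, 77, 57, 57],
        "username": ["megliebsch", "megliebsch", "ASSUNCAOWALLAS", "ASSUNCAOWALLAS"],
    }
)

# SELECT a.*, b.following, b.followers
# FROM df_out a LEFT JOIN knn_res b ON a.username = b.username
result = df_out.reset_index().merge(
    knn_res[["username", "following", "followers"]],
    on="username",
    how="left",
)
print(result)
```

Like the SQL version, this repeats each df_out row once per matching knn_res row, so the output has 4 rows here. If following and followers are the same on every row for a given user, you can deduplicate first with knn_res.drop_duplicates("username") to get one row per user.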
I tried:
test_df = df_out
test_df.set_index('username').join(knn_res.set_index('username'), on='username', how='left')
print(test_df)
The result is:
File "C:\Python367-64\lib\site-packages\pandas\core\frame.py", line 4396, in set_index
raise KeyError("None of {} are in the columns".format(missing))
KeyError: "None of ['username'] are in the columns"
What am I doing wrong? I tried referring to this documentation for the problem.
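A likely explanation, sketched below with a cut-down df_out: the KeyError is raised by set_index, before join ever runs. Because df_out is built with groupby, username is already the index (it is the index's name), not a column, so set_index('username') cannot find it. Changing the how= argument would not affect this. There is also a separate issue. join returns a new DataFrame, and test_df = df_out does not copy anything. As a result, print(test_df) would still show the original frame even if the join succeeded, unless the result is assigned.

```python
import pandas as pd

# Cut-down df_out: username is the index name, as in the question
df_out = pd.DataFrame(
    {"count_bot": [217, 108]},
    index=pd.Index(["megliebsch", "ASSUNCAOWALLAS"], name="username"),
)

print("username" in df_out.columns)  # False
print(df_out.index.name)             # username

# set_index only looks at column labels, so this raises the same KeyError
try:
    df_out.set_index("username")
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```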
Update
I also tried an inner join and got exactly the same result:
File "C:\Python367-64\lib\site-packages\pandas\core\frame.py", line 4396, in set_index
raise KeyError("None of {} are in the columns".format(missing))
KeyError: "None of ['username'] are in the columns"
df_out was created using:
df_out = (knn_res.groupby(['username', 'Prediction']).is_bot.count().unstack(fill_value=0).
rename({0: 'count_human', 1: 'count_bot'}, axis= 1))
df_out['%_bot_tweets'] = df_out['count_bot'] / (df_out['count_bot'] + df_out['count_human'])
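The construction above is what moves username into the index. groupby(['username', 'Prediction']) puts both keys in the index, and unstack then moves Prediction to the columns, which leaves username as the row index. Below is a minimal sketch of an index-based left join built on that. It assumes following and followers are constant per user, which holds for the rows shown. user_stats is a hypothetical helper frame introduced only for this sketch. The sample rows only have Prediction == 1, so only count_bot appears, and the %_bot_tweets line is left out.

```python
import pandas as pd

# Sample knn_res rows taken from the printout above
knn_res = pd.DataFrame(
    {
        "following": [199, 199, 67, 67],
        "followers": [77, 77, 57, 57],
        "username": ["megliebsch", "megliebsch", "ASSUNCAOWALLAS", "ASSUNCAOWALLAS"],
        "Prediction": [1, 1, 1, 1],
        "is_bot": [0, 0, 1, 1],
    }
)

# Same construction as in the question: username ends up as the index
df_out = (
    knn_res.groupby(["username", "Prediction"]).is_bot.count()
    .unstack(fill_value=0)
    .rename({0: "count_human", 1: "count_bot"}, axis=1)
)
print(df_out.index.name)  # username

# Hypothetical helper: one row per user (assumes values are constant per user)
user_stats = knn_res.groupby("username")[["following", "followers"]].first()

# Both frames are indexed by username, so join aligns on the index directly
result = df_out.join(user_stats, how="left")
print(result)
```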