Joining two pandas DataFrames fails

  • artemis Roberto  ·  5 years ago

    I have two pandas DataFrames that look like this:

    df_out :

    Prediction       count_human  count_bot  %_bot_tweets
    username
    666STEVEROGERS             8        131      0.942446
    ADELE_BROCK                0        126      1.000000
    ADRIANAMFTTT              99          0      0.000000
    AHMADRADJAB                0        108      1.000000
    ALBERTA_HAYNESS          101          0      0.000000
    ALTMANBELINDA              0        139      1.000000
    ALVA_MC_GHEE              29        104      0.781955
    ANGELITHSS                 0        113      1.000000
    ANN1EMCCONNELL             0        125      1.000000
    ANWARJAMIL22               0        112      1.000000
    AN_N_GASTON                0        107      1.000000
    ARONHOLDEN8               89         31      0.258333
    ARTHCLAUDIA                0        103      1.000000
    ASSUNCAOWALLAS             0        108      1.000000
    BECCYWILL                  0        132      1.000000
    BELOZEROVNIKIT           132          8      0.057143
    BEN_SAR_GENT              24         84      0.777778
    BERT_HENLEY              105          0      0.000000
    BISHOLORINE                0        117      1.000000
    BLACKERTHEBERR5            4        100      0.961538
    BLACKTIVISTSUS            49         68      0.581197
    BLACK_ELEVATION           32         74      0.698113
    BOGDANOVAO2                0        127      1.000000
    BREMENBOTE                70         39      0.357798
    B_stever96                 0        171      1.000000
    CALIFRONIAREP             60         72      0.545455
    C_dos_94                   0        121      1.000000
    Cassidygirly               0        153      1.000000
    ChuckSpeaks_               0        185      1.000000
    Cyabooty                 111          0      0.000000
    DurkinSays                 0        131      1.000000
    LSU_studyabroad          117          0      0.000000
    MisMonWEXP               131          0      0.000000
    NextLevel_Mel              0        185      1.000000
    PeterDuca                108          0      0.000000
    ShellMarcel                0         97      1.000000
    Sir_Fried_Alott            0        144      1.000000
    XavierRivera_            197          0      0.000000
    ZacharyFlair             213          0      0.000000
    brentvarney44              0        126      1.000000
    cbars68                  225          0      0.000000
    chloeschultz11             0        106      1.000000
    hoang_le_96                0        104      1.000000
    kdougherty178              0        127      1.000000
    lasallephilo             138          0      0.000000
    lovely_cunt_               0        137      1.000000
    megliebsch                 0        217      1.000000
    msimps_15                138          0      0.000000
    okweightlossdna          105          0      0.000000
    tankthe_hank             231          0      0.000000
    

    and knn_res :

          following  followers        username  Prediction  is_bot
    0           199         77      megliebsch           1       0
    1           199         77      megliebsch           1       0
    2           199         77      megliebsch           1       0
    3           199         77      megliebsch           1       0
    4           199         77      megliebsch           1       0
    ...         ...        ...             ...         ...     ...
    6643         67         57  ASSUNCAOWALLAS           1       1
    6644         67         57  ASSUNCAOWALLAS           1       1
    6645         67         57  ASSUNCAOWALLAS           1       1
    6646         67         57  ASSUNCAOWALLAS           1       1
    6647         67         57  ASSUNCAOWALLAS           1       1
    

    What I want to do is take each username in df_out and left join it to knn_res to get the following and followers values.

    In SQL, I could use: SELECT a.*, b.following, b.followers FROM df_out a LEFT JOIN knn_res b ON a.username = b.username

    I tried:

    test_df = df_out
    test_df.set_index('username').join(knn_res.set_index('username'), on='username', how='left')
    print(test_df)
    

    which results in:

      File "C:\Python367-64\lib\site-packages\pandas\core\frame.py", line 4396, in set_index
        raise KeyError("None of {} are in the columns".format(missing))
    KeyError: "None of ['username'] are in the columns"
    

    What am I doing wrong? I tried following this documentation for the problem.

    Update

    I also tried an inner join and got exactly the same result:

      File "C:\Python367-64\lib\site-packages\pandas\core\frame.py", line 4396, in set_index
        raise KeyError("None of {} are in the columns".format(missing))
    KeyError: "None of ['username'] are in the columns"
    

    df_out was created using:

    df_out = (knn_res.groupby(['username', 'Prediction']).is_bot.count().unstack(fill_value=0).
                 rename({0: 'count_human', 1: 'count_bot'}, axis= 1))
    
    df_out['%_bot_tweets'] = df_out['count_bot'] / (df_out['count_bot'] + df_out['count_human'])
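
    For context on the KeyError, a groupby followed by unstack leaves the grouping key as the index of the result rather than as a column. The following is a minimal sketch with made-up rows (the usernames "a" and "b" and their values are illustrative, not the poster's data) that applies the same construction and checks where username ends up:

```python
import pandas as pd

# Illustrative stand-in for knn_res: one row per tweet, so usernames repeat.
knn_res = pd.DataFrame({
    "username": ["a", "a", "a", "b", "b"],
    "Prediction": [1, 1, 0, 1, 1],
    "is_bot": [0, 0, 0, 1, 1],
})

# Same construction as in the question.
df_out = (knn_res.groupby(["username", "Prediction"]).is_bot.count()
          .unstack(fill_value=0)
          .rename({0: "count_human", 1: "count_bot"}, axis=1))

# The grouping key is now the index name, not one of the columns,
# so set_index('username') has no column to look up.
print(df_out.index.name)
print("username" in df_out.columns)

# reset_index() turns the index back into an ordinary column if one is needed.
flat = df_out.reset_index()
print("username" in flat.columns)
```

    If a username column is needed, for example for a merge on a column name, reset_index() is one way to get it back.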
    
    1 Answer

  • Andy L.  ·  5 years ago

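    Try this. The default how option of join is left, so there is no need to specify it. df_out already has username as its index (which is why set_index('username') raised the KeyError), and join matches on the index, so there is no need to specify the on option either. Finally, you only want to bring in the columns following and followers, so after setting username as the index of knn_res, just slice those two columns for the join. (Note: use copy() when you want to copy the original DataFrame into test_df, because without copy() both names point to the same DataFrame object.)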

    test_df = df_out.copy()
    test_df = test_df.join(knn_res.set_index('username')[['following', 'followers']])
    print(test_df)
    
    Out[93]:
                     count_human  count_bot  %_bot_tweets  following  followers
    username
    666STEVEROGERS             8        131      0.942446        NaN        NaN
    ADELE_BROCK                0        126      1.000000        NaN        NaN
    ADRIANAMFTTT              99          0      0.000000        NaN        NaN
    AHMADRADJAB                0        108      1.000000        NaN        NaN
    ALBERTA_HAYNESS          101          0      0.000000        NaN        NaN
    ALTMANBELINDA              0        139      1.000000        NaN        NaN
    ALVA_MC_GHEE              29        104      0.781955        NaN        NaN
    ANGELITHSS                 0        113      1.000000        NaN        NaN
    ANN1EMCCONNELL             0        125      1.000000        NaN        NaN
    ANWARJAMIL22               0        112      1.000000        NaN        NaN
    AN_N_GASTON                0        107      1.000000        NaN        NaN
    ARONHOLDEN8               89         31      0.258333        NaN        NaN
    ARTHCLAUDIA                0        103      1.000000        NaN        NaN
    ASSUNCAOWALLAS             0        108      1.000000       67.0       57.0
    ASSUNCAOWALLAS             0        108      1.000000       67.0       57.0
    ASSUNCAOWALLAS             0        108      1.000000       67.0       57.0
    ASSUNCAOWALLAS             0        108      1.000000       67.0       57.0
    ASSUNCAOWALLAS             0        108      1.000000       67.0       57.0
    BECCYWILL                  0        132      1.000000        NaN        NaN
    BELOZEROVNIKIT           132          8      0.057143        NaN        NaN
    BEN_SAR_GENT              24         84      0.777778        NaN        NaN
    BERT_HENLEY              105          0      0.000000        NaN        NaN
    BISHOLORINE                0        117      1.000000        NaN        NaN
    BLACKERTHEBERR5            4        100      0.961538        NaN        NaN
    BLACKTIVISTSUS            49         68      0.581197        NaN        NaN
    BLACK_ELEVATION           32         74      0.698113        NaN        NaN
    BOGDANOVAO2                0        127      1.000000        NaN        NaN
    BREMENBOTE                70         39      0.357798        NaN        NaN
    B_stever96                 0        171      1.000000        NaN        NaN
    CALIFRONIAREP             60         72      0.545455        NaN        NaN
    C_dos_94                   0        121      1.000000        NaN        NaN
    Cassidygirly               0        153      1.000000        NaN        NaN
    ChuckSpeaks_               0        185      1.000000        NaN        NaN
    Cyabooty                 111          0      0.000000        NaN        NaN
    DurkinSays                 0        131      1.000000        NaN        NaN
    LSU_studyabroad          117          0      0.000000        NaN        NaN
    MisMonWEXP               131          0      0.000000        NaN        NaN
    NextLevel_Mel              0        185      1.000000        NaN        NaN
    PeterDuca                108          0      0.000000        NaN        NaN
    ShellMarcel                0         97      1.000000        NaN        NaN
    Sir_Fried_Alott            0        144      1.000000        NaN        NaN
    XavierRivera_            197          0      0.000000        NaN        NaN
    ZacharyFlair             213          0      0.000000        NaN        NaN
    brentvarney44              0        126      1.000000        NaN        NaN
    cbars68                  225          0      0.000000        NaN        NaN
    chloeschultz11             0        106      1.000000        NaN        NaN
    hoang_le_96                0        104      1.000000        NaN        NaN
    kdougherty178              0        127      1.000000        NaN        NaN
    lasallephilo             138          0      0.000000        NaN        NaN
    lovely_cunt_               0        137      1.000000        NaN        NaN
    megliebsch                 0        217      1.000000      199.0       77.0
    megliebsch                 0        217      1.000000      199.0       77.0
    megliebsch                 0        217      1.000000      199.0       77.0
    megliebsch                 0        217      1.000000      199.0       77.0
    megliebsch                 0        217      1.000000      199.0       77.0
    msimps_15                138          0      0.000000        NaN        NaN
    okweightlossdna          105          0      0.000000        NaN        NaN
    tankthe_hank             231          0      0.000000        NaN        NaN
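
    In the output above, megliebsch and ASSUNCAOWALLAS each appear several times because knn_res holds one row per tweet, so the left join emits one output row per matching row. If one row per username is wanted, one option is to drop the duplicates before joining. The sketch below uses a small made-up subset of the data and assumes that following and followers are constant for a given username, which the question does not confirm; the name per_user is introduced here only for illustration.

```python
import pandas as pd

# Illustrative subset of knn_res: usernames repeat, one row per tweet.
knn_res = pd.DataFrame({
    "following": [199, 199, 67, 67],
    "followers": [77, 77, 57, 57],
    "username": ["megliebsch", "megliebsch",
                 "ASSUNCAOWALLAS", "ASSUNCAOWALLAS"],
})

# Illustrative subset of df_out, with username as the index.
df_out = pd.DataFrame(
    {"count_human": [0, 0], "count_bot": [108, 217]},
    index=pd.Index(["ASSUNCAOWALLAS", "megliebsch"], name="username"),
)

# Assumption: following/followers do not vary within a username,
# so keeping the first row per username loses no information.
per_user = (knn_res[["username", "following", "followers"]]
            .drop_duplicates(subset="username")
            .set_index("username"))

# join aligns on the index and defaults to a left join.
result = df_out.join(per_user)
print(result)
```

    If the values can differ between rows of the same username, an aggregation such as groupby('username').first() or .max() would be a more deliberate choice than drop_duplicates.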