代码之家  ›  专栏  ›  技术社区  ›  mobcdi

选择dataframe中不在具有不同形状数组的不同dataframe中且对象不可iterable的行

  •  0
  • mobcdi  · 技术社区  · 6 年前

    NotProcessed= FileListDF.loc[FileListDF['DocID'] != InterimDF.DocID.tolist()] 我明白了

    值错误:数组的长度不同:255对246

    这个 question 似乎遇到了类似的问题,但答案并不清楚,当我尝试时,我得到了

    “UUID”对象不可iterable

    uuid.uuid4() 在第一个实例中创建DocID

    FileListDF..to_json
    
    <bound method NDFrame.to_json of     Class                                 DocID
    0      CC  c2571c68-0705-4c87-a42b-0971a73fedf4
    1      CC  ab317908-8720-47ce-a8f6-453169259e48
    2      CC  2b3d1aee-b4f6-4278-9db2-c7b6819bd14f
    3      CC  9d4b9084-f623-445c-86d9-0458d6fb4135
    4      CC  afe9dbb5-38a2-4c4e-9640-b1da1d37b66f
    5      CC  56fa7edb-5020-44e7-98bc-5b0878fbf9ac
    6      CC  7c27bcb8-b381-465e-a6e9-edc38cf24e5a
    7      CC  a56e6f13-bf05-474a-9930-5f0ebad33eba
    8      CC  81c31d76-cec3-4f1c-8ec1-107ecc1169d4
    9      CC  1f8acf61-0a08-4b42-8ad1-38eeebc85ff2
    10     CC  d6a6704c-21c6-4dc5-b9bd-de9e67ee7970
    11     CC  d63dde24-9a2b-410d-aa7d-6a854b646a12
    12     CC  a80367f9-36af-45bd-878e-d19461a3d3df
    13     CC  31f74aff-e8df-43bb-a7d4-314caea6de20
    14     CC  99052ee4-2481-4c81-92dd-81b446e76c15
    15     CC  3827d0b4-5d0a-490a-a561-bf631bae6492
    16     CC  fdbde364-776d-4532-8aca-874144a85df8
    17     CC  4dfa630f-20b0-4a9e-bcf0-cdd4f18bb215
    18     CC  f5b16b14-a058-4f2d-8461-25e68b02c68b
    19     CC  a9f23eee-5b44-4d4e-baf8-19b78b4e3508
    20     CC  344bfb49-3b22-4e2a-b815-a9eb46cbed1b
    21     CC  ade5b2c3-5962-4d6d-9252-4f670b127682
    22     CC  43fa1fb2-3681-4372-ab5a-f12059b3741c
    23     CC  77f19056-fef8-4353-9740-01afa5557f47
    24     CC  1c7cfc8a-57c6-4ad3-bce7-63b006352ed1
    25     CC  e6db5e3c-4b3e-4b27-9558-e7334032c9c0
    26     CC  68adb3b6-dfeb-4645-8ed2-610ae896e8c4
    27     CC  c0163ee2-9e34-4636-a734-051151c50f29
    28     CC  7b7d8549-8697-4b19-a033-e0068f5aa4d7
    29     CC  7a976925-57c5-4972-a9e6-278dd322f6b4
    ..    ...                                   ...
    225    JA  cd3a901e-a077-472e-a100-f082f011cc72
    226    JA  d462729a-e1bc-4502-abc1-8dbd2b21103d
    227    JA  f5338d8d-13dc-4a03-b720-1056419deb68
    228    JA  37a41e22-3698-459d-b3cd-0d419a0f3f48
    229    JA  e48c4463-565d-4f09-9ccc-03e0506ddf1e
    230    JA  8531a863-4b0b-4bfa-9c4a-05056e7c1165
    231    JA  01e72b43-730d-42a7-9405-6fd45b2141b5
    232    JA  5ebbf1d4-48f1-4d1f-a5bc-1e865c92146f
    233    JA  c3103348-2209-42de-8005-06a9af3ca43f
    234    JA  b603b2cd-c4c9-4a2e-a52f-361a1e06b277
    235    JA  32bf7eba-7425-4f9c-ad1f-95680c8ad7cc
    236    JA  514f667e-5ba8-40fa-8c6f-bbf06bb6ff2f
    237    JA  d08b06d1-b8fc-432b-94af-a25d9ef8b4fe
    238    JA  d32419e8-e327-470b-95ba-1a2e20cff6d7
    239    NA  abdd46ff-3b81-4b5e-85a1-95db783ff2fa
    240    NA  3587088d-a592-4f21-93fe-760e9276fbd9
    241    NA  8eb7ae93-ab3a-4e3c-ad42-129949dd0545
    242    NA  8ff017c0-aae2-43e8-b87b-2889b9bb0df8
    243    NA  1ad7c9d6-8da4-46e8-97ee-c0a6c3299d27
    244    NA  fa5e928f-a906-480f-824c-4615a155cc2a
    245    NA  1d1b709d-3472-4bd1-961b-453368b6845a
    246    NA  e302d87f-30d7-46aa-9358-5705a31cf31b
    247    NA  ad1043ba-5807-482b-a4a0-624bbfbd36af
    248    NA  94c4453f-1309-4931-9a8f-5400a1b104f5
    249    NA  37aca929-a2fd-4055-a416-865b92ebe58c
    250    NA  ad64522f-054a-4fcc-b1c2-fbf70eb06a62
    251    NA  c30a75f1-c144-4d62-a3bb-9f41615b9ca0
    252    NA  559c0a5e-7e76-4e4f-a560-1c4d52b31384
    253    NA  5a123bb7-9ffb-46b2-b315-6e2f0aca7aed
    254    NA  595da36e-ec30-40e4-abc1-b98400ae9303
    

    这是docid的数据帧,我希望从上面的FileListDF中删除匹配的值,以查找FileListDF中但不在InterimDF中的相同行数

    InterimDF.to_json
    
    <bound method NDFrame.to_json of                                     DocID
    0    c2571c68-0705-4c87-a42b-0971a73fedf4
    1    ab317908-8720-47ce-a8f6-453169259e48
    2    2b3d1aee-b4f6-4278-9db2-c7b6819bd14f
    3    9d4b9084-f623-445c-86d9-0458d6fb4135
    4    56fa7edb-5020-44e7-98bc-5b0878fbf9ac
    5    7c27bcb8-b381-465e-a6e9-edc38cf24e5a
    6    a56e6f13-bf05-474a-9930-5f0ebad33eba
    7    81c31d76-cec3-4f1c-8ec1-107ecc1169d4
    8    1f8acf61-0a08-4b42-8ad1-38eeebc85ff2
    9    d6a6704c-21c6-4dc5-b9bd-de9e67ee7970
    10   d63dde24-9a2b-410d-aa7d-6a854b646a12
    11   a80367f9-36af-45bd-878e-d19461a3d3df
    12   31f74aff-e8df-43bb-a7d4-314caea6de20
    13   99052ee4-2481-4c81-92dd-81b446e76c15
    14   3827d0b4-5d0a-490a-a561-bf631bae6492
    15   fdbde364-776d-4532-8aca-874144a85df8
    16   4dfa630f-20b0-4a9e-bcf0-cdd4f18bb215
    17   f5b16b14-a058-4f2d-8461-25e68b02c68b
    18   a9f23eee-5b44-4d4e-baf8-19b78b4e3508
    19   344bfb49-3b22-4e2a-b815-a9eb46cbed1b
    20   ade5b2c3-5962-4d6d-9252-4f670b127682
    21   43fa1fb2-3681-4372-ab5a-f12059b3741c
    22   77f19056-fef8-4353-9740-01afa5557f47
    23   1c7cfc8a-57c6-4ad3-bce7-63b006352ed1
    24   e6db5e3c-4b3e-4b27-9558-e7334032c9c0
    25   68adb3b6-dfeb-4645-8ed2-610ae896e8c4
    26   c0163ee2-9e34-4636-a734-051151c50f29
    27   7b7d8549-8697-4b19-a033-e0068f5aa4d7
    28   7a976925-57c5-4972-a9e6-278dd322f6b4
    29   6533a84b-a81d-411a-b44e-3090cd01396d
    ..                                    ...
    216  f7217b94-b939-4bcf-b8c8-d385e4473c63
    217  74522cf3-7fd5-4b84-b175-23f84ede25ae
    218  cd3a901e-a077-472e-a100-f082f011cc72
    219  d462729a-e1bc-4502-abc1-8dbd2b21103d
    220  f5338d8d-13dc-4a03-b720-1056419deb68
    221  37a41e22-3698-459d-b3cd-0d419a0f3f48
    222  e48c4463-565d-4f09-9ccc-03e0506ddf1e
    223  8531a863-4b0b-4bfa-9c4a-05056e7c1165
    224  01e72b43-730d-42a7-9405-6fd45b2141b5
    225  5ebbf1d4-48f1-4d1f-a5bc-1e865c92146f
    226  c3103348-2209-42de-8005-06a9af3ca43f
    227  b603b2cd-c4c9-4a2e-a52f-361a1e06b277
    228  d08b06d1-b8fc-432b-94af-a25d9ef8b4fe
    229  d32419e8-e327-470b-95ba-1a2e20cff6d7
    230  abdd46ff-3b81-4b5e-85a1-95db783ff2fa
    231  3587088d-a592-4f21-93fe-760e9276fbd9
    232  8eb7ae93-ab3a-4e3c-ad42-129949dd0545
    233  8ff017c0-aae2-43e8-b87b-2889b9bb0df8
    234  1ad7c9d6-8da4-46e8-97ee-c0a6c3299d27
    235  fa5e928f-a906-480f-824c-4615a155cc2a
    236  1d1b709d-3472-4bd1-961b-453368b6845a
    237  e302d87f-30d7-46aa-9358-5705a31cf31b
    238  ad1043ba-5807-482b-a4a0-624bbfbd36af
    239  94c4453f-1309-4931-9a8f-5400a1b104f5
    240  37aca929-a2fd-4055-a416-865b92ebe58c
    241  ad64522f-054a-4fcc-b1c2-fbf70eb06a62
    242  c30a75f1-c144-4d62-a3bb-9f41615b9ca0
    243  559c0a5e-7e76-4e4f-a560-1c4d52b31384
    244  5a123bb7-9ffb-46b2-b315-6e2f0aca7aed
    245  595da36e-ec30-40e4-abc1-b98400ae9303
    
    1 回复  |  直到 6 年前
        1
  •  1
  •   ScottieB    6 年前

    所以如果我理解正确的话,您有两个数据帧,每个数据帧都有一个列,两个数据帧之间有一些重叠。

    为了简单起见,我要叫他们 df_one df_two 他们都有一个 id

    你可以用 isin()

    import pandas as pd
    
    df_one = pd.DataFrame(data={'id':[1,2,3,4,5]})
    df_two = pd.DataFrame(data={'id':[3,4,5,6,7]})
    
    new_ids = ~df_one['id'].isin(df_two['id'])
    
    df_only_in_one = df_one[new_ids]