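I'm trying to add a new column of True/False values to a larger DataFrame, depending on whether the combination of values in other columns exists in another DataFrame or array.
I initially tried this: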
Activity = pd.DataFrame(list(itertools.product(ActivityLog1['_created_at$AL'].unique(), _User['_p_user'].unique())),\
columns = ['date','_p_user'])
dft = ActivityLog1[['_created_at$AL','_p_user']].values
Activity['active'] = Activity.apply(lambda x: x[['date','_p_user']].values in dft,axis=1)
But because there are too many rows, I changed the apply call to:
Activity['active'] = np.where(Activity[['date','_p_user']].values in dft, True, False)
This gives the following warning:
C:\Anaconda3\lib\site-packages\ipykernel_launcher.py:6: DeprecationWarning: elementwise == comparison failed; this will raise an error in the future.
Activity[['date','_p_user']].values
returns:
array([[Timestamp('2018-03-27 00:00:00'), 'Y5RKervPy0'],
[Timestamp('2018-03-27 00:00:00'), 'G3zTYHC9qj'],
[Timestamp('2018-03-27 00:00:00'), 'BeLqAK02Zo'],
...,
[Timestamp('2018-09-03 00:00:00'), 'mSEZo8qHe2'],
[Timestamp('2018-09-03 00:00:00'), 'zrERaksxxg'],
[Timestamp('2018-09-03 00:00:00'), '7q6EuwbCgj']], dtype=object)
and
dft
array([[Timestamp('2018-03-27 00:00:00'), 'BoMRF4HvNg'],
[Timestamp('2018-03-27 00:00:00'), 'B2QoOpL3dZ'],
[Timestamp('2018-03-27 00:00:00'), '7G2jZJbzjT'],
...,
[Timestamp('2018-08-17 00:00:00'), 'dMH2WDsbDY'],
[Timestamp('2018-08-27 00:00:00'), 'sW13lwCQEF'],
[Timestamp('2018-09-03 00:00:00'), 'RAJOMMfWH9']], dtype=object)
Is there a better and/or faster way to achieve this? Thanks.
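One possible direction, sketched here under assumptions rather than as a confirmed fix: `in` on a 2-D object array does not test row membership, so a vectorized alternative is to left-merge the full grid against the distinct observed pairs and read pandas' `indicator` column. The toy data and the lowercase names (`activity_log`, `users`, `activity`, `observed`) are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Toy stand-ins for ActivityLog1 and _User (values invented for illustration).
activity_log = pd.DataFrame({
    "_created_at$AL": pd.to_datetime(["2018-03-27", "2018-03-27", "2018-09-03"]),
    "_p_user": ["u1", "u2", "u1"],
})
users = pd.DataFrame({"_p_user": ["u1", "u2", "u3"]})

# Every (date, user) combination; from_product keeps the datetime dtype intact.
activity = pd.MultiIndex.from_product(
    [activity_log["_created_at$AL"].unique(), users["_p_user"].unique()],
    names=["date", "_p_user"],
).to_frame(index=False)

# Distinct (date, user) pairs that actually occur in the log.
observed = (
    activity_log[["_created_at$AL", "_p_user"]]
    .drop_duplicates()
    .rename(columns={"_created_at$AL": "date"})
)

# Left merge keeps the row order of `activity`; because `observed` has no
# duplicate keys, the result has exactly one row per row of `activity`.
merged = activity.merge(observed, on=["date", "_p_user"], how="left", indicator=True)

# "_merge" is "both" where the pair was found in `observed`, else "left_only".
activity["active"] = (merged["_merge"] == "both").to_numpy()

print(activity)
```

Deduplicating `observed` before the merge matters: repeated pairs in the log would otherwise multiply rows in the result and the assignment back to `activity` would no longer line up.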