I have the following pandas DataFrame:
import numpy as np
import pandas as pd
events = ['event1', 'event2', 'event3', 'event4', 'event5', 'event6']
wells = [np.array([1, 2]), np.array([1, 3]), np.array([1]),
         np.array([4, 5, 6]), np.array([4, 5, 6]), np.array([7, 8])]
traces_per_well = [np.array([24, 24]), np.array([24, 21]), np.array([18]),
                   np.array([24, 24, 24]), np.array([24, 21, 24]), np.array([18, 21])]
df = pd.DataFrame({"event_no": events, "well_array": wells,
                   "trace_per_well": traces_per_well})
df["total_traces"] = df['trace_per_well'].apply(np.sum)
df['supposed_traces_no'] = df['well_array'].apply(lambda x: len(x)*24)
df['pass'] = df['total_traces'] == df['supposed_traces_no']
print(df)
   event_no  well_array  trace_per_well  total_traces  supposed_traces_no   pass
0    event1      [1, 2]        [24, 24]            48                  48   True
1    event2      [1, 3]        [24, 21]            45                  48  False
2    event3         [1]            [18]            18                  24  False
3    event4   [4, 5, 6]    [24, 24, 24]            72                  72   True
4    event5   [4, 5, 6]    [24, 21, 24]            69                  72  False
5    event6      [7, 8]        [18, 21]            39                  48  False
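Whenever an element of trace_per_well is not equal to 24, I want to put that element in one new column and the corresponding element of well_array in another new column.
The result should look like this: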
   event_no  well_array  trace_per_well  total_traces  supposed_traces_no   pass  wrong_trace_in_well  wrong_well
0    event1      [1, 2]        [24, 24]            48                  48   True                  NaN         NaN
1    event2      [1, 3]        [24, 21]            45                  48  False                   21           3
2    event3         [1]            [18]            18                  24  False                   18           1
3    event4   [4, 5, 6]    [24, 24, 24]            72                  72   True                  NaN         NaN
4    event5   [4, 5, 6]    [24, 21, 24]            69                  72  False                   21           5
5    event6      [7, 8]        [18, 21]            39                  48  False             (18, 21)      (7, 8)
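
For reference, here is a rough sketch of one way the desired columns might be built. It is only an illustration of the transformation described above, not a definitive solution. The helper name `mismatches` and the constant `EXPECTED` are hypothetical (they do not come from any library). The sketch assumes a single mismatch should be stored as a plain scalar and several mismatches as a tuple, as in the expected output.

```python
import numpy as np
import pandas as pd

EXPECTED = 24  # assumed number of traces per well, taken from the question


def mismatches(values, wells, expected=EXPECTED):
    """Hypothetical helper: return (bad_values, bad_wells) for entries != expected.

    No mismatch      -> (NaN, NaN)
    One mismatch     -> two scalars
    Several of them  -> two tuples
    """
    values = np.asarray(values)
    wells = np.asarray(wells)
    mask = values != expected          # boolean mask of the offending positions
    if not mask.any():
        return np.nan, np.nan
    bad_values = values[mask].tolist()  # .tolist() converts to plain Python ints
    bad_wells = wells[mask].tolist()
    if len(bad_values) == 1:
        return bad_values[0], bad_wells[0]
    return tuple(bad_values), tuple(bad_wells)


events = ['event1', 'event2', 'event3', 'event4', 'event5', 'event6']
wells = [np.array([1, 2]), np.array([1, 3]), np.array([1]),
         np.array([4, 5, 6]), np.array([4, 5, 6]), np.array([7, 8])]
traces_per_well = [np.array([24, 24]), np.array([24, 21]), np.array([18]),
                   np.array([24, 24, 24]), np.array([24, 21, 24]), np.array([18, 21])]
df = pd.DataFrame({"event_no": events, "well_array": wells,
                   "trace_per_well": traces_per_well})

# Apply the helper row by row; each result is a (values, wells) pair.
pairs = [mismatches(t, w) for t, w in zip(df["trace_per_well"], df["well_array"])]

# dtype=object because the cells mix NaN, ints and tuples.
df["wrong_trace_in_well"] = pd.Series([p[0] for p in pairs], index=df.index, dtype=object)
df["wrong_well"] = pd.Series([p[1] for p in pairs], index=df.index, dtype=object)
print(df[["event_no", "wrong_trace_in_well", "wrong_well"]])
```

Mixing scalars and tuples forces both new columns to object dtype. If that is a problem downstream, a variant that always returns a list (empty when nothing is wrong) may be easier to work with.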