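Use df.diff(axis=1), which works like np.diff() along the columns, and check whether any of the resulting differences is below pd.Timedelta(0). A row passes only if none of its differences is negative: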
df['check'] = ~df.diff(axis=1).lt(pd.Timedelta(0)).any(axis=1)
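To see what each step of that one-liner does, here is a minimal sketch on a hand-built frame. The frame `small` and the intermediate names `steps`, `decreasing` and `check` are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Hypothetical three-row frame, chosen so the expected result is easy to verify.
small = pd.DataFrame({
    "A": pd.to_datetime(["2018-01-01", "2018-03-01", "2018-05-01"]),
    "B": pd.to_datetime(["2018-02-01", "2018-02-01", "2018-05-01"]),
    "C": pd.to_datetime(["2018-03-01", "2018-04-01", "2018-06-01"]),
})

# Column-wise differences: B - A and C - B. The first column has nothing
# to subtract from, so it becomes NaT.
steps = small.diff(axis=1)

# NaT compares as False, so only genuinely negative steps are flagged.
decreasing = steps.lt(pd.Timedelta(0)).any(axis=1)

# A row passes when no step is negative (ties such as A == B are allowed).
check = ~decreasing
print(check.tolist())
```

Row 0 is strictly increasing, row 1 has B earlier than A, and row 2 has a tie between A and B, so this should print [True, False, True].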
Full example:
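import pandas as pd
import numpy as np
np.random.seed(333)
# Random dates from: https://stackoverflow.com/questions/50559078/
def pp(start, end, n):
    # Convert the nanosecond bounds to whole seconds since the epoch
    start_u = start.value // 10**9
    end_u = end.value // 10**9
    # Draw n random seconds, scale back to nanoseconds, reinterpret as datetimes
    return pd.DatetimeIndex((10**9 * np.random.randint(start_u, end_u, n)).view('M8[ns]'))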
n = 10
df = pd.DataFrame({
    'A': pp(pd.Timestamp('2018'), pd.Timestamp('2019'), n),
    'B': pp(pd.Timestamp('2018'), pd.Timestamp('2019'), n),
    'C': pp(pd.Timestamp('2018'), pd.Timestamp('2019'), n),
    'D': pp(pd.Timestamp('2018'), pd.Timestamp('2019'), n)
})
# True where the dates never decrease from A to D
df['check'] = ~df.diff(axis=1).lt(pd.Timedelta(0)).any(axis=1)
print(df)
Output:
                    A                   B                   C  \
0 2018-07-30 04:54:04 2018-03-13 00:28:13 2018-08-24 11:01:29
1 2018-12-26 21:22:20 2018-09-23 14:25:11 2018-08-19 07:21:59
2 2018-04-29 17:15:57 2018-05-28 12:35:35 2018-10-16 00:19:11
3 2018-12-11 06:56:35 2018-08-15 00:12:12 2018-08-05 23:47:08
4 2018-03-04 11:00:03 2018-07-03 07:22:30 2018-09-09 01:45:09
5 2018-08-22 03:24:30 2018-12-17 17:38:34 2018-01-29 13:02:29
6 2018-04-21 01:10:14 2018-06-09 20:37:08 2018-04-30 12:30:00
7 2018-06-27 18:40:46 2018-09-15 10:26:06 2018-05-13 03:51:36
8 2018-03-18 06:31:24 2018-11-10 06:24:12 2018-02-25 02:58:15
9 2018-11-08 17:52:19 2018-03-27 01:02:12 2018-03-06 00:10:02

                    D  check
0 2018-07-30 16:16:03  False
1 2018-07-21 23:38:59  False
2 2018-10-25 03:46:37   True
3 2018-12-01 07:43:53  False
4 2018-12-07 16:11:31   True
5 2018-09-17 14:58:20  False
6 2018-07-02 09:36:35  False
7 2018-03-16 23:21:27  False
8 2018-10-30 11:24:01  False
9 2018-04-03 12:17:52  False
Benchmarking
%timeit ~df.diff(axis=1).lt(pd.Timedelta(0)).any(axis=1)
%timeit df.eval('A <= B <= C <= D')
10,000 rows (timings listed in the same order as the two statements above):
#1000 loops, best of 3: 1.58 ms per loop
#100 loops, best of 3: 3.31 ms per loop
10,000,000 rows (same order):
#1 loop, best of 3: 2.27 s per loop
#1 loop, best of 3: 243 ms per loop
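Before comparing timings it is worth confirming that the two statements give the same answer. The following is a rough sketch under a few assumptions: the helper `random_dates`, the seed and the row count are made up here, and the data contains no NaT. With missing dates the two would likely disagree, because the diff version does not flag a NaT step while comparisons against NaT evaluate to False.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

def random_dates(start, end, n):
    # Hypothetical helper: n random second-resolution timestamps in [start, end).
    lo, hi = start.value // 10**9, end.value // 10**9
    return pd.to_datetime(rng.integers(lo, hi, size=n), unit="s")

n = 1_000
df = pd.DataFrame({
    col: random_dates(pd.Timestamp("2018"), pd.Timestamp("2019"), n)
    for col in "ABCD"
})

# The two statements from the benchmark, applied to the same frame.
via_diff = ~df.diff(axis=1).lt(pd.Timedelta(0)).any(axis=1)
via_eval = df.eval("A <= B <= C <= D")

# Both treat ties as "in order", so they should match row for row.
same = np.array_equal(np.asarray(via_diff), np.asarray(via_eval))
print(same)
```

Given the timings above, diff looks like the better choice for small frames and eval for very large ones, but that is worth re-measuring on your own data and pandas version.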