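I have a large DataFrame with 2000 rows and two columns, where each cell holds a list of about 1000 points. I want to drop the negative values from both columns together (as x/y pairs), then compute, for every row, the maximum of each column and the value paired with that maximum.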
Currently I'm doing this with a `for` loop, and it takes 30 minutes to finish. Can I do the same thing with vectorized operations?
Expected approach:
```python
import pandas as pd

df = pd.DataFrame({'x': [[-1, 0, 1, 2, 10], [1.5, 2, 4, 5]],
                   'y': [[2.5, 2.4, 2.3, 1.5, 0.1], [5, 4.5, 3, -0.1]]})
```

```
df =
                   x                          y
0  [-1, 0, 1, 2, 10]  [2.5, 2.4, 2.3, 1.5, 0.1]
1     [1.5, 2, 4, 5]          [5, 4.5, 3, -0.1]
```
```
### x and y are paired data coming from the field, e.g. (-1, 2.5), (0, 2.4)
# Step 1: drop negative values in both the x and y columns.
# If either x or y is negative, drop the whole pair.
# E.g., in the first row, drop the (-1, 2.5) pair, i.e. -1 in x and 2.5 in y.
# After dropping negative values:
df =
               x                     y
0  [0, 1, 2, 10]  [2.4, 2.3, 1.5, 0.1]
1    [1.5, 2, 4]           [5, 4.5, 3]
```
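Step 1 can likely be done without a Python-level loop by exploding the two list columns into long format (one row per (x, y) pair) and filtering with a single boolean mask. This is a minimal sketch, assuming pandas >= 1.3 (needed to explode several columns at once) and that each row's x and y lists have the same length; `long_df` and `cleaned` are illustrative names. The mask uses `>= 0` because the expected output keeps the 0 in the first row.

```python
import pandas as pd

df = pd.DataFrame({'x': [[-1, 0, 1, 2, 10], [1.5, 2, 4, 5]],
                   'y': [[2.5, 2.4, 2.3, 1.5, 0.1], [5, 4.5, 3, -0.1]]})

# Long format: one row per (x, y) pair; the index repeats the original row label.
# Exploding two columns at once assumes pandas >= 1.3 and equal-length lists per row.
long_df = df.explode(['x', 'y']).astype(float)

# Step 1: keep a pair only if both x and y are non-negative (0 is kept).
long_df = long_df[(long_df['x'] >= 0) & (long_df['y'] >= 0)]

# Only to compare with the expected frame: collect the survivors back into lists.
cleaned = long_df.groupby(level=0).agg(list)
print(cleaned)
```

For the real data there is no need to rebuild the lists; the later steps can work directly on `long_df`.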
```
### Step 2: find the max of x and y in each row
df =
               x                     y  xmax  ymax
0  [0, 1, 2, 10]  [2.4, 2.3, 1.5, 0.1]    10   2.4
1    [1.5, 2, 4]           [5, 4.5, 3]     4     5
```
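Once the pairs are in long format, Step 2 reduces to one grouped `max` keyed on the original row label. A sketch, assuming pandas >= 1.3 and equal-length x/y lists per row; `maxes` is an illustrative name. A row whose pairs were all dropped would simply get NaN from the `join`.

```python
import pandas as pd

df = pd.DataFrame({'x': [[-1, 0, 1, 2, 10], [1.5, 2, 4, 5]],
                   'y': [[2.5, 2.4, 2.3, 1.5, 0.1], [5, 4.5, 3, -0.1]]})

# Long format (assumes pandas >= 1.3 and equal-length x/y lists per row).
long_df = df.explode(['x', 'y']).astype(float)
# Step 1: drop pairs with a negative x or y.
long_df = long_df[(long_df['x'] >= 0) & (long_df['y'] >= 0)]

# Step 2: one grouped max per original row label, for both columns at once.
maxes = long_df.groupby(level=0).max().rename(columns={'x': 'xmax', 'y': 'ymax'})
# join aligns on the row label; a row with no surviving pairs would get NaN.
df = df.join(maxes)
print(df)
```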
```
### Step 3: find y@xmax (the y paired with xmax) and x@ymax (the x paired with ymax) in each row
df =
               x                     y  xmax  ymax  y@xmax  x@ymax
0  [0, 1, 2, 10]  [2.4, 2.3, 1.5, 0.1]    10   2.4     0.1       0
1    [1.5, 2, 4]           [5, 4.5, 3]     4     5       3     1.5
```
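Step 3 needs the position of each maximum, which `groupby(...).idxmax()` returns as index labels. Because the exploded frame repeats each original row label, this sketch first gives it a unique index and moves the original label into a column named `row` (an illustrative name), so the returned labels can be looked up with `.loc`. Like `Series.idxmax` in the loop, it picks the first occurrence on ties. Same assumptions as above: pandas >= 1.3 and equal-length x/y lists per row.

```python
import pandas as pd

df = pd.DataFrame({'x': [[-1, 0, 1, 2, 10], [1.5, 2, 4, 5]],
                   'y': [[2.5, 2.4, 2.3, 1.5, 0.1], [5, 4.5, 3, -0.1]]})

# Long format with a unique index; the original row label moves into 'row'.
# Assumes pandas >= 1.3 and equal-length x/y lists per row.
long_df = (df.explode(['x', 'y']).astype(float)
           .rename_axis('row').reset_index())
long_df = long_df[(long_df['x'] >= 0) & (long_df['y'] >= 0)]  # Step 1

g = long_df.groupby('row')
ix_xmax = g['x'].idxmax()  # per row: long_df label of the pair holding xmax
ix_ymax = g['y'].idxmax()  # per row: long_df label of the pair holding ymax

# Step 3: look up the partner value and re-index it by the original row label,
# so the assignment to df aligns row by row.
df['y@xmax'] = pd.Series(long_df.loc[ix_xmax.to_numpy(), 'y'].to_numpy(), index=ix_xmax.index)
df['x@ymax'] = pd.Series(long_df.loc[ix_ymax.to_numpy(), 'x'].to_numpy(), index=ix_ymax.index)
print(df)
```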
Current solution: it works, but it takes a long time.
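```python
import pandas as pd

rows = []
for i in range(len(df)):
    ### create an auxiliary dataframe
    auxdf = pd.DataFrame({'x': df['x'].iloc[i], 'y': df['y'].iloc[i]})
    ### Step 1: drop pairs with a negative value (>= 0 keeps the 0, as in the expected output)
    auxdf = auxdf[(auxdf['x'] >= 0) & (auxdf['y'] >= 0)]
    ### Step 2: max in x and y
    xmax = auxdf['x'].max()
    ymax = auxdf['y'].max()
    ### Step 3: x@ymax, y@xmax
    xatymax = auxdf['x'].loc[auxdf['y'].idxmax()]
    yatxmax = auxdf['y'].loc[auxdf['x'].idxmax()]
    ### finally, collect xmax, ymax, yatxmax, xatymax for this row
    rows.append((xmax, ymax, yatxmax, xatymax))

# add the collected values to df as new columns
df = df.join(pd.DataFrame(rows, index=df.index,
                          columns=['xmax', 'ymax', 'y@xmax', 'x@ymax']))
```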
Would doing this with vectorized operations cut the runtime?
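Putting the three steps together, the loop could be replaced by one function that processes all rows at once. This is a sketch, not a benchmarked solution: `add_max_columns` is a hypothetical helper name, and it assumes pandas >= 1.3, a unique index on `df`, and equal-length x/y lists in every row. Unlike the loop, a row whose pairs are all dropped ends up with NaN instead of raising, as `idxmax` on an empty Series would.

```python
import pandas as pd


def add_max_columns(df):
    """Hypothetical helper: vectorized version of the loop.

    Returns a new frame with xmax, ymax, y@xmax and x@ymax added. Assumes
    pandas >= 1.3, a unique index, and equal-length x/y lists in every row.
    """
    # Long format: one row per (x, y) pair; the original label goes into 'row'.
    long_df = (df[['x', 'y']].explode(['x', 'y']).astype(float)
               .rename_axis('row').reset_index())
    # Step 1: drop every pair in which x or y is negative (0 is kept).
    long_df = long_df[(long_df['x'] >= 0) & (long_df['y'] >= 0)]
    g = long_df.groupby('row')
    # Step 2: per-row maxima of both columns.
    out = g[['x', 'y']].max().rename(columns={'x': 'xmax', 'y': 'ymax'})
    # Step 3: idxmax gives long_df labels; both Series are ordered like out.
    out['y@xmax'] = long_df.loc[g['x'].idxmax().to_numpy(), 'y'].to_numpy()
    out['x@ymax'] = long_df.loc[g['y'].idxmax().to_numpy(), 'x'].to_numpy()
    # Rows whose pairs were all dropped are missing from out, so join gives NaN.
    return df.join(out)


df = pd.DataFrame({'x': [[-1, 0, 1, 2, 10], [1.5, 2, 4, 5]],
                   'y': [[2.5, 2.4, 2.3, 1.5, 0.1], [5, 4.5, 3, -0.1]]})
df = add_max_columns(df)
print(df[['xmax', 'ymax', 'y@xmax', 'x@ymax']])
```

Timing it against the loop on the real 2000-row frame is the only reliable way to confirm the speedup.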