
Python: iterating over a DataFrame with +/- oscillating values and creating a new column based on conditions

iterator dataframe loops pandas python

Life2Day · Tech community · 7 years ago

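I need help labeling different states in a new DataFrame column, based on conditions on two oscillating values: columns X and Y.

Column Y is used as the state interval. A state interval starts at 0 and ends at 0. Note that within an interval the values in column Y always stay in either the positive or the negative range. The interval cycles alternate in sign: +, -, +, -, and so on.

Labeling starts when the column Y value becomes positive (greater than 0) and stops at 0, before the value turns negative. That is the end of the cycle, and the next cycle, in the negative range, then begins.

There are 6 patterns, A, B, C, D, E and F, which serve as the cycle states. I am trying to work out the logic, and how to add the label for each state to a new DataFrame column named State. Each cycle gets its own label, and labeling starts over with each new cycle.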

+-------+-------------+---------+  
| State |      X      |    Y    |  
+-------+-------------+---------+  
|   A   | from - to + |    +    |  
|   B   |      +      |    +    |  
|   C   |      -      |    +    |  
|   D   |      +      |    -    |  
|   E   |      -      |    -    |  
|   F   | from + to - |    -    |  
+-------+-------------+---------+

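In states A and F, the values in column X go from + to - or vice versa, crossing over 0. The values in column Y always stay in either the positive or the negative range.

In states B, C, D and E there is no crossover in column X. Below are sample DataFrame values and a sample of the new column with the resulting states.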

+----+---------+---------+-------+  
|  # |    X    |    Y    | State |  
+----+---------+---------+-------+  
|  1 | -0.0034 |  0.0056 |   A   | Cycle 1 (+)  
|  2 | -0.0001 |  0.0070 |   A   |  
|  3 |  0.0019 |  0.0073 |   A   |  
|  4 |  0.0039 |  0.0075 |   A   |  
|    |         |         |       |  
|  5 |  0.0273 | -0.0037 |   D   | Cycle 2 (-)  
|  6 |  0.0237 | -0.0059 |   D   |  
|    |         |         |       |  
|  7 |  0.0047 |  0.0028 |   B   | Cycle 3 (+)  
|  8 |  0.0044 |  0.0020 |   B   |  
|    |         |         |       |  
|  9 | -0.0034 | -0.0006 |   E   | Cycle 4 (-)    
| 10 | -0.0045 | -0.0014 |   E   |  
|    |         |         |       |  
| 11 | -0.0021 |  0.0006 |   C   | Cycle 5 (+)  
| 12 | -0.0019 |  0.0007 |   C   |  
|    |         |         |       |  
| 13 |  0.0041 | -0.0054 |   F   | Cycle 6 (-)  
| 14 |  0.0017 | -0.0060 |   F   |  
| 15 | -0.0021 | -0.0059 |   F   |  
| 16 | -0.0023 | -0.0057 |   F   |  
+----+---------+---------+-------+  
Cycles will continue 7, 8, 9, 10, etc. in the time series
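
The state table and the sample above can be read as a rule that takes all X values of one cycle plus the sign of Y. The following is only a sketch of that reading, not code from the original post: the function name classify_cycle is made up for illustration, and it assumes that any sign change of X within a positive-Y cycle counts as A and within a negative-Y cycle as F, without checking the direction of the crossover.

```python
def classify_cycle(xs, y_positive):
    """Return the state letter for one cycle (hypothetical helper).

    xs: the column X values inside the cycle, in order.
    y_positive: True if column Y is positive throughout the cycle.
    """
    has_pos = max(xs) > 0          # at least one positive X value
    has_neg = min(xs) < 0          # at least one negative X value
    crossover = has_pos and has_neg
    if y_positive:
        if crossover:
            return 'A'
        return 'B' if has_pos else 'C'
    if crossover:
        return 'F'
    return 'D' if has_pos else 'E'


# The six cycles from the sample table: (X values, is Y positive?)
cycles = [
    ([-0.0034, -0.0001, 0.0019, 0.0039], True),
    ([0.0273, 0.0237], False),
    ([0.0047, 0.0044], True),
    ([-0.0034, -0.0045], False),
    ([-0.0021, -0.0019], True),
    ([0.0041, 0.0017, -0.0021, -0.0023], False),
]
states = [classify_cycle(xs, y_pos) for xs, y_pos in cycles]
print(states)  # ['A', 'D', 'B', 'E', 'C', 'F']
```

The point of the sketch is that A and F cannot be decided from a single row; the rule needs every X value in the cycle.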

Here is a DataFrame with 12 cycles, similar to the example above, in which the patterns A, B, C, D, E, F appear twice in the result.

import pandas as pd

df = pd.DataFrame({
    'x': [-0.0034, -0.0001, 0.0019, 0.0039, 0.0273, 0.0237, 0.0047, 0.0044, -0.0034, -0.0045, -0.0021, -0.0019, 0.0041, 0.0017, -0.0021, -0.0023, -0.0014, -0.0002, 0.0018, 0.0031, 0.0171, 0.0230, 0.0035, 0.0040, -0.0030, -0.0040, -0.0020, -0.0015, 0.0030, 0.0010, -0.0030, -0.0020, ],
    'y': [0.0056, 0.007, 0.0073, 0.0075, -0.0037, -0.0059, 0.0028, 0.002, -0.0006, -0.0014, 0.0006, 0.0007, -0.0054, -0.006, -0.0059, -0.0057, 0.0040, 0.005, 0.0065, 0.0070, -0.0022, -0.0045, 0.0020, 0.001, -0.0005, -0.0010, 0.0003, 0.0005, -0.0050, -0.005, -0.0060, -0.0040, ],
})
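
As a quick sanity check on the data above, the number of cycles can be counted from the sign changes in y. This is a small sketch under the assumption that y never equals exactly 0 (true for this sample); the variable names are illustrative only.

```python
import numpy as np

# Column y from the DataFrame above
y = np.array([
    0.0056, 0.007, 0.0073, 0.0075, -0.0037, -0.0059, 0.0028, 0.002,
    -0.0006, -0.0014, 0.0006, 0.0007, -0.0054, -0.006, -0.0059, -0.0057,
    0.0040, 0.005, 0.0065, 0.0070, -0.0022, -0.0045, 0.0020, 0.001,
    -0.0005, -0.0010, 0.0003, 0.0005, -0.0050, -0.005, -0.0060, -0.0040,
])

signs = np.sign(y)
# A new cycle starts wherever the sign differs from the previous row
n_cycles = 1 + int(np.count_nonzero(signs[1:] != signs[:-1]))
print(n_cycles)  # 12
```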

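The next sample is my starting point for iterating over the DataFrame. I need help building the logic to include states A & F, iterating through each +/- cycle, and guidance on how to iterate over column Y while looking in column X for crossover values.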

State = []

for i, row in df.iterrows():  # i: DataFrame index; row: one row as a Series
    # Column names are lowercase 'x' and 'y', matching the DataFrame above
    if row['x'] > 0 and row['y'] > 0:
        State.append('B')
    elif row['x'] < 0 and row['y'] > 0:
        State.append('C')
    elif row['x'] > 0 and row['y'] < 0:
        State.append('D')
    elif row['x'] < 0 and row['y'] < 0:
        State.append('E')
    else:
        State.append('err')

df['State'] = State
print(df)

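Again, the code above does not include states A & F.

Update

I still need help. Below is the updated code with comments, followed by an explanation of what is not working.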

# Creating new column as + or - based on Column Y value
df['y_pos'] = np.where((df.y > 0), True, False)

# Creating new column to label the cycle as they are increasing order 1,2,3, etc.
df['cycle_n'] = (df.y_pos != df.y_pos.shift(1)).cumsum()

# returns dictionary whose keys and values are from DataFrames
# to be able to loop through the cycles
gb = df.groupby('cycle_n')
groups = dict(list(gb))

State = []

for name, group in gb:
    # Information to help compare our final results
    print("Group:" + str(name) )
    print("=====================")
    print("Min:" + str(group.min()) )
    print("Max:" + str(group.max()) )
    print("--- Group Data -----")
    print(group)
    print("--------------------")
    print("--- Column X Row Data -----")

    for index, row in group.iterrows(): # loop each row

        if row['y_pos'] == True: # Column Y is (+)

            print( row['x'] ) # row data value for Column X

        # trying to use min and max in each cycle to figure out
        # if there is a crossover 

        # ISSUE: min and max is holding data values for each of the
        # columns, not only Column X which maybe the reason why 
        # it's not working correctly

            if [ (group.min() <= 0) & (group.max() >= 0) ]:
                State.append('A')
            elif row['x'] >= 0:
                State.append('B')
            elif row['x'] < 0:
                State.append('C')
            else:  
                State.append('err')

        elif row['y_pos'] == False: # Column Y is (-)

            print( row['x'] )

        # ISSUE: again min and max is holding data values for each of the
        # columns, maybe the reason why it's not working correctly

            if [ (group.max() >= 0) & (group.min() <= 0) ]:
                State.append('F')
            elif row['x'] >= 0:
                State.append('D')
            elif row['x'] < 0:
                State.append('E')
            else:  
                State.append('err')
        else:
            print("err")

df['State'] = State  

# Combining y_pos & cycle_n to be printed out.
df['Label'] = 'Cycle ' + df.cycle_n.astype(str) + ' ' + df.y_pos.map({True: '(+)', False: '(-)'})

del df['y_pos']
del df['cycle_n']

print(df)

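The problem with this code: it now labels only states A & F, and mislabels the other states as A or F. The if statement using min and max evaluates to true, which is not correct, since group.min() and group.max() hold the minimum and maximum of every column in the group. For example: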

print("Min:" + str(group.min()) )

Min:
x         -0.0034
y          0.0056
y_pos      1.0000
cycle_n    1.0000
dtype: float64
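
A likely second cause, offered here as an observation rather than something stated in the post: the condition is wrapped in square brackets, and a non-empty Python list is truthy whatever it contains, so the first branch is always taken. (Without the brackets, testing a boolean Series in an if would raise a ValueError instead.) The sketch below uses one cycle from the sample data and restricts min and max to column x, which yields plain scalars.

```python
import pandas as pd

# Cycle 2 from the sample: X stays positive, so there is no crossover
group = pd.DataFrame({'x': [0.0273, 0.0237], 'y': [-0.0037, -0.0059]})

# A one-element list is truthy no matter what the element is
always_true = bool([False])

# Selecting the column first gives scalar min and max for X only
x_min = group['x'].min()
x_max = group['x'].max()
has_crossover = bool((x_min < 0) and (x_max > 0))

print(always_true)    # True
print(has_crossover)  # False
```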

I don't know whether this is the best approach; I am just getting closer to having it work.

1 reply | as of 7 years ago

ASGM · 7 years ago

Here is one way to achieve what you are after:

import pandas as pd
import numpy as np

# Define the cycles
df['y_pos'] = np.where((df.y > 0), True, False)
df['cycle_n'] = (df.y_pos != df.y_pos.shift(1)).cumsum()

# Function to classify states based on x and y
def classify_state(df):
    x_pos = df.x.max() >= 0
    x_neg = df.x.min() < 0
    y_pos = df.y_pos.any()

    if y_pos:
        if x_pos and x_neg:
            state = 'A'
        elif x_pos:
            state = 'B'
        else:
            state = 'C'
    else:
        if x_pos and x_neg:
            state = 'F'
        elif x_pos:
            state = 'D'
        else:
            state = 'E'

    df['state'] = state
    return df

# Apply that function over the cycles
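# group_keys=False keeps the original row index; pandas 2.0 and later
# would otherwise prepend the group key to the index
df = df.groupby('cycle_n', group_keys=False).apply(classify_state)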

# Make the labels and clean up the temporary columns
df['label'] = 'Cycle ' + df.cycle_n.astype(str) + ' ' + df.y_pos.map({True: '(+)', False: '(-)'})
del df['cycle_n']
del df['y_pos']
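
For comparison, the same classification can be sketched without groupby.apply, whose treatment of the grouping column has changed across pandas releases. This variant is not part of the original answer: it is a minimal sketch that assumes the lowercase x and y columns from the question, uses only the first six cycles to stay short, and keeps the answer's convention that an X value of 0 counts as positive.

```python
import numpy as np
import pandas as pd

# First six cycles of the sample data from the question
df = pd.DataFrame({
    'x': [-0.0034, -0.0001, 0.0019, 0.0039, 0.0273, 0.0237, 0.0047, 0.0044,
          -0.0034, -0.0045, -0.0021, -0.0019, 0.0041, 0.0017, -0.0021, -0.0023],
    'y': [0.0056, 0.007, 0.0073, 0.0075, -0.0037, -0.0059, 0.0028, 0.002,
          -0.0006, -0.0014, 0.0006, 0.0007, -0.0054, -0.006, -0.0059, -0.0057],
})

# Number the cycles: a new one starts whenever the sign of y flips
y_pos = df['y'] > 0
cycle_n = (y_pos != y_pos.shift()).cumsum()

# Broadcast each cycle's X extremes back onto its rows
x_by_cycle = df.groupby(cycle_n)['x']
x_has_pos = x_by_cycle.transform('max') >= 0
x_has_neg = x_by_cycle.transform('min') < 0
crossover = x_has_pos & x_has_neg

# np.select picks the first matching condition for each row
df['state'] = np.select(
    [y_pos & crossover, y_pos & x_has_pos, y_pos, crossover, x_has_pos],
    ['A', 'B', 'C', 'F', 'D'],
    default='E',
)
print(df)
```

Because transform returns values aligned to the original rows, no temporary columns have to be added to df or deleted afterwards.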

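A few notes:

As written, the logic works but is a bit long-winded. You could almost certainly do this in fewer lines, but I left it long to make clear exactly what is happening.
A value of 0 is treated as positive in the code as written, but you can change that by changing (df.iloc[[0, -1], 0] >= 0).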
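
To illustrate the note about 0, the sketch below compares the two conventions on a cycle whose X values touch 0 without going above it. The helper has_crossover and its zero_is_positive flag are hypothetical names introduced only for this example; the inclusive branch mirrors the x.max() >= 0 and x.min() < 0 tests in the code above.

```python
def has_crossover(xs, zero_is_positive=True):
    """Report whether X takes both signs within one cycle (hypothetical helper)."""
    if zero_is_positive:
        x_pos = max(xs) >= 0   # inclusive test: 0 counts as positive
    else:
        x_pos = max(xs) > 0    # strict test: 0 does not count as positive
    x_neg = min(xs) < 0
    return x_pos and x_neg


xs = [-0.002, -0.001, 0.0]
inclusive = has_crossover(xs)
strict = has_crossover(xs, zero_is_positive=False)
print(inclusive)  # True
print(strict)     # False
```

With the inclusive test, a positive-Y cycle like this one would be labeled A; with the strict test it would be labeled C.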

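Edit 1: Thanks for the thorough update to the question. It is now clearer what you are looking for, and I have changed the answer accordingly.

Edit 2: I modified the code to consider all df.x values within a cycle, not just the first and last.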