
How to return the value of column [0] at the second consecutive occurrence of "0" in column [1]

correspondence last-occurrence find-occurrences pandas

rpn · Tech Community · 1 year ago

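Problem/Context: I am trying to write code that reads every CSV file in a specified folder, looks for the second consecutive zero in column index [1], and, when it finds one, displays the corresponding value from column index [0]. Below is what I have so far, but it does not work: it simply reports that no column with two consecutive zeros exists. Yet when I open the individual files, I can clearly see columns with two consecutive zeros.

Current code: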

import os
import pandas as pd

folder_path = "/content/drive/session 1 & 2"

def find_column1_value_for_second_zero(file_path):
    try:
        df = pd.read_csv(file_path)
        consecutive_zeros = 0
        column1_value = None

        for _, row in df.iterrows():
            if row.iloc[1] == 0:
                consecutive_zeros += 1
                if consecutive_zeros == 2:
                    column1_value = row.iloc[0]
                    break
            else:
                consecutive_zeros = 0

        return column1_value
    except Exception as e:
        print(f"Error reading file '{file_path}': {str(e)}")
        return None

for filename in os.listdir(folder_path):
    if filename.endswith(".csv"):  # Assuming your files are CSV format
        file_path = os.path.join(folder_path, filename)
        
        column1_value = find_column1_value_for_second_zero(file_path)
        
        if column1_value is not None:
            print(f"In file '{filename}', the value in column 1 for the second zero in column 2 is: {column1_value}")
        else:
            print(f"In file '{filename}', no second zero in column 2 was found.")

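Expected result: get the value in column [0] that sits on the same row as the second consecutive zero in column [1].
Actual result: every line of output reads "no second zero in column 2 was found."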

2 replies | up to 1 year ago

Derek O · 1 year ago

You can set col0, col1 = df.columns[:2] and then use the following one-liner, which groups on col1.

col1_value = df.loc[(df.groupby(col1).cumcount() == 1) & (df[col1] == 0), [col0]].values[0][0]

For the following DataFrame:

>>> df
   col0  col1
0     1     1
1     2     0
2     3     0 # second consecutive occurrence of 0
3     4     2
4     5     2
5     6     0
6     7     0
7     8     0
8     9     2
9    10     2

This gives col1_value = 3, which corresponds to the second consecutive occurrence of 0.
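One thing worth checking with the one-liner above: `groupby(col1).cumcount()` counts occurrences of each value across the whole column, so it picks the second 0 overall, which is only the second consecutive 0 when the first two zeros are adjacent (as in the example). The sketch below uses made-up data where they are not adjacent and compares it with a `shift`-based mask; the data and the names `second_overall` and `second_consecutive` are illustrative assumptions, not part of the answer.

```python
import pandas as pd

# Hypothetical data: the first two zeros in col1 are NOT adjacent.
df = pd.DataFrame({"col0": [1, 2, 3, 4, 5, 6], "col1": [0, 1, 0, 0, 2, 0]})
col0, col1 = df.columns[:2]

is_zero = df[col1].eq(0)

# cumcount-based mask: the second zero anywhere in the column.
second_overall = is_zero & df.groupby(col1).cumcount().eq(1)

# shift-based mask: a zero whose previous row is also a zero.
second_consecutive = is_zero & is_zero.shift(fill_value=False)

overall_value = df.loc[second_overall, col0].iloc[0]
consecutive_value = df.loc[second_consecutive, col0].iloc[0]
print(overall_value, consecutive_value)
```

Here the two masks select different rows, so if the zeros in your files can be separated by other values, the `shift`-based mask may be the safer choice.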

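Stuart Berg · 1 year ago

Pandas rolling windows are useful for this.

Create a boolean Series marking where the column value is 0.
Sum that boolean Series over a rolling window of length 2.
Where the sum is 2, both the current and the previous value in the original data are 0.
Find the first row where that happens.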

import numpy as np
import pandas as pd
# Sample data
df = pd.DataFrame({'a': np.arange(0, 100, 10), 'b': [1,1,0,2,0,0,9,0,0,2]})

    a  b
0   0  1
1  10  1
2  20  0
3  30  2
4  40  0
5  50  0  # <-- Looking for this row
6  60  9
7  70  0
8  80  0
9  90  2

Note that idxmax returns the index of the first occurrence.

idx = df.iloc[:, 1].eq(0).rolling(2).sum().eq(2).idxmax()
print(df.iloc[idx, 0])
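A caveat with the snippet above: when no window sums to 2, every entry of the mask is False and `idxmax` still returns the first index, so the result would silently point at row 0. Below is a minimal sketch of a guarded variant. The helper name `first_double_zero_value` is hypothetical, and the `pd.to_numeric` call reflects an assumption (not confirmed by the question) that the CSV column might have been read as strings, in which case a comparison with the integer 0 never matches.

```python
import pandas as pd

def first_double_zero_value(df):
    """Return column 0's value at the first row where column 1 is 0 on
    both that row and the previous row, or None if there is no such row."""
    # Coerce to numbers first; unparseable entries become NaN and never equal 0.
    is_zero = pd.to_numeric(df.iloc[:, 1], errors="coerce").eq(0)
    # A rolling sum of 2 over a window of 2 means two zeros in a row.
    hits = is_zero.astype(int).rolling(2).sum().eq(2)
    if not hits.any():
        return None  # avoid idxmax's misleading "first index" result
    pos = hits.to_numpy().argmax()  # position of the first True
    return df.iloc[pos, 0]

sample = pd.DataFrame({"a": range(0, 100, 10), "b": [1, 1, 0, 2, 0, 0, 9, 0, 0, 2]})
print(first_double_zero_value(sample))
```

Using a positional lookup (`argmax` with `iloc`) keeps the helper independent of the index labels, which may matter if the DataFrame was filtered or re-indexed before the call.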
