
Accessing the x+1 element when using "for x in list" in Python

python

eddiewastaken · Tech Community · 5 years ago

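I'm trying to parse a newline-delimited text file into blocks of lines, which are appended to a .txt file. I'd like to be able to grab x lines after my end string. The content of those lines varies, so defining a second 'end string' and trying to match it would miss lines.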

Example of the file:

"Start"
"..."
"..."
"..."
"..."
"---" ##End here
"xxx" ##Unique data here
"xxx" ##And here

Here's the code:

first = "Start"
first_end = "---"

with open('testlog.log') as infile, open('parsed.txt', 'a') as outfile:
    copy = False
    for line in infile:
        if line.strip().startswith(first):
            copy = True
            outfile.write(line)
        elif line.strip().startswith(first_end):
            copy = False
            outfile.write(line)
            ##Want to also write next 2 lines here
        elif copy:
            outfile.write(line)

Is this possible with for line in infile, or do I need to use a different type of loop?


Kevin 5 years ago

You can use next or readline to pull the following lines out of the file yourself:

    elif line.strip().startswith(first_end):
        copy = False
        outfile.write(line)
        outfile.write(next(infile))
        outfile.write(next(infile))

Or:

    #note: not compatible with Python 2.7 and below
    elif line.strip().startswith(first_end):
        copy = False
        outfile.write(line)
        outfile.write(infile.readline())
        outfile.write(infile.readline())

Either way, for line in infile: will skip the two lines you already read with next or readline.
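
One caveat: next(infile) raises StopIteration if the file ends right after the end marker. A minimal sketch of one way around that, using itertools.islice to take up to two more lines. Here io.StringIO stands in for the real files, and the sample text is made up for illustration:

```python
import io
from itertools import islice

# In-memory stand-ins for testlog.log and parsed.txt (sample content is made up).
infile = io.StringIO("Start\nfoo\n---\nxxx\nyyy\nskipped\n")
outfile = io.StringIO()

copy = False
for line in infile:
    if line.strip().startswith("Start"):
        copy = True
        outfile.write(line)
    elif line.strip().startswith("---"):
        copy = False
        outfile.write(line)
        # Take up to 2 more lines; islice just stops early at end of file.
        outfile.writelines(islice(infile, 2))
    elif copy:
        outfile.write(line)

result = outfile.getvalue()
print(result)
```

As with next, the lines consumed by islice are not seen again by the for loop.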

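Bonus terminology nitpick: a file object is not a list, and methods for accessing the x+1 element of a list may not work for accessing the next line of a file, and vice versa. If you do want to access the next item of a proper list object, you can use enumerate so that you can do arithmetic on the index: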

seq = ["foo", "bar", "baz", "qux", "troz", "zort"]

#find all instances of "baz" and also the first two elements after "baz"
for idx, item in enumerate(seq):
    if item == "baz":
        print(item)
        print(seq[idx+1])
        print(seq[idx+2])

Note that, unlike readline, indexing does not advance the iterator, so for idx, item in enumerate(seq): will still iterate over "qux" and "troz".
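
Also note that seq[idx+1] raises IndexError when the match is one of the last items. A small sketch that uses a slice instead, which simply comes back shorter near the end of the list; the helper name item_and_following is made up for this example:

```python
def item_and_following(seq, target, extra=2):
    """Return each occurrence of target together with up to `extra` items after it."""
    found = []
    for idx, item in enumerate(seq):
        if item == target:
            # Slicing never raises IndexError; the slice is just shorter near the end.
            found.append(seq[idx:idx + extra + 1])
    return found


seq = ["foo", "bar", "baz", "qux", "troz", "zort"]
print(item_and_following(seq, "baz"))   # [['baz', 'qux', 'troz']]
print(item_and_following(seq, "zort"))  # [['zort']]
```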

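One approach that works for any iterable is to use an additional variable to track state across iterations. The advantage is that you don't have to know how to advance the iterable manually; the disadvantage is that the logic in the loop is harder to reason about, because it carries extra state from one iteration to the next.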

first = "Start"
first_end = "---"

with open('testlog.log') as infile, open('parsed.txt', 'a') as outfile:
    copy = False
    num_items_to_write = 0
    for line in infile:
        if num_items_to_write > 0:
            outfile.write(line)
            num_items_to_write -= 1
        elif line.strip().startswith(first):
            copy = True
            outfile.write(line)
        elif line.strip().startswith(first_end):
            copy = False
            outfile.write(line)
            num_items_to_write = 2
        elif copy:
            outfile.write(line)
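
To illustrate the "any iterable" point, here is a rough sketch of the same counter logic wrapped in a generator and fed a plain list. The function name copy_blocks and the sample data are made up for this example:

```python
def copy_blocks(lines, start, end, extra=2):
    """Yield each block from a start marker through the end marker plus `extra` lines."""
    copy = False
    remaining = 0  # lines still to emit after an end marker
    for line in lines:
        if remaining > 0:
            remaining -= 1
            yield line
        elif line.strip().startswith(start):
            copy = True
            yield line
        elif line.strip().startswith(end):
            copy = False
            remaining = extra
            yield line
        elif copy:
            yield line


# A plain list works here, as would a file object or another generator.
sample = ["Start", "a", "---", "x", "y", "z", "Start", "b", "---", "p"]
blocks = list(copy_blocks(sample, "Start", "---"))
print(blocks)
```

The "z" item is dropped because it falls after the two extra lines and before the next start marker.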

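Another option is to read the whole file into memory and match the blocks with a regular expression:

import re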

with open("testlog.log") as file:
    data = file.read()

pattern = re.compile(r"""
^Start$                 #"Start" by itself on a line
(?:\n.*$)*?             #zero or more lines, matched non-greedily
                        #use (?:) for all groups so `findall` doesn't capture them later
\n---$                  #"---" by itself on a line
(?:\n.*$){2}            #exactly two lines
""", re.MULTILINE | re.VERBOSE)

#equivalent one-line regex:
#pattern = re.compile("^Start$(?:\n.*$)*?\n---$(?:\n.*$){2}", re.MULTILINE)

for group in pattern.findall(data):
    print("Found group:")
    print(group)
    print("End of group.\n\n")

When run on a log that looks like this:

Start
foo
bar
baz
qux
---
troz
zort
alice
bob
carol
dave
Start
Fred
Barney
---
Wilma
Betty
Pebbles

... this produces the output:

Found group:
Start
foo
bar
baz
qux
---
troz
zort
End of group.


Found group:
Start
Fred
Barney
---
Wilma
Betty
End of group.
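
The regex version above prints each group rather than writing it out. A sketch of how the matches could be joined back into the text that would go into the output file, assuming the same one-line pattern; the sample data is a shortened, made-up version of the log above:

```python
import re

pattern = re.compile(r"^Start$(?:\n.*$)*?\n---$(?:\n.*$){2}", re.MULTILINE)

# Shortened sample log, held in memory instead of read from testlog.log.
data = (
    "Start\nfoo\n---\ntroz\nzort\nalice\n"
    "Start\nFred\n---\nWilma\nBetty\nPebbles\n"
)

# Each match stops before the newline that ends its last line, so add one back.
parsed = "".join(match + "\n" for match in pattern.findall(data))
print(parsed)
```

The resulting string could then be passed to outfile.write in one call.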

Maarten Fabré 5 years ago

The simplest way is to write a generator function that parses the infile:

def read_file(file_handle, start_line, end_line, extra_lines=2):
    start = False
    while True:
        try:
            line = next(file_handle)
        except StopIteration:
            return

        if not start and line.strip().startswith(start_line):
            start = True
            yield line
        elif not start:
            continue
        elif line.strip().startswith(end_line):
            # Block finished: stop copying until the next start line is seen
            start = False
            yield line
            try:
                for _ in range(extra_lines):
                    yield next(file_handle)
            except StopIteration:
                return
        else:
            yield line

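The inner try-except clause is not needed if you know every file is well-formed, that is, every end line is followed by at least extra_lines more lines. The outer one should stay: next(file_handle) raises StopIteration whenever the file is exhausted, and since Python 3.7 a StopIteration that escapes from a generator is turned into a RuntimeError.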
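
A minimal sketch of what happens when next() is left unguarded inside a generator on Python 3.7 or later (the behaviour introduced by PEP 479); the first_two helper is made up for illustration:

```python
def first_two(iterable):
    """Hypothetical generator that calls next() without guarding it."""
    it = iter(iterable)
    yield next(it)
    yield next(it)  # raises StopIteration inside the generator if `it` is exhausted


try:
    result = list(first_two(["only one line\n"]))
except RuntimeError as exc:
    # Python 3.7+ converts the escaping StopIteration into RuntimeError.
    result = "RuntimeError: {}".format(exc)

print(result)
```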

You can use this generator like this:

if __name__ == "__main__":
    first = "Start"
    first_end = "---"

    with open("testlog.log") as infile, open("parsed.txt", "a") as outfile:
        output = read_file(
            file_handle=infile,
            start_line=first,
            end_line=first_end,
            extra_lines=1,
        )
        outfile.writelines(output)

guillaume.deslandes 5 years ago

A variant of @Kevin's answer, with a three-state variable and less code duplication.

first = "Start"
first_end = "---"
# Lines to read after end flag
extra_count = 2

with open('testlog.log') as infile, open('parsed.txt', 'a') as outfile:
    # Do not copy by default
    copy = 0

    for line in infile:
        # Strip once only
        clean_line = line.strip()

        # Enter "infinite copy" state
        if clean_line.startswith(first):
            copy = -1

        # Copy the end line plus the extra lines
        elif clean_line.startswith(first_end):
            copy = extra_count + 1

        # If in a "must-copy" state
        if copy != 0:
            # One less line to copy if end flag passed
            if copy > 0:
                copy -= 1
            # Copy current line
            outfile.write(line)