
Accessing the x+1 element with "for x in list" in Python

  •  0
  • eddiewastaken  ·  5 years ago

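    I'm trying to parse a newline-delimited text file into blocks of lines, which are appended to a .txt file. I'd like to grab a fixed number of lines after my end string. The content of those lines varies, so setting a second 'end string' to match them would miss lines.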

    Example file:

    "Start"
    "..."
    "..."
    "..."
    "..."
    "---" ##End here
    "xxx" ##Unique data here
    "xxx" ##And here
    

    Here is the code:

    first = "Start"
    first_end = "---"
    
    with open('testlog.log') as infile, open('parsed.txt', 'a') as outfile:
        copy = False
        for line in infile:
            if line.strip().startswith(first):
                copy = True
                outfile.write(line)
            elif line.strip().startswith(first_end):
                copy = False
                outfile.write(line)
                ##Want to also write next 2 lines here
            elif copy:
                outfile.write(line)
    

    Is this possible with for line in infile, or do I need to use a different type of loop?

        1
  •  5
  •   Kevin    5 years ago

    You can use next or readline to retrieve the following lines from the file:

        elif line.strip().startswith(first_end):
            copy = False
            outfile.write(line)
            outfile.write(next(infile))
            outfile.write(next(infile))
    

        #note: not compatible with Python 2.7 and below
        elif line.strip().startswith(first_end):
            copy = False
            outfile.write(line)
            outfile.write(infile.readline())
            outfile.write(infile.readline())
    

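    Afterwards, for line in infile: will skip the two lines you read with next or readline, because both advance the same underlying iterator.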
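
    To see that skipping in action, here is a small self-contained sketch. It uses io.StringIO as a stand-in for the log file, and the sample text and the names seen_by_loop and extra are made up for illustration. It also passes a default to next, so a file that ends right after the end marker does not raise StopIteration.

```python
import io

# Hypothetical stand-in for testlog.log
infile = io.StringIO("Start\na\n---\nx1\nx2\nafter\n")

seen_by_loop = []  # lines handed to the loop body
extra = []         # lines pulled manually with next()

for line in infile:
    seen_by_loop.append(line.strip())
    if line.startswith("---"):
        # next() advances the same iterator the for loop is using;
        # the "" default avoids StopIteration at end of file
        extra.append(next(infile, "").strip())
        extra.append(next(infile, "").strip())

print(seen_by_loop)  # ['Start', 'a', '---', 'after']
print(extra)         # ['x1', 'x2']
```

    The loop body never sees x1 or x2, which is the behaviour the question asks for.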


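    Bonus terminology nitpick: a file object is not a list, so techniques for accessing the x+1 element of a list will probably not work for reading the next line of a file, and vice versa. If you do want to access the next item of a proper list object, you can use enumerate to get each item's index and do arithmetic on it: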

    seq = ["foo", "bar", "baz", "qux", "troz", "zort"]
    
    #find all instances of "baz" and also the first two elements after "baz"
    for idx, item in enumerate(seq):
        if item == "baz":
            print(item)
            print(seq[idx+1])
            print(seq[idx+2])
    

    Note that, unlike readline, indexing does not advance the iterator, so for idx, item in enumerate(seq): will still iterate over "qux" and "troz".
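
    One caveat with the index arithmetic above: seq[idx+1] raises IndexError when the match sits within the last two elements of the list. A minimal sketch of a slice-based variant, which simply returns fewer items near the end (the helper name item_and_following is made up for this illustration):

```python
def item_and_following(seq, target, extra=2):
    """Return each occurrence of target plus up to `extra` items after it."""
    groups = []
    for idx, item in enumerate(seq):
        if item == target:
            # Slicing never raises IndexError; it truncates at the end of the list
            groups.append(seq[idx:idx + extra + 1])
    return groups

seq = ["foo", "bar", "baz", "qux", "troz", "zort"]
print(item_and_following(seq, "baz"))   # [['baz', 'qux', 'troz']]
print(item_and_following(seq, "zort"))  # [['zort']]
```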


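    An approach that works for any iterable is to use an additional variable to keep track of state across iterations. The advantage is that you don't need to know how to advance the iterable manually; the disadvantage is that the logic inside the loop is harder to reason about, because it introduces an additional side effect.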

    first = "Start"
    first_end = "---"
    
    with open('testlog.log') as infile, open('parsed.txt', 'a') as outfile:
        copy = False
        num_items_to_write = 0
        for line in infile:
            if num_items_to_write > 0:
                outfile.write(line)
                num_items_to_write -= 1
            elif line.strip().startswith(first):
                copy = True
                outfile.write(line)
            elif line.strip().startswith(first_end):
                copy = False
                outfile.write(line)
                num_items_to_write = 2
            elif copy:
                outfile.write(line)

    Another option is to read the whole file and match each block with a regular expression:
    

    import re
    
    with open("testlog.log") as file:
        data = file.read()
    
    pattern = re.compile(r"""
    ^Start$                 #"Start" by itself on a line
    (?:\n.*$)*?             #zero or more lines, matched non-greedily
                            #use (?:) for all groups so `findall` doesn't capture them later
    \n---$                  #"---" by itself on a line
    (?:\n.*$){2}            #exactly two lines
    """, re.MULTILINE | re.VERBOSE)
    
    #equivalent one-line regex:
    #pattern = re.compile("^Start$(?:\n.*$)*?\n---$(?:\n.*$){2}", re.MULTILINE)
    
    for group in pattern.findall(data):
        print("Found group:")
        print(group)
        print("End of group.\n\n")
    

    When run on a log that looks like this:

    Start
    foo
    bar
    baz
    qux
    ---
    troz
    zort
    alice
    bob
    carol
    dave
    Start
    Fred
    Barney
    ---
    Wilma
    Betty
    Pebbles
    

    ... this produces the output:

    Found group:
    Start
    foo
    bar
    baz
    qux
    ---
    troz
    zort
    End of group.
    
    
    Found group:
    Start
    Fred
    Barney
    ---
    Wilma
    Betty
    End of group.
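
    If the markers or the number of trailing lines may change, the same pattern can be built from parameters. The following is only a sketch under that assumption: the helper name block_pattern and the sample data are hypothetical, and re.escape is used so that markers containing regex metacharacters are matched literally.

```python
import re

def block_pattern(start, end, extra_lines=2):
    """Compile a regex for: start marker, any lines, end marker, N more lines."""
    return re.compile(
        # Doubled braces survive str.format, so {{{n}}} becomes e.g. {1}
        r"^{start}$(?:\n.*$)*?\n{end}$(?:\n.*$){{{n}}}".format(
            start=re.escape(start), end=re.escape(end), n=extra_lines
        ),
        re.MULTILINE,
    )

data = "Start\nfoo\n---\ntroz\nzort\nalice\nStart\nFred\n---\nWilma\nBetty\n"
matches = block_pattern("Start", "---", extra_lines=1).findall(data)
print(matches)  # ['Start\nfoo\n---\ntroz', 'Start\nFred\n---\nWilma']
```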
    
        2
  •  2
  •   Maarten Fabré    5 years ago

    The simplest approach is to write a generator function that parses the infile:

    def read_file(file_handle, start_line, end_line, extra_lines=2):
        start = False  # True while we are inside a block
        while True:
            try:
                line = next(file_handle)
            except StopIteration:
                # End of file: finish the generator cleanly
                return
    
            if not start and line.strip().startswith(start_line):
                start = True
                yield line
            elif not start:
                continue
            elif line.strip().startswith(end_line):
                yield line
                # Leave the block so later lines are skipped until the next start marker
                start = False
                try:
                    for _ in range(extra_lines):
                        yield next(file_handle)
                except StopIteration:
                    return
            else:
                yield line
    

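    The inner try-except clause is not needed if you know that every file is well-formed, i.e. that at least extra_lines lines follow each end marker. Keep the outer one: next raises StopIteration at the end of every file, and since Python 3.7 (PEP 479) a StopIteration that escapes a generator is turned into a RuntimeError.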

    You can use this generator like this:

    if __name__ == "__main__":
        first = "Start"
        first_end = "---"
    
        with open("testlog.log") as infile, open("parsed.txt", "a") as outfile:
            output = read_file(
                file_handle=infile,
                start_line=first,
                end_line=first_end,
                extra_lines=1,
            )
            outfile.writelines(output)
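
    Because such a generator only needs an iterator of lines, it can be exercised without touching the disk. The sketch below is a compact variation rather than the code above: it uses a for loop and itertools.islice in place of the explicit next calls, and the name read_blocks and the sample text are assumptions made for illustration.

```python
import io
from itertools import islice

def read_blocks(lines, start_line, end_line, extra_lines=2):
    """Yield each start..end block plus up to `extra_lines` lines after it."""
    lines = iter(lines)  # make sure the loop and islice share one iterator
    copying = False
    for line in lines:
        stripped = line.strip()
        if not copying and stripped.startswith(start_line):
            copying = True
            yield line
        elif copying and stripped.startswith(end_line):
            copying = False
            yield line
            # islice stops quietly if fewer than extra_lines lines remain
            yield from islice(lines, extra_lines)
        elif copying:
            yield line

# Hypothetical log content; the last block has only one trailing line
log = "Start\nfoo\n---\ntroz\nzort\nalice\nStart\nFred\n---\nWilma\n"
result = list(read_blocks(io.StringIO(log), "Start", "---", extra_lines=2))
print("".join(result))
```

    Here "alice" is dropped because it falls outside any block, and the truncated final block ends without an exception.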
    
        3
  •  1
  •   guillaume.deslandes    5 years ago

    A variant of @Kevin's answer, with a three-state variable and less code duplication.

    first = "Start"
    first_end = "---"
    # Lines to read after end flag
    extra_count = 2
    
    with open('testlog.log') as infile, open('parsed.txt', 'a') as outfile:
        # Do not copy by default
        copy = 0
    
        for line in infile:
            # Strip once only
            clean_line = line.strip()
    
            # Enter "infinite copy" state
            if clean_line.startswith(first):
                copy = -1
    
            # Copy next line and extra amount
            elif clean_line.startswith(first_end):
                copy = extra_count + 1
    
            # If in a "must-copy" state
            if copy != 0:
                # One less line to copy if end flag passed
                if copy > 0:
                    copy -= 1
                # Copy current line
                outfile.write(line)
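
    To make the three states easier to follow (0 means skip, -1 means copy until the end flag, a positive value means that many lines are still to be copied), here is a small trace of the same logic over an in-memory list. The sample lines and the trace list are made up for illustration, and nothing is written to disk.

```python
lines = ["Start", "foo", "---", "troz", "zort", "alice"]
extra_count = 2

copy = 0
trace = []  # (line, value of copy after processing, whether the line is written)

for line in lines:
    if line.startswith("Start"):
        copy = -1
    elif line.startswith("---"):
        copy = extra_count + 1

    written = copy != 0
    if copy > 0:
        copy -= 1
    trace.append((line, copy, written))

for entry in trace:
    print(entry)
# ('Start', -1, True)
# ('foo', -1, True)
# ('---', 2, True)
# ('troz', 1, True)
# ('zort', 0, True)
# ('alice', 0, False)
```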