
Python: Extract a variable number of lines from a text file between markers

Imdadul Choudhury · 6 years ago

    I have some text files with more than 1000 lines each. They contain lines in the following format:

    seq open @ 2018/02/26 23:07:51 node: \nodes\wroot.nod (wroot)
    seq call @ 2018/02/26 23:07:51 node: ttt
    retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt
    BCU is working
    seq done @ 2018/02/26 23:07:55 node:ttt
    
    seq call @ 2018/02/26 23:07:55 node: fff
    Open the firewall
    Firewall opened
    seq done @ 2018/02/26 23:07:57 node: fff
    
    seq call @ 2018/02/26 23:07:57 node: \nodes\wchkefierror.bat (wroot#9^wchkefierror)
    seq done @ 2018/02/26 23:07:57 node: \nodes\wchkefierror.bat (wroot#9^wchkefierror)
    
    seq call @ 2018/02/26 23:07:57 node: \nodes\wuutmont.bat PTEFIE (wroot#12^wuutmont)
    
    SENDING UUTMonitor.exe /timeevent:PTEFIE
    seq done @ 2018/02/26 23:07:58 node: \nodes\wuutmont.bat PTEFIE (wroot#12^wuutmont)
    
    seq call @ 2018/02/26 23:07:58 node: \nodes\wProcessInit.bat (wroot#13^wProcessInit)
    
    02/26/2018 23:07:59 : @@@@ begin_\process\ProcessInit.bat
    
    <BISCON Version=xxxx">
    x
    y
    </BISCON>
    \process\ProcessInit.bat:::Parsing branding variables from INI files...
    found \flags\custom.ini
    PRODUCTIONLOCK not defined in custom.ini
    \process\ProcessInit.bat:::Calling SETVAR.BAT generated from INI data...
    02/26/2018 23:08:04 : @@@@ end\process\ProcessInit.bat
    seq done @ 2018/02/26 23:08:04 node: \nodes\wProcessInit.bat (wroot#13^wProcessInit)
    
    seq log @ 2018/02/26 23:08:04 node: skipping wroot#14^wbios as \flags\bios_flash_wnd.trg file not exists
    
    seq call @ 2018/02/26 23:08:04 node: aaa
    
    Get SkeletonPO from \working\ubera.ini
    seq done @ 2018/02/26 23:08:04 node: aaa
    

    I want to extract the lines between seq call and seq done into a list, and insert NULL into the list wherever a line starts with seq open or seq log.

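    As you can see, there can be any number of arbitrary lines between seq call and seq done, even zero. I have been trying to find an answer, without success. I am also new to Python.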

    Expected output for the example above:

    NULL
    retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt
    BCU is working
    Open the firewall
    Firewall opened
    NULL
    SENDING UUTMonitor.exe /timeevent:PTEFIE
    02/26/2018 23:07:59 : @@@@ begin_\process\ProcessInit.bat
    
    <BISCON Version=xxxx">
    x
    y
    </BISCON>
    \process\ProcessInit.bat:::Parsing branding variables from INI files...
    found \flags\custom.ini
    PRODUCTIONLOCK not defined in custom.ini
    \process\ProcessInit.bat:::Calling SETVAR.BAT generated from INI data...
    02/26/2018 23:08:04 : @@@@ end\process\ProcessInit.bat
    NULL
    Get SkeletonPO from \working\ubera.ini
    
tel · 6 years ago

    Here's a quick and dirty way to get what you want:

    def extractTxt(fpth, joinchar=' '):
        loglines = []
        with open(fpth) as f:
            incall = False   # True while we are between 'seq call' and 'seq done'
            calllines = []   # stripped lines collected for the current call
    
            for line in f:
                if line.startswith('seq open') or line.startswith('seq log'):
                    loglines.append('NULL')
                elif line.startswith('seq call'):
                    incall = True
                elif incall:
                    if line.startswith('seq done'):
                        incall = False
                        # join the non-empty lines of this call into one string
                        call = joinchar.join(l for l in calllines if l)
                        calllines = []
    
                        # a call with no content is recorded as NULL
                        if not call.strip():
                            loglines.append('NULL')
                        else:
                            loglines.append(call)
                    else:
                        calllines.append(line.strip())
    
        return loglines
    
    extractTxt('seq.txt')
    

    Output:

    ['NULL',
     'retrieve BIOS data using F:\\tools64\\BiosConfigUtility64.exe /GetConfig:\\working\\bcudump.txt BCU is working',
     'Open the firewall Firewall opened',
     'NULL',
     'SENDING UUTMonitor.exe /timeevent:PTEFIE',
     '02/26/2018 23:07:59 : @@@@ begin_\\process\\ProcessInit.bat <BISCON Version=xxxx"> x y </BISCON> \\process\\ProcessInit.bat:::Parsing branding variables from INI files... found \\flags\\custom.ini PRODUCTIONLOCK not defined in custom.ini \\process\\ProcessInit.bat:::Calling SETVAR.BAT generated from INI data... 02/26/2018 23:08:04 : @@@@ end\\process\\ProcessInit.bat',
     'NULL',
     'Get SkeletonPO from \\working\\ubera.ini']
    

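    You can change how the lines of each call are joined by passing a different joinchar argument to extractTxt. I'll leave any further styling/organization as an exercise.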
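If you would rather keep each line as its own list entry, which is closer to the layout of the expected output in the question, a minimal sketch of a variant might look like the following. It assumes the same `seq ...` prefixes and the `NULL` placeholder from the question; the function name `extract_lines` and the sample input are hypothetical, and blank lines inside a call are simply dropped.

```python
import io

def extract_lines(lines):
    """Hypothetical variant: one entry per non-blank line found between
    'seq call' and 'seq done', plus 'NULL' for empty calls and for
    'seq open' / 'seq log' lines."""
    result = []
    in_call = False
    call_lines = []
    for line in lines:
        if line.startswith(('seq open', 'seq log')):  # startswith accepts a tuple
            result.append('NULL')
        elif line.startswith('seq call'):
            in_call = True
            call_lines = []
        elif in_call:
            if line.startswith('seq done'):
                in_call = False
                # an empty call contributes a single NULL placeholder
                result.extend(call_lines or ['NULL'])
            elif line.strip():  # skip blank lines inside a call
                call_lines.append(line.strip())
    return result

# made-up sample; any iterable of lines works, including an open file
sample = io.StringIO(
    "seq open @ x\n"
    "seq call @ x node: a\n"
    "Open the firewall\n"
    "\n"
    "Firewall opened\n"
    "seq done @ x node: a\n"
    "seq call @ x node: b\n"
    "seq done @ x node: b\n"
)
print(extract_lines(sample))
# ['NULL', 'Open the firewall', 'Firewall opened', 'NULL']
```

Because it only iterates over `lines`, you can pass `open('seq.txt')` directly instead of the sample.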

    Details

    The line:

    call = joinchar.join(l for l in calllines if l)
    

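    does a few different things. The join method concatenates a list of strings, using the string it is called on (the one before the dot) as the separator. For example, the following expression: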

    ', '.join(['foo', 'bar', 'baz', 'bof'])
    

    produces this output:

    'foo, bar, baz, bof'
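A small experiment shows how the separator behaves when some of the strings are empty, like the blank lines that end up in `calllines`; the list `parts` is just made-up sample data.

```python
parts = ['foo', '', 'bar']

# join() puts the separator only between elements, never at the ends
print(', '.join(parts))                  # foo, , bar

# filtering out empty strings first, as the answer's generator expression does
print(', '.join(p for p in parts if p))  # foo, bar

# an empty separator simply concatenates
print(''.join(parts))                    # foobar
```

Note that every element must already be a string; `' '.join([1, 2])` raises a `TypeError`.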
    

    The part inside the parentheses:

    l for l in calllines if l
    

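    is called a generator expression. It is a bit complicated to explain, but basically it produces each element of calllines that is not empty. If you're interested, see the Python documentation on generator expressions for more details. You can make this easier to read by unrolling the line into a loop. Altogether, the following lines: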

    call = ''
    for l in calllines:
        # l evaluates to False if it is empty
        if l:
            call += l + joinchar
    
    # remove the trailing joinchar (the guard avoids call[:-0], which would give '')
    if joinchar and call.endswith(joinchar):
        call = call[:-len(joinchar)]
    

    have the same effect as the single line call = joinchar.join(l for l in calllines if l)
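Another equivalent spelling is `filter(None, ...)`, which also drops falsy (that is, empty) strings. The small check below compares the three forms; the helper names `join_generator`, `join_filter` and `join_loop` are hypothetical, and the loop version includes a guard so that an empty `joinchar` does not wipe out the result (`call[:-0]` is `call[:0]`, the empty string).

```python
def join_generator(calllines, joinchar=' '):
    # the one-liner from the answer
    return joinchar.join(l for l in calllines if l)

def join_filter(calllines, joinchar=' '):
    # filter(None, ...) keeps only truthy items, i.e. non-empty strings
    return joinchar.join(filter(None, calllines))

def join_loop(calllines, joinchar=' '):
    # the unrolled loop; a trailing joinchar is always appended, so strip one
    call = ''
    for l in calllines:
        if l:
            call += l + joinchar
    if joinchar and call.endswith(joinchar):
        call = call[:-len(joinchar)]
    return call

lines = ['retrieve BIOS data', '', 'BCU is working']
for sep in (' ', ' | ', ''):
    # a set of results with a single element means all three forms agree
    results = {f(lines, sep) for f in (join_generator, join_filter, join_loop)}
    print(repr(sep), results)
```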

alsjflalasjf.dev · 6 years ago
    import re
    
    # both patterns are used with re.match, which only matches at the start of a line
    begins_with_open_or_log = re.compile(r'seq open|seq log')
    begins_with_call_and_done = re.compile(r'seq call|seq done')
    
    with open('log.txt') as f:
        lines = f.readlines()
    
    for i, line in enumerate(lines):
        if begins_with_open_or_log.match(line):
            lines[i] = 'NULL\n'   # open/log markers become NULL
        elif begins_with_call_and_done.match(line):
            lines[i] = ''         # drop the call/done markers themselves
        elif line == '\n':
            lines[i] = ''         # drop blank lines
    print (''.join(lines),end='')
    

    I want to extract the lines between seq call and seq done into a list, and insert NULL into the list wherever a line starts with seq open or seq log.

    This may be the output you want:

    NULL
    retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt
    BCU is working
    Open the firewall
    Firewall opened
    SENDING UUTMonitor.exe /timeevent:PTEFIE
    02/26/2018 23:07:59 : @@@@ begin_\process\ProcessInit.bat
    <BISCON Version=xxxx">
    x
    y
    </BISCON>
    \process\ProcessInit.bat:::Parsing branding variables from INI files...
    found \flags\custom.ini
    PRODUCTIONLOCK not defined in custom.ini
    \process\ProcessInit.bat:::Calling SETVAR.BAT generated from INI data...
    02/26/2018 23:08:04 : @@@@ end\process\ProcessInit.bat
    NULL
    Get SkeletonPO from \working\ubera.ini
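The same filtering can be written without regular expressions, since every marker sits at the start of the line and `str.startswith` accepts a tuple of prefixes. This sketch takes a list of lines (the name `clean_lines` and the sample are hypothetical) instead of reading `log.txt`, so it can be tried without the file; like the loop above, it keeps lines that fall outside any call/done pair.

```python
def clean_lines(lines):
    """Replace 'seq open'/'seq log' lines with NULL, drop 'seq call'/'seq done'
    lines and blank lines, and keep everything else unchanged."""
    out = []
    for line in lines:
        if line.startswith(('seq open', 'seq log')):
            out.append('NULL\n')
        elif line.startswith(('seq call', 'seq done')) or line == '\n':
            continue
        else:
            out.append(line)
    return out

sample = [
    'seq open @ 2018/02/26 23:07:51 node: root\n',
    'seq call @ 2018/02/26 23:07:51 node: ttt\n',
    'BCU is working\n',
    'seq done @ 2018/02/26 23:07:55 node:ttt\n',
    '\n',
]
print(''.join(clean_lines(sample)), end='')
```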
    

    However, if you literally mean:
    
    I want to extract the lines between seq call and seq done
    
    then note that, for example,

    retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt
    

    is not part of your output... You need to be as precise as possible.
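To make the strictly "between the markers" reading concrete, here is a sketch that keeps only what sits between each `seq call` and the following `seq done`, grouped as one list per call, and ignores everything else (including `seq open`/`seq log` lines). The generator name `iter_call_blocks` and the sample lines are hypothetical.

```python
def iter_call_blocks(lines):
    """Yield the stripped, non-blank lines found between each 'seq call' and
    the next 'seq done'; lines outside any call/done pair are ignored."""
    block = None  # None means "not currently inside a call"
    for line in lines:
        if line.startswith('seq call'):
            block = []
        elif line.startswith('seq done'):
            if block is not None:
                yield block  # an empty call yields an empty list
            block = None
        elif block is not None and line.strip():
            block.append(line.strip())

log = [
    'seq open @ t node: root\n',
    'seq call @ t node: ttt\n',
    'retrieve BIOS data\n',
    'BCU is working\n',
    'seq done @ t node: ttt\n',
    'stray line outside any call\n',
    'seq call @ t node: empty\n',
    'seq done @ t node: empty\n',
]
for block in iter_call_blocks(log):
    print(block)
```

Returning an empty list for an empty call leaves it to the caller to decide whether that should become `NULL`.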


    Note: for Python 2.7, change this line

    print (''.join(lines),end='')
    

    to this:

    print ''.join(lines)
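Alternatively, one version can run unchanged on both Python 2.7 and Python 3 by importing the print function from `__future__`; that import has to come before any other statement in the file. The `lines` list here is only sample data.

```python
from __future__ import print_function  # print() as a function on Python 2.6+; no-op on Python 3

lines = ['NULL\n', 'BCU is working\n']
print(''.join(lines), end='')
```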