
Return a string between delimiters with sed/grep/awk/gawk

  •  2
  • TerminatorX  · Tech Community  · 6 years ago

    I need some help returning all the data in a log file that lies between two specific delimiters. We usually have logs like the following:

    2018-04-17 03:59:29,243 TRACE [xml] This is just a test.
    2018-04-17 13:22:24,230 INFO [properties] I believe this is another test.
    2018-04-18 03:48:07,043 ERROR [properties] (Thread-13) UpdateType: more data coming here; ProcessId: 5010
    2018-04-17 13:22:24,230 INFO [log] I need to retrieve this string here
    and also this one as it is part of the same text
    2018-04-17 13:22:24,230 INFO [det] I believe this is another test.
    

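    If I grep for "here", I only get the lines that contain the word, but I actually need to retrieve the whole entry, and I think the line breaks are part of my problem.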

    2018-04-17 13:22:24,230 INFO [log] I need to retrieve this string here
    and also this one as it is part of the same text
    

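    There may be several occurrences of "here" in the log file. I tried to do this with sed, but I could not find the right way to use the delimiter, which I think should be the whole date.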

    I would really appreciate your help with this.

    New example after karakfa's comment:

    2018-04-17 03:48:07,044 INFO  [passpoint-logger] (Thread-19) ERFG|1.0||ID:414d512049584450414153541541871985165165130312020203aa4b|Thread-19|||2018-04-17 03:48:07|out-1||out-1|
    2018-04-17 03:59:29,243 TRACE [xml] (Thread-19) RAW MED XML: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><MED:MED_PMT_Tmp_Notif xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://services.xxx.com/POQ/v01" xmlns:POQ="http://services.xxx.com/POQ/v01" xmlns:MED="http://services.xxx.com/MED/v1.2" version="1.2.3" messageID="15290140135778972043" Updat584ype="PGML" xsi:schemaLocation="http://services.xxx.com/MED/v1.2 MED_PMT_v.1.2.3.xsd">
        <MED_Space xmlns:ns2="http://services.xxx.com/MED/v1.2" xmlns:ns4="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns3="http://services.xxx.com/POQ_Header/v01" status="AVAIL" dest="MQX" aircraftType="DH8" aircraftConfig="120">
            <Space_ID partition="584" orig="ADD3" messageCreate="2018-04-17T03:59:29.202-05:00">
                <Space carrier="584" date="2018-04-18">0108</Space>
            </Space_ID>
            <DepartAndArrive estDep="2018-04-18T18:10:00+03:00" schedDep="2018-04-18T18:10:00+03:00" estArrival="2018-04-18T19:30:00+03:00" schedArrival="2018-04-18T19:30:00+03:00"/>
            <Sched_OandD orig="ADD3" dest="MQX"/>
        </MED_Space>
        <TRX_Record xmlns:ns2="http://services.xxx.com/MED/v1.2" xmlns:ns4="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns3="http://services.xxx.com/POQ_Header/v01">
            <TRX_ID FILCreate="2018-04-17T03:59:00-05:00" resID="1">TFRSVL</TRX_ID>
            <Space>
                <Inds revenue="1"/>
                <Identification nameID="1" dHS_ID="TFRSVL001" gender="X">
                    <Name_First>SMITH MR</Name_First>
                    <Name_Last>P584ER</Name_Last>
                    <TT tier="0"/>
                </Identification>
                    <TRXType>F</TRXType>
                <SRiuyx>0</SRiuyx>
                <GroupRes>1</GroupRes>
                <SystemInstances inventory="H">Y</SystemInstances>
                <OandD_FIL orig="ADD3" dest="MQX"/>
                <Store="584">0108</Store>
                <CodingSpec="584">0108</CodingSpec>
            </Space>
        </TRX_Record>
            <ns2:TRX_Count xmlns:ns2="http://services.xxx.com/MED/v1.2" xmlns:ns4="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns3="http://services.xxx.com/POQ_Header/v01">1</ns2:TRX_Count>
        <ns2:Transaction_D584ails xmlns:ns2="http://services.xxx.com/MED/v1.2" xmlns:ns4="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns3="http://services.xxx.com/POQ_Header/v01" sourceID="TPF">
            <Client_Entry_Info authRSX="54" agx="S4" code="ADD3">RESTORE AMEND:NEW-FIL/AFAX-UPDATED</Client_Entry_Info>
        </ns2:Transaction_D584ails>
    </MED:MED_PMT_Tmp_Notif>
    2018-04-17 03:59:29,244 INFO  [properties] (Thread-19) Updat584ype: PGML ; ProcessId: ##MISSING##
    

    The following command did not return the whole entry: awk -v RS='(^|\n)[0-9 :,-]+' '/TFRSVL/{print rs,$0} {rs=RT}' file

    2 answers  |  last activity 6 years ago
        1
  •  1
  •   karakfa    6 years ago

    With GNU awk, using a multi-character record separator:

    $ awk -v RS='(^|\n)[0-9 :,-]+' '/here/{print rs,$0} {rs=RT}' file
    
    2018-04-18 03:48:07,043  ERROR [properties] (Thread-13) UpdateType: more data coming here; ProcessId: 5010
    
    2018-04-17 13:22:24,230  INFO [log] I need to retrieve this string here
    and also this one as it is part of the same text
    

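    NB: here I cheat by building the record separator from the characters that occur in the timestamp. You can describe the timestamp exactly to eliminate false positives at the start of a continuation line. Alternatively, you can add the log level to the match as well.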
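
    A minimal sketch of the "describe it exactly" idea, assuming GNU awk 4.0 or later is installed as `gawk` and that every entry starts with a `YYYY-MM-DD HH:MM:SS,mmm ` timestamp. The loose separator `(^|\n)[0-9 :,-]+` also matches a newline followed only by spaces, so indented continuation lines (such as the XML in the question) are likely split into separate records; spelling out the timestamp avoids that. The helper name `records_matching` and the variables `pat` and `ts` are made up for this illustration.

```shell
# Hypothetical helper: print every log entry whose text matches REGEXP.
# usage: records_matching REGEXP < logfile
records_matching() {
    gawk -v pat="$1" \
         -v RS='(^|\n)[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} ' '
        # ts holds the timestamp that opened the current record
        $0 ~ pat { sub(/\n$/, ""); print ts $0 }
        # RT is the text RS just matched, i.e. the timestamp of the NEXT record
        { ts = RT; sub(/^\n/, "", ts) }
    '
}

# Small made-up input with an indented continuation line
records_matching TFRSVL <<'EOF'
2018-04-17 03:59:29,243 TRACE [xml] RAW XML: <root>
    <item>TFRSVL</item>
</root>
2018-04-17 03:59:29,244 INFO  [properties] unrelated entry
EOF
```

    With this input the whole three-line TRACE entry should be printed, because only a full timestamp at the start of a line ends a record.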

        2
  •  1
  •   Ed Morton    6 years ago

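    Assuming every record starts with a timestamp, followed by a string of all-uppercase letters, followed by another string inside square brackets: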

    $ cat tst.awk
    /^[0-9]{4}(-[0-9]{2}){2} [0-9]{2}(:[0-9]{2}){2},[0-9]{3} [[:upper:]]+ \[[^][]+\] / { prt() }
    { rec = (rec=="" ? "" : rec ORS) $0 }
    END { prt() }
    
    function prt() {
        if (rec ~ regexp) {
            print rec
            print "----"
        }
        rec = ""
    }
    
    $ awk -v regexp='here' -f tst.awk file
    2018-04-18 03:48:07,043 ERROR [properties] (Thread-13) UpdateType: more data coming here; ProcessId: 5010
    ----
    2018-04-17 13:22:24,230 INFO [log] I need to retrieve this string here
    and also this one as it is part of the same text
    ----
    

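    If that is not restrictive enough, you can change the starting regexp to something else, for example if the text of a record can contain a line that begins with a string matching that regexp (though I don't know how you would actually deal with that, given what you have shown us so far).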
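
    One possible adjustment of the starting regexp, as a sketch: in the second sample of the question the level is followed by two spaces (`INFO  [passpoint-logger]`), which a pattern with exactly one space would not treat as the start of a record. The variant below allows one or more spaces around the level. The names `find_records` and `emit` are made up here, and the repetitions are written out longhand only on the assumption that some older awks lack `{n}` intervals and `[[:upper:]]`.

```shell
# Hypothetical helper: print whole records that match REGEXP.
# usage: find_records REGEXP < logfile
find_records() {
    awk -v regexp="$1" '
        # Record start: date, time with milliseconds, LEVEL, [logger],
        # with one or more spaces on either side of the level.
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9] +[A-Z]+ +\[[^][]+\] / { emit() }
        # Append the current line to the record being collected
        { rec = (rec == "" ? "" : rec ORS) $0 }
        END { emit() }

        # Print the collected record if it matches, then start a new one
        function emit() {
            if (rec != "" && rec ~ regexp) print rec
            rec = ""
        }
    '
}

# Made-up input modelled on the second sample in the question
find_records TFRSVL <<'EOF'
2018-04-17 03:48:07,044 INFO  [passpoint-logger] (Thread-19) first entry
2018-04-17 03:59:29,243 TRACE [xml] (Thread-19) RAW XML: <root>
    <id>TFRSVL</id>
</root>
2018-04-17 03:59:29,244 INFO  [properties] (Thread-19) last entry
EOF
```

    Here the final INFO line should start a new record instead of being glued onto the XML entry, so only the three lines of the TRACE entry are printed.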

    Also, consider what this does:

    $ cat tst.awk
    /^[0-9]{4}(-[0-9]{2}){2} [0-9]{2}(:[0-9]{2}){2},[0-9]{3} [[:upper:]]+ \[[^][]+\] / { prt() }
    { rec = (rec=="" ? "" : rec ORS) $0 }
    END { prt() }
    
    function prt(   flds,recDate,recTime,recPrio,recType,recText) {
        split(rec,flds)
        recDate = flds[1]
        recTime = flds[2]
        recPrio = flds[3]
        recType = flds[4]
        gsub(/[][]/,"",recType)
        recText = rec
        sub(/([^[:space:]]+ ){4}/,"",recText)
        gsub(/[[:space:]]+/," ",recText)
    
        if (NR > 1) {
            if ( date=="" || date==recDate ) {
                printf "date = <%s>\n", recDate
                printf "time = <%s>\n", recTime
                printf "prio = <%s>\n", recPrio
                printf "type = <%s>\n", recType
                printf "text = <%s>\n", recText
                print "----"
            }
        }
        rec = ""
    }
    

    $ awk -v date='2018-04-18' -f tst.awk file
    date = <2018-04-18>
    time = <03:48:07,043>
    prio = <ERROR>
    type = <properties>
    text = <(Thread-13) UpdateType: more data coming here; ProcessId: 5010>
    ----
    

    $ awk -f tst.awk file
    date = <2018-04-17>
    time = <03:59:29,243>
    prio = <TRACE>
    type = <xml>
    text = <This is just a test.>
    ----
    date = <2018-04-17>
    time = <13:22:24,230>
    prio = <INFO>
    type = <properties>
    text = <I believe this is another test.>
    ----
    date = <2018-04-18>
    time = <03:48:07,043>
    prio = <ERROR>
    type = <properties>
    text = <(Thread-13) UpdateType: more data coming here; ProcessId: 5010>
    ----
    date = <2018-04-17>
    time = <13:22:24,230>
    prio = <INFO>
    type = <log>
    text = <I need to retrieve this string here and also this one as it is part of the same text>
    ----
    date = <2018-04-17>
    time = <13:22:24,230>
    prio = <INFO>
    type = <det>
    text = <I believe this is another test.>
    ----
    

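    Imagine how easily, with this approach, you could build precise queries on specific fields of your log records, generate CSV to import into Excel, and so on.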
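
    As a rough illustration of the CSV idea, the sketch below reuses the same record-collecting pattern and prints one CSV row per record. It assumes the column order date, time, level, logger, text; it quotes the time because it contains a comma, and it quotes the text with embedded double quotes doubled. The names `log_to_csv` and `emit` are made up for this example.

```shell
# Hypothetical helper: turn multi-line log records into CSV rows.
# usage: log_to_csv < logfile
log_to_csv() {
    awk '
        # A timestamp, level and [logger] at line start begin a new record
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9] +[A-Z]+ +\[[^][]+\] / { emit() }
        { rec = (rec == "" ? "" : rec ORS) $0 }
        END { emit() }

        function emit(   flds, type, text, i) {
            if (rec == "") return
            split(rec, flds)                  # date, time, level, [logger], ...
            type = flds[4]
            gsub(/[][]/, "", type)            # drop the square brackets
            text = rec
            for (i = 1; i <= 4; i++)          # strip the four leading fields
                sub(/^[^ ]+ +/, "", text)
            gsub(/[ \t\n]+/, " ", text)       # fold continuation lines into one
            gsub(/"/, "\"\"", text)           # CSV-escape double quotes
            printf "%s,\"%s\",%s,%s,\"%s\"\n", flds[1], flds[2], flds[3], type, text
            rec = ""
        }
    '
}

# Two entries taken from the first sample in the question
log_to_csv <<'EOF'
2018-04-18 03:48:07,043 ERROR [properties] (Thread-13) UpdateType: more data coming here; ProcessId: 5010
2018-04-17 13:22:24,230 INFO [log] I need to retrieve this string here
and also this one as it is part of the same text
EOF
```

    Each record becomes a single row, so a multi-line entry ends up in one text cell when the result is opened in a spreadsheet.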