
Extracting words from a log file

  •  0
  • rockyroad  · Tech Community  · 6 years ago

    I am trying to extract job IDs from a log file, but I am having trouble doing it in bash. I have tried using sed.

    My log file looks like this:

    > 2018-06-16 02:39:39,331 INFO  org.apache.flink.client.cli.CliFrontend 
    > - Running 'list' command.
    > 2018-06-16 02:39:39,641 INFO  org.apache.flink.runtime.rest.RestClient                      
    > - Rest client endpoint started.
    > 2018-06-16 02:39:39,741 INFO  org.apache.flink.client.cli.CliFrontend                       
    > - Waiting for response...
    >  Waiting for response...
    > 2018-06-16 02:39:39,953 INFO  org.apache.flink.client.cli.CliFrontend                       
    > - Successfully retrieved list of jobs
    > ------------------ Running/Restarting Jobs -------------------
    > 15.06.2018 18:49:44 : 1280dfd7b1de4c74cacf9515f371844b : jETTY HTTP Server -> servlet with content decompress -> pull from
    > collections -> CSV to Avro encode -> Kafka publish (RUNNING)
    > 16.06.2018 02:37:07 : aa7a691fa6c3f1ad619b6c0c4425ba1e : jETTY HTTP Server -> servlet with content decompress -> pull from
    > collections -> CSV to Avro encode ->  Kafka publish (RUNNING)
    > --------------------------------------------------------------
    > 2018-06-16 02:39:39,956 INFO  org.apache.flink.runtime.rest.RestClient                      
    > - Shutting down rest endpoint.
    > 2018-06-16 02:39:39,957 INFO  org.apache.flink.runtime.rest.RestClient                      
    > - Rest endpoint shutdown complete.
    

    I am using the following code to extract the lines that contain the job IDs:

    extractRestResponse=`cat logFile.txt`
    echo "extractRestResponse: "$extractRestResponse
    
    w1="------------------ Running/Restarting Jobs -------------------"
    w2="--------------------------------------------------------------"
    extractRunningJobs="sed -e 's/.*'"$w1"'\(.*\)'"$w2"'.*/\1/' <<< $extractRestResponse"
    runningJobs=`eval $extractRunningJobs`
    echo "running jobs :"$runningJobs
    

    But this does not give me any result. I also noticed that all the newlines are lost when I print the extractRestResponse variable.

    I also tried this command, but it produced no output either:

    extractRestResponse="sed -n '/"$w1"/,/"$w2"/{//!p}' logFile.txt"
    
    3 answers  |  last active 6 years ago
        1
  •  1
  •   SLePort    6 years ago

    With sed:

    sed -n '/^-* Running\/Restarting Jobs -*/,/^--*/{//!p;}' logFile.txt
    

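    Explanation:

    • By default, sed echoes each input line to standard output after applying its commands. The -n flag suppresses this behavior.
    • /^-* Running\/Restarting Jobs -*/,/^--*/ : selects the range that starts at a line matching ^-* Running\/Restarting Jobs -* and runs up to the next line matching ^--* (both inclusive).
    • //!p; : prints the lines in that range except the ones matching the addresses (an empty regex // reuses the last regex that was applied).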
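
    The question asks for the job IDs rather than the whole lines, so the range selection above can be followed by a field extraction. The sketch below is one way to do that, not part of the original answer. It assumes the real log holds each job on a single line without the "> " quoting prefix shown in the question, and that the ID is always the second " : "-separated field. The helper name extract_job_ids and the abbreviated sample log are made up for illustration.

```shell
#!/usr/bin/env bash

# extract_job_ids is a hypothetical helper, not part of the original answer.
# $1: path to a log file laid out like the sample in the question.
extract_job_ids() {
    # Keep only the lines strictly between the header and the closing dashes,
    # then print the second " : "-separated field (assumed to be the job ID).
    sed -n '/^-* Running\/Restarting Jobs -*/,/^--*/{//!p;}' "$1" |
        awk -F' : ' 'NF >= 3 { print $2 }'
}

# Abbreviated sample log (assumed layout: one job per line, no "> " prefix).
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
2018-06-16 02:39:39,953 INFO  org.apache.flink.client.cli.CliFrontend - Successfully retrieved list of jobs
------------------ Running/Restarting Jobs -------------------
15.06.2018 18:49:44 : 1280dfd7b1de4c74cacf9515f371844b : jETTY HTTP Server -> Kafka publish (RUNNING)
16.06.2018 02:37:07 : aa7a691fa6c3f1ad619b6c0c4425ba1e : jETTY HTTP Server -> Kafka publish (RUNNING)
--------------------------------------------------------------
2018-06-16 02:39:39,956 INFO  org.apache.flink.runtime.rest.RestClient - Shutting down rest endpoint.
EOF

job_ids=$(extract_job_ids "$sample_log")
printf '%s\n' "$job_ids"
rm -f "$sample_log"
```

    Capturing the result with job_ids=$(...) and expanding it as "$job_ids" (quoted) keeps one ID per line.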
        2
  •  1
  •   karakfa    6 years ago

    awk to the rescue!

    awk '/^-+$/{f=0} f; /^-+ Running\/Restarting Jobs -+$/{f=1}' logfile
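
    The one-liner works with a flag: f is set to 1 after the header line, reset to 0 on a line made up only of dashes, and the bare pattern f prints every line seen while the flag is set. As a rough sketch (not from the original answer), the same flag can be combined with a field separator so that only the job ID is printed. The function name list_running_job_ids, the variable name in_jobs, and the abbreviated sample log are hypothetical, and the log is assumed to hold one job per line.

```shell
#!/usr/bin/env bash

# list_running_job_ids is a hypothetical name used only for this sketch.
# $1: path to the log file.
list_running_job_ids() {
    awk -F' : ' '
        /^-+$/                             { in_jobs = 0 }  # closing dashed line ends the section
        in_jobs && NF >= 3                 { print $2 }     # job line: second field is the ID
        /^-+ Running\/Restarting Jobs -+$/ { in_jobs = 1 }  # header line starts the section
    ' "$1"
}

# Abbreviated sample log (assumed layout: one job per line).
awk_sample=$(mktemp)
cat > "$awk_sample" <<'EOF'
2018-06-16 02:39:39,331 INFO  org.apache.flink.client.cli.CliFrontend - Running 'list' command.
------------------ Running/Restarting Jobs -------------------
15.06.2018 18:49:44 : 1280dfd7b1de4c74cacf9515f371844b : jETTY HTTP Server -> Kafka publish (RUNNING)
16.06.2018 02:37:07 : aa7a691fa6c3f1ad619b6c0c4425ba1e : jETTY HTTP Server -> Kafka publish (RUNNING)
--------------------------------------------------------------
2018-06-16 02:39:39,957 INFO  org.apache.flink.runtime.rest.RestClient - Rest endpoint shutdown complete.
EOF

awk_ids=$(list_running_job_ids "$awk_sample")
printf '%s\n' "$awk_ids"
rm -f "$awk_sample"
```

    The rule order matters: the flag is cleared before the print rule and set after it, so neither the header nor the closing line is ever printed.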
    
        3
  •  0
  •   builder-7000    6 years ago

    You can improve your original substitution:

    sed -e 's/.*'"$w1"'\(.*\)'"$w2"'.*/\1/' <<< $extractRestResponse
    

    by using @ as the delimiter, so that the / in $w1 no longer clashes with it:

    sed -n "s@.*$w1\(.*\)$w2.*@\1@p" <<< $extractRestResponse
    

    The output is the text between $w1 and $w2:

    > 15.06.2018 18:49:44 : 1280dfd7b1de4c74cacf9515f371844b : jETTY HTTP Server -> servlet with content decompress -> pull from > collections -> CSV to Avro encode -> Kafka publish (RUNNING) > 16.06.2018 02:37:07 : aa7a691fa6c3f1ad619b6c0c4425ba1e : jETTY HTTP Server -> servlet with content decompress -> pull from > collections -> CSV to Avro encode -> Kafka publish (RUNNING) >
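
    The lost newlines mentioned in the question come from expanding the variable without quotes: the shell splits the value on whitespace and echo joins the resulting words with single spaces. Note also that sed applies s/// one line at a time, so the substitution above can only match when both markers end up on the same input line. The short sketch below uses a made-up two-line value (not the real log) to show the difference quoting makes.

```shell
#!/usr/bin/env bash

# Made-up multi-line value standing in for the log contents.
text=$(printf 'first line\nsecond line')

# Unquoted: the value is split on whitespace and echo joins the words
# with single spaces, so the newline disappears.
unquoted=$(echo $text)

# Quoted: the value is passed as a single argument and the newline survives.
quoted=$(echo "$text")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

    The same rule applies to the echo lines in the question: writing echo "running jobs: $runningJobs" keeps the line breaks.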