
Awk script to extract lines with multiple different delimiters

awk
  •  0
  • sasikumar karuppiah  · 2 years ago

    Log file:

    25 Apr 2022 02:55:08,062 ; 12345678908,123456789, added soc:[DSPSIA2D9,450, USGPRSPPF,0] deleted soc:[] ldap soc:[DSPSIA2D9,450, OPTSRA1H7,52, USGPRSPPF,0] db SOC:[OPTSRA1H7,52]

    25 Apr 2022 02:55:08,872 ; 98765432101,234567833, added soc:[DSPSIA2EB,450, USGPRSPPF,0] deleted soc:[DSPSIA2CZ,450] ldap soc:[BBSUSPEND,0, DSPSIA2EB,450, OPTSRA1H7,52, USGPRSPPF,0, BBSUSPEND,0, USGPRSPPF,0] db SOC:[BBSUSPEND,0, BBSUSPEND,0, DSPSIA2CZ,450, OPTSRA1H7,52, USGPRSPPF,0]

    25 Apr 2022 02:55:09,413 ; 23456789022,123456789, added soc:[DSPSIA2D6,450] deleted soc:[DSPSIA0R6,450] ldap soc:[BBSUSPEND,0, DSPSIA2D6,450, OPTSRA1H7,52, USGPRSPPF,0, BBSUSPEND,0] db SOC:[BBSUSPEND,0, BBSUSPEND,0, DSPSIA0R6,450, OPTSRA1H7,52, USGPRSPPF,0]

    If "added soc" contains "USGPRSPPF", extract the sixth column value.

    Output:

    12345678908

    98765432101

    1 answer
  •  1
  •   Daweo    2 years ago

    I would use GNU AWK for this task. Let the content of file.txt be

    25 Apr 2022 02:55:08,062 ; 12345678908,123456789, added soc:[DSPSIA2D9,450, USGPRSPPF,0] deleted soc:[] ldap soc:[DSPSIA2D9,450, OPTSRA1H7,52, USGPRSPPF,0] db SOC:[OPTSRA1H7,52]
    
    25 Apr 2022 02:55:08,872 ; 98765432101,234567833, added soc:[DSPSIA2EB,450, USGPRSPPF,0] deleted soc:[DSPSIA2CZ,450] ldap soc:[BBSUSPEND,0, DSPSIA2EB,450, OPTSRA1H7,52, USGPRSPPF,0, BBSUSPEND,0, USGPRSPPF,0] db SOC:[BBSUSPEND,0, BBSUSPEND,0, DSPSIA2CZ,450, OPTSRA1H7,52, USGPRSPPF,0]
    
    25 Apr 2022 02:55:09,413 ; 23456789022,123456789, added soc:[DSPSIA2D6,450] deleted soc:[DSPSIA0R6,450] ldap soc:[BBSUSPEND,0, DSPSIA2D6,450, OPTSRA1H7,52, USGPRSPPF,0, BBSUSPEND,0] db SOC:[BBSUSPEND,0, BBSUSPEND,0, DSPSIA0R6,450, OPTSRA1H7,52, USGPRSPPF,0]
    

    then

    awk 'BEGIN{FS="[[:space:]]+|,"}/added soc:\[[^\]]*USGPRSPPF/{print $7}' file.txt
    

    gives output

    12345678908
    98765432101
    

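    Explanation: I inform GNU AWK that the field separator ( FS ) is one or more ( + ) whitespace characters or ( | ) a comma ( , ). Then, for each line containing added soc:[ followed by zero or more non- ] characters and then USGPRSPPF, I print the 7th field. Note that the literal [ and ] need to be escaped, as they have special meaning in regular expressions.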
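    To see why the wanted value ends up in the 7th field, it can help to print the leading fields together with their indices. The following is only a small sketch using the first sample line and the same FS as above; the variable names line and fields are illustrative and not part of the original answer.

```shell
# Illustrative only: show how FS="[[:space:]]+|," numbers the leading fields
# of the first sample line. `line` and `fields` are made-up variable names.
line='25 Apr 2022 02:55:08,062 ; 12345678908,123456789, added soc:[DSPSIA2D9,450, USGPRSPPF,0] deleted soc:[] ldap soc:[DSPSIA2D9,450, OPTSRA1H7,52, USGPRSPPF,0] db SOC:[OPTSRA1H7,52]'

# Print "index=value" for the first seven fields.
fields=$(printf '%s\n' "$line" |
  awk 'BEGIN{FS="[[:space:]]+|,"} {for (i = 1; i <= 7; i++) printf "%d=%s\n", i, $i}')

printf '%s\n' "$fields"
```

    The comma inside the timestamp splits 02:55:08,062 into two fields and the lone ; is a field of its own, so the number after the semicolon is reported as field 7.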

    (the one-liner in this answer was tested in gawk 4.2.1)
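    As a possible variation (a sketch, not the answerer's method), the value can also be taken by position relative to the ; instead of by field number, which avoids counting fields. It writes [^]] with the ] placed first inside the bracket expression, the POSIX way of including a literal ] without a backslash. The temporary file and the names tmp, rest and result are assumptions made for this example.

```shell
# Illustrative variant: select lines whose "added soc:[...]" list contains
# USGPRSPPF, then print the text between the first ";" and the next comma.
tmp=$(mktemp)            # hypothetical scratch file holding the sample log
trap 'rm -f "$tmp"' EXIT

cat > "$tmp" <<'EOF'
25 Apr 2022 02:55:08,062 ; 12345678908,123456789, added soc:[DSPSIA2D9,450, USGPRSPPF,0] deleted soc:[] ldap soc:[DSPSIA2D9,450, OPTSRA1H7,52, USGPRSPPF,0] db SOC:[OPTSRA1H7,52]
25 Apr 2022 02:55:08,872 ; 98765432101,234567833, added soc:[DSPSIA2EB,450, USGPRSPPF,0] deleted soc:[DSPSIA2CZ,450] ldap soc:[BBSUSPEND,0, DSPSIA2EB,450, OPTSRA1H7,52, USGPRSPPF,0, BBSUSPEND,0, USGPRSPPF,0] db SOC:[BBSUSPEND,0, BBSUSPEND,0, DSPSIA2CZ,450, OPTSRA1H7,52, USGPRSPPF,0]
25 Apr 2022 02:55:09,413 ; 23456789022,123456789, added soc:[DSPSIA2D6,450] deleted soc:[DSPSIA0R6,450] ldap soc:[BBSUSPEND,0, DSPSIA2D6,450, OPTSRA1H7,52, USGPRSPPF,0, BBSUSPEND,0] db SOC:[BBSUSPEND,0, BBSUSPEND,0, DSPSIA0R6,450, OPTSRA1H7,52, USGPRSPPF,0]
EOF

result=$(awk '
  # [^]]* stays inside the brackets that follow "added soc:"
  /added soc:\[[^]]*USGPRSPPF/ {
    rest = $0
    sub(/^[^;]*;[[:space:]]*/, "", rest)   # drop everything up to the first ";"
    sub(/,.*/, "", rest)                   # cut at the first comma after it
    print rest
  }
' "$tmp")

printf '%s\n' "$result"
```

    On the three sample lines this selects the first two, since the third has no USGPRSPPF inside its added soc:[...] list.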