
Extract values from the last row of a table in a shell script [closed]

  •  0
  • flash  · Tech Community  · 6 years ago

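    I have a file (data.txt) with the following content. It has multiple rows separated by lines of dashes (-), so it looks like a table drawn in text. The first row holds all the column names, and every other row holds the actual data for those columns.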

    Connecting to the ControlService endpoint
    
    Found 3 rows.
    Requests List:
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Client ID                                                                   | Client Type                  | Service Type | Status               | Trust Domain              | Data Instance Name | Data Version | Creation Time              | Last Update                | Scheduled Time | 
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     REFRESH_ROUTINGTIER_ARTIFACTS_1465901168866                              | ROUTINGTIER_ARTIFACTS | SYSTEM       | COMPLETED            | RRA Bulk Client    | soa_server1       | 18.2.2.0.0  | 2016-06-14 03:49:55 -07:00 | 2016-06-14 03:49:57 -07:00 | ---            | 
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     500333443                                                          | CREATE                        | [FA_GSI]     | COMPLETED            | holder       | soa_server1       | 18.3.2.0.0  | 2018-08-07 11:59:57 -07:00 | 2018-08-07 12:04:37 -07:00 | ---            | 
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     500333446                                                          | CREATE                        | [FA_GSI]     | COMPLETED            | holder-test  | soa_server1       | 18.3.2.0.0  | 2018-08-07 12:04:48 -07:00 | 2018-08-07 12:08:52 -07:00 | ---            | 
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    

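    Now I want to parse the file above and extract values from its last row. Specifically, I want the values of the "Client ID" and "Trust Domain" columns in the last row, which are: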

    Client ID: 500333446
    Trust Domain: holder-test
    

    Can this be done in a shell script, Perl, or Python?

    4 replies  |  up to 6 years ago
        1
  •  0
  •   CodeSamurai777    6 years ago

    @paragbaxi's solution is good; I only had to add a condition that filters out the rows consisting solely of dashes ("-"). Like this:

    import csv
    
    lines_to_skip = 4  # preamble lines before the first dashed separator
    with open('data.txt', 'r', newline='') as f:
        reader = csv.reader(f, delimiter='|')
        for i in range(lines_to_skip):
            next(reader)  # skip the preamble
    
        data = []
        for line in reader:
            # keep only non-empty rows that are not dashed separator lines
            if line and not line[0].startswith("---"):
                print(line)
                data.append(line)
    
    
    print("Last row:\n{}".format(data[-1]))
    # strip() removes only the padding around each value
    print("Client ID:{} Domain:{}".format(data[-1][0].strip(), data[-1][4].strip()))
    

    Output:

    [' Client ID                                                                   ', ' Client Type                  ', ' Service Type ', ' Status               ', ' Trust Domain              ', ' Data Instance Name ', ' Data Version ', ' Creation Time              ', ' Last Update                ', ' Scheduled Time ', ' ']
    [' REFRESH_ROUTINGTIER_ARTIFACTS_1465901168866                              ', ' ROUTINGTIER_ARTIFACTS ', ' SYSTEM       ', ' COMPLETED            ', ' RRA Bulk Client    ', ' soa_server1       ', ' 18.2.2.0.0  ', ' 2016-06-14 03:49:55 -07:00 ', ' 2016-06-14 03:49:57 -07:00 ', ' ---            ', ' ']
    [' 500333443                                                          ', ' CREATE                        ', ' [FA_GSI]     ', ' COMPLETED            ', ' holder       ', ' soa_server1       ', ' 18.3.2.0.0  ', ' 2018-08-07 11:59:57 -07:00 ', ' 2018-08-07 12:04:37 -07:00 ', ' ---            ', ' ']
    [' 500333446                                                          ', ' CREATE                        ', ' [FA_GSI]     ', ' COMPLETED            ', ' holder-test  ', ' soa_server1       ', ' 18.3.2.0.0  ', ' 2018-08-07 12:04:48 -07:00 ', ' 2018-08-07 12:08:52 -07:00 ', ' ---            ', ' ']
    Last row:
    [' 500333446                                                          ', ' CREATE                        ', ' [FA_GSI]     ', ' COMPLETED            ', ' holder-test  ', ' soa_server1       ', ' 18.3.2.0.0  ', ' 2018-08-07 12:04:48 -07:00 ', ' 2018-08-07 12:08:52 -07:00 ', ' ---            ', ' ']
    Client ID:500333446 Domain:holder-test
    
    Process finished with exit code 0  
    
        2
  •  0
  •   paragbaxi    6 years ago

    Yes, this can be done in Python. I suggest using the csv module with the delimiter set to "|".

    import csv
    with open('s', 'r') as f:  # 's' is the name of the input file here
        reader = csv.reader(f, delimiter='|')  # split each line on the pipe
        for row in reader:
            print(row)
    

    This gives the following lists:

    ['Connecting to the ControlService endpoint']
    []
    ['Found 3 rows.']
    ['Requests List:']
    ['-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------']
    [' Client ID                                                                   ', ' Client Type                  ', ' Service Type ', ' Status               ', ' Trust Domain              ', ' Data Instance Name ', ' Data Version ', ' Creation Time              ', ' Last Update                ', ' Scheduled Time ', ' ']
    ['-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------']
    [' REFRESH_ROUTINGTIER_ARTIFACTS_1465901168866                              ', ' ROUTINGTIER_ARTIFACTS ', ' SYSTEM       ', ' COMPLETED            ', ' RRA Bulk Client    ', ' soa_server1       ', ' 18.2.2.0.0  ', ' 2016-06-14 03:49:55 -07:00 ', ' 2016-06-14 03:49:57 -07:00 ', ' ---            ', ' ']
    ['-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------']
    [' 500333443                                                          ', ' CREATE                        ', ' [FA_GSI]     ', ' COMPLETED            ', ' holder       ', ' soa_server1       ', ' 18.3.2.0.0  ', ' 2018-08-07 11:59:57 -07:00 ', ' 2018-08-07 12:04:37 -07:00 ', ' ---            ', ' ']
    ['-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------']
    [' 500333446                                                          ', ' CREATE                        ', ' [FA_GSI]     ', ' COMPLETED            ', ' holder-test  ', ' soa_server1       ', ' 18.3.2.0.0  ', ' 2018-08-07 12:04:48 -07:00 ', ' 2018-08-07 12:08:52 -07:00 ', ' ---            ', ' ']
    ['-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------']
    

    You can easily skip the first 4 lines of the resulting list.
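
    A minimal sketch of that skipping step, assuming the layout shown in the question (four preamble lines, then dashed separator lines between the rows). The function name `last_table_row`, the `preamble_lines` parameter, and the shortened `sample` are illustrative and not part of the original answer.

```python
import csv
from itertools import islice


def last_table_row(lines, preamble_lines=4):
    """Return the last data row as a list of stripped cells, or None.

    `lines` is any iterable of text lines (an open file works too).
    `preamble_lines` is an assumption taken from the sample in the question.
    """
    reader = csv.reader(islice(lines, preamble_lines, None), delimiter='|')
    rows = [
        [cell.strip() for cell in row]
        for row in reader
        # drop blank lines and the dashed separator lines
        if row and not row[0].startswith('-')
    ]
    # rows[0] is the header; a data row exists only if something follows it
    return rows[-1] if len(rows) > 1 else None


# Abbreviated copy of the table from the question (fewer columns, short dashes).
sample = [
    "Connecting to the ControlService endpoint\n",
    "\n",
    "Found 2 rows.\n",
    "Requests List:\n",
    "-----\n",
    " Client ID | Client Type | Service Type | Status | Trust Domain | \n",
    "-----\n",
    " 500333443 | CREATE | [FA_GSI] | COMPLETED | holder | \n",
    "-----\n",
    " 500333446 | CREATE | [FA_GSI] | COMPLETED | holder-test | \n",
    "-----\n",
]

row = last_table_row(sample)
print("Client ID:", row[0])
print("Trust Domain:", row[4])
```

    With a real file, pass the open file object instead of `sample`. Positions 0 and 4 match the column order in the question; if the layout changes, they need adjusting.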

        3
  •  0
  •   G_M    6 years ago
    >>> import csv
    ... from itertools import islice
    ... 
    ... with open('infile', 'r') as f:
    ...     reader = csv.DictReader(islice(f, 5, None, 2), delimiter='|')
    ...     result = [
    ...         {k.strip(): v.strip() for k, v in line.items()} for line in reader
    ...     ]
    ... 
    >>> last_row = result[-1]
    >>> import json; print(json.dumps(last_row, indent=2))
    {
      "Client ID": "500333446",
      "Client Type": "CREATE",
      "Service Type": "[FA_GSI]",
      "Status": "COMPLETED",
      "Trust Domain": "holder-test",
      "Data Instance Name": "soa_server1",
      "Data Version": "18.3.2.0.0",
      "Creation Time": "2018-08-07 12:04:48 -07:00",
      "Last Update": "2018-08-07 12:08:52 -07:00",
      "Scheduled Time": "---",
      "": ""
    }
    >>> last_row['Client ID']
    '500333446'
    >>> last_row['Trust Domain']
    'holder-test'
    
        4
  •  0
  •   James Brown    6 years ago

    One in awk:

    awk 'BEGIN{FS="|"}!/^-+/{c=$1;t=$5}END{print "Client ID:" c ORS "Trust Domain:" t}' file
    

    Explained:

    $ awk '
    BEGIN { FS="|" }                                # pipe-separator
    !/^-+/ {                                        # process if doesnt start with dashes
        c=$1                                        # client value
        t=$5                                        # trust domain value
    }
    END {                                           # in the end
        print "Client ID:" c ORS "Trust Domain:" t  # output the last value pair
    }' file
    

    Output:

    Client ID: 500333446                                                          
    Trust Domain: holder-test
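
    The awk output still carries the column padding after the Client ID value. For comparison, here is a Python sketch of the same "remember the last matching line" idea that trims each cell and looks the columns up by header name rather than by position. It assumes that only the header and data lines contain a "|"; `last_values` and the shortened `table` are hypothetical names used for illustration.

```python
def last_values(lines, columns=("Client ID", "Trust Domain")):
    """Return {column name: value} for the last data row, or None.

    Assumption: preamble, blank and dashed lines contain no '|',
    so every line with a pipe is either the header or a data row.
    Raises ValueError if a requested column is not in the header.
    """
    header = None
    last = None
    for line in lines:
        if "|" not in line:
            continue  # not part of the table body
        cells = [cell.strip() for cell in line.split("|")]
        if header is None:
            header = cells  # first piped line holds the column names
        else:
            last = cells    # keep overwriting; the final row wins
    if header is None or last is None:
        return None
    return {name: last[header.index(name)] for name in columns}


# Abbreviated copy of the table from the question.
table = [
    "Requests List:",
    "-----",
    " Client ID | Client Type | Service Type | Status | Trust Domain | ",
    "-----",
    " 500333443 | CREATE | [FA_GSI] | COMPLETED | holder | ",
    "-----",
    " 500333446 | CREATE | [FA_GSI] | COMPLETED | holder-test | ",
    "-----",
]

for name, value in last_values(table).items():
    print("{}: {}".format(name, value))
```

    Only one row is held in memory at a time, as in the awk version, so this also works for long listings.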