代码之家  ›  专栏  ›  技术社区  ›  mohit

查询分区athena表中的aws alb日志时出现问题

  •  1
  • mohit  · 技术社区  · 6 年前

    表创建查询:

    CREATE EXTERNAL TABLE IF NOT EXISTS {{DATABASE_NAME.TABLE_NAME}} (
      type string,
      time string,
      elb string,
      client_ip string,
      client_port string,
      target string,
      request_processing_time int,
      target_processing_time int,
      response_processing_time int,
      elb_status_code int,
      target_status_code string,
      received_bytes int,
      sent_bytes int,
      request_verb string,
      request_url string,
      request_proto string,
      user_agent string,
      ssl_cipher string,
      ssl_protocol string,
      target_group_arn string,
      trace_id string
    )
    PARTITIONED BY(year string, month string, day string) 
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
      'serialization.format' = '1',
      'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*) ([-0-9]*) ([-0-9]*) ([-0-9]*) ([-0-9]*) ([^ ]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) ([^ ]*)\" \"([^\"]*)\" ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*)'
    ) LOCATION 's3://{{BUCKET_NAME}}/AWSLogs/{{ACCOUNT_ID}}/elasticloadbalancing/us-west-2/';
    

    分区查询:

    ALTER TABLE alb_webapp add partition (year="2018", month="*", day="*")
    location "s3://{{bucket-name}}/{{directory-name}}/AWSLogs/{{account-id}}/elasticloadbalancing/us-east-1/2018/09/";
    

    当我试图运行一个简单的查询时,说“select”就是给我找到的结果为零。

    1 回复  |  直到 6 年前
        1
  •  4
  •   mohit    6 年前

    AWS似乎再次改变了alb日志的格式。AWS文档有新的regex来使用新的日志格式。下面的问题对我有用。

    CREATE EXTERNAL TABLE IF NOT EXISTS webapp_alb (
        type string,
        time string,
        elb string,
        client_ip string,
        client_port int,
        target_ip string,
        target_port int,
        request_processing_time double,
        target_processing_time double,
        response_processing_time double,
        elb_status_code string,
        target_status_code string,
        received_bytes bigint,
        sent_bytes bigint,
        request_verb string,
        request_url string,
        request_proto string,
        user_agent string,
        ssl_cipher string,
        ssl_protocol string,
        target_group_arn string,
        trace_id string,
        domain_name string,
        chosen_cert_arn string,
        matched_rule_priority string,
        request_creation_time string,
        actions_executed string,
        redirect_url string,
        new_field string
        )
        PARTITIONED BY(year string, month string, day string) 
        ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
        WITH SERDEPROPERTIES (
        'serialization.format' = '1',
        'input.regex' = 
    '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\"($| \"[^ ]*\")(.*)')
        LOCATION 's3://{{bucket-name}}/{{directory-name}}/AWSLogs/{{account-id}}/elasticloadbalancing/us-east-1/';
    

    添加分区:

    ALTER TABLE webapp_alb add partition (year="2018", month="*", day="*") location "s3://{{bucket-name}}/{{directory-name}}/AWSLogs/{{account-id}}/elasticloadbalancing/us-east-1/2018/09/";
    

    裁判:

    https://docs.aws.amazon.com/athena/latest/ug/application-load-balancer-logs.html https://docs.aws.amazon.com/athena/latest/ug/partitions.html