
logstash -> elasticsearch: how to delete all old data before outputting new data

  •  0
  •  osexp2000  ·  6 years ago

    Logstash fetches a batch of events every week and forwards them to Elasticsearch.

    How can I configure Logstash so that it tells Elasticsearch to delete the old events first?

    Edit 2018-03-28:

    Input:

    {host:"host1", type:"packages", records: [{name:"pkg1", ver: "1"}, {name: "pkg2", ver: "2"},...]}
    {host:"host1", type:"mounts", records: [{path:"path1", dev: "dev1"}, {path:"path2", dev: "dev2"},...]}
    {host:"host1", type:"???", records: [{???}, {???},...]}
    ...
    {host:"host2", type:"packages", records: [{name:"pkg1", ver: "1"}, {name: "pkg2", ver: "2"},...]}
    {host:"host2", type:"mounts", records: [{path:"path1", dev: "dev1"}, {path:"path2", dev: "dev2"},...]}
    {host:"host2", type:"???", records: [{???}, {???},...]}
    

    These are various kinds of events per host. The schema of each event cannot be determined in advance.

    To be able to search precisely on the fields inside the arrays, I have to split each array into multiple Elasticsearch documents.

    (I know there are ways to search inside an array without splitting it. That is another story: Nested Object. In my case the inner objects do not have a fixed schema, so I cannot provide every inner field definition up front.)

    Output:

    {host: "host1", type:"packages", record: {name: "pkg1", ver: "1"}}
    {host: "host1", type:"packages", record: {name: "pkg2", ver: "2"}}
    {host: "host1", type:"mounts", record: {path: "path1", dev: "dev1"}}
    {host: "host1", type:"???", record: {???}}
    {host: "host1", type:"???", record: {???}}
    {host: "host1", type:"mounts", record: {path: "path2", dev: "dev2"}}
    {host: "host2", type:"packages", record: {name: "pkg1", ver: "1"}}
    {host: "host2", type:"packages", record: {name: "pkg2", ver: "2"}}
    {host: "host2", type:"mounts", record: {path: "path1", dev: "dev1"}}
    {host: "host2", type:"mounts", record: {path: "path2", dev: "dev2"}}
    {host: "host2", type:"???", record: {???}}
    {host: "host2", type:"???", record: {???}}
    ...
    
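The input-to-output transformation above can be sketched in plain Ruby (a standalone illustration of what the Logstash split + mutate filters do, not Logstash itself; the function name is mine):

```ruby
# Sketch: one event carrying a "records" array becomes one event per
# record, with the shared fields (host, type) copied onto each one and
# the array replaced by a single "record" field.
def split_records(event)
  event["records"].map do |rec|
    event.reject { |k, _| k == "records" }.merge("record" => rec)
  end
end

input = {
  "host" => "host1",
  "type" => "packages",
  "records" => [
    { "name" => "pkg1", "ver" => "1" },
    { "name" => "pkg2", "ver" => "2" }
  ]
}

split_records(input).each { |e| puts e }
```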

    logstash.conf:

    input { ... }
    
    filter {
        split {
          # split array and save them into new multiple events
          field => "records"
        }
        mutate {
          rename => { "records" => "record" }
        }
    }
    
    output {
      elasticsearch {
        hosts => ["ELASTIC_IP:PORT"]
        index => "packages-%{+YYYY.MM.dd}"
      }
    }
    


    The problem is: for each host and type, Elasticsearch keeps accumulating more and more old events.

    So I want to delete a host's old data once I receive its new data.

    Notes on some failed approaches:

    Because the output is multiple documents rather than a single one, sometimes more and sometimes fewer, this is not a simple update. It has to be a delete-all & re-add.

    I know there are ways to search inside an array without splitting it. That is another story: Nested Object. In my case the inner objects do not have a fixed schema, so I cannot provide every inner field definition up front.

1 Answer  |  6 years ago
        1
  •  0
  •   osexp2000    6 years ago

    OK, I've verified that a ruby filter can delete the old index:

    input { ... }
    
    filter {
      split {
        # split array and save them into new multiple events
        field => "records"
      }
      mutate {
        rename => { "records" => "record" }
      }
    
      ruby {
        init => "
           require 'net/http'
           require 'uri'
         "
        code => "
          # delete the stale index for this event's host/type before the
          # elasticsearch output below writes the fresh documents
          # (note: this runs once per event, so the DELETE is repeated)
          uri = URI.parse('http://docker.for.mac.localhost:19200/inventory-' + event.get('type') + '@' + event.get('host'))
          http = Net::HTTP.new(uri.host, uri.port)
          req = Net::HTTP::Delete.new(uri.request_uri)
          req.basic_auth 'elastic', 'changeme'
          res = http.request(req)
        "
      }
    }
    
    output {
      elasticsearch {
        hosts => ["ELASTIC_IP:PORT"]
        index => "inventory-%{type}@%{host}"
      }
    }
    
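The delete step in the filter above can be sketched as standalone Ruby; the endpoint, index prefix, and credentials are the same placeholders used in the filter, so adjust them for your cluster:

```ruby
require 'net/http'
require 'uri'

# Build the per-host, per-type index name used by both the delete call
# and the elasticsearch output (e.g. "inventory-packages@host1").
def index_name(type, host)
  "inventory-#{type}@#{host}"
end

# Issue a DELETE for that index over Elasticsearch's REST API.
# The URL and basic-auth credentials below are placeholders.
def delete_index(es_url, type, host, user: 'elastic', pass: 'changeme')
  uri = URI.parse("#{es_url}/#{index_name(type, host)}")
  req = Net::HTTP::Delete.new(uri.request_uri)
  req.basic_auth(user, pass)
  Net::HTTP.new(uri.host, uri.port).request(req)
end
```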

    The important thing is to give each combination of host & type its own index, so it can be easily targeted for deletion:

    index => "inventory-%{type}@%{host}"