Get one document per bucket in an Elasticsearch aggregation

  • Vignesh T.V.  ·  6 years ago

    My aggregation query currently returns the following response:

    {
      "took": 5,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 1261,
        "max_score": 0,
        "hits": []
      },
      "aggregations": {
        "clusters": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 1073,
          "buckets": [
            {
              "key": 813058,
              "doc_count": 46
            },
            {
              "key": 220217,
              "doc_count": 29
            },
            {
              "key": 287763,
              "doc_count": 23
            },
            {
              "key": 527217,
              "doc_count": 20
            },
            {
              "key": 881778,
              "doc_count": 15
            },
            {
              "key": 700725,
              "doc_count": 14
            },
            {
              "key": 757602,
              "doc_count": 13
            },
            {
              "key": 467496,
              "doc_count": 10
            },
            {
              "key": 128318,
              "doc_count": 9
            },
            {
              "key": 317261,
              "doc_count": 9
            }
          ]
        }
      }
    }
    

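    I want to get one document for each bucket in the aggregation (either the highest-scoring one or a random one; both are fine). How can I do that?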

    The query I use to get the aggregation is as follows:

    GET myindex/_search
    {
        "size": 0,
        "aggs": {
            "clusters": {
                "terms": {
                    "field": "myfield",
                    "size": 100000
                }
            }
        },
        "query": {
            "bool": {
                "must": [
                    {
                        "query_string": { "default_field": "field1", "query": "val1" }
                    },
                    {
                        "query_string": { "default_field": "field2", "query": "val2" }
                    }
                ]
            }
        }
    }
    

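    I need this because I am trying to implement a clustering-based sentence similarity system: I pick one sentence from each cluster and check its similarity to a given sentence.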

    1 Answer
  •   Vignesh T.V.    6 years ago

    I was able to solve this with the top hits aggregation described here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html

    Example query:

    GET myindex/_search
    {
        "size": 0,
        "aggs": {
            "clusters": {
                "terms": {
                    "field": "myfield",
                    "size": 100000
                },
                "aggs": {
                    "mydoc": {
                        "top_hits": {
                            "size": 1
                        }
                    }
                }
            }
        },
        "query": {
            "bool": {
                "must": [
                    {
                        "query_string": { "default_field": "field1", "query": "val1" }
                    },
                    {
                        "query_string": { "default_field": "field2", "query": "val2" }
                    }
                ]
            }
        }
    }
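
    With "size": 1 and no explicit sort, top_hits returns the highest-scoring document of each bucket. As a rough sketch of how the result could be consumed on the client side, the Python below builds the same aggregation body as a dict and pulls one document per bucket out of the response. The helper names (build_search_body, one_doc_per_bucket), the "sentence" field, and the sample response are assumptions made for illustration; the "query" clause is left out for brevity, and sending the request to a cluster is not shown.

```python
from typing import Any, Dict


def build_search_body(field: str, max_buckets: int = 10) -> Dict[str, Any]:
    """Terms aggregation with a top_hits sub-aggregation (one hit per bucket)."""
    return {
        "size": 0,  # suppress top-level hits; only the aggregation is needed
        "aggs": {
            "clusters": {
                "terms": {"field": field, "size": max_buckets},
                "aggs": {"mydoc": {"top_hits": {"size": 1}}},
            }
        },
    }


def one_doc_per_bucket(response: Dict[str, Any]) -> Dict[Any, Dict[str, Any]]:
    """Map each bucket key to the _source of its single top hit."""
    docs: Dict[Any, Dict[str, Any]] = {}
    for bucket in response["aggregations"]["clusters"]["buckets"]:
        hits = bucket["mydoc"]["hits"]["hits"]
        if hits:  # skip a bucket that came back without a hit
            docs[bucket["key"]] = hits[0]["_source"]
    return docs


# Hand-written sample in the shape of a terms + top_hits response.
# The "sentence" field and its values are invented for illustration.
sample_response = {
    "aggregations": {
        "clusters": {
            "buckets": [
                {
                    "key": 813058,
                    "doc_count": 46,
                    "mydoc": {"hits": {"hits": [
                        {"_id": "a1", "_score": 1.0,
                         "_source": {"sentence": "first example"}}
                    ]}},
                },
                {
                    "key": 220217,
                    "doc_count": 29,
                    "mydoc": {"hits": {"hits": [
                        {"_id": "b7", "_score": 1.0,
                         "_source": {"sentence": "second example"}}
                    ]}},
                },
            ]
        }
    }
}

representatives = one_doc_per_bucket(sample_response)
for cluster_key, doc in representatives.items():
    print(cluster_key, doc["sentence"])
```

    Each value in representatives is the one document chosen for that cluster, which is what the similarity check against a given sentence would then iterate over.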