
Elasticsearch does not return accurate results

elastic-stack elasticsearch mongodb python-3.x python

Lasit Pant · 7 years ago

I am using a match_phrase query to search in ES, but I noticed that the results returned are not what I expect. Code:

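from elasticsearch import Elasticsearch

# Assumption: a node reachable on localhost; adjust the URL for your cluster.
es = Elasticsearch("http://localhost:9200")

res = es.search(
    index='indice_1',
    body={
        "_source": ["content"],        # return only the content field
        "query": {
            "match_phrase": {
                "content": "xyz abc"
            }
        }
    },
    size=500,
    scroll='60s',
)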

It does not return the records whose content is "hi my name isxyz abc." and "hey wassupxyz abc. how is life".

2 answers | latest 7 years ago

Tim · 7 years ago

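If no analyzer is specified, the standard analyzer is used by default. It performs grammar-based tokenization, so the phrase "hi my name isxyz abc." is indexed as the terms [hi, my, name, isxyz, abc], while match_phrase looks for the terms [xyz, abc] next to each other (unless you specify slop). Since xyz is never indexed as a term of its own, the phrase cannot match.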
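
The adjacency requirement can be illustrated with a small Python sketch. It only approximates the standard analyzer (lowercasing and splitting on non-alphanumeric characters, which is enough for this ASCII example). The helper names simple_tokens and phrase_matches are made up for illustration and are not part of Elasticsearch.

```python
import re

def simple_tokens(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_matches(doc_tokens, query_tokens):
    """True if query_tokens occur as consecutive tokens in doc_tokens (slop = 0)."""
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))

doc = simple_tokens("hi my name isxyz abc.")
query = simple_tokens("xyz abc")

print(doc)                         # ['hi', 'my', 'name', 'isxyz', 'abc']
print(phrase_matches(doc, query))  # False
```

Because isxyz is indexed as a single term, no value of slop helps here either: the term xyz simply does not exist in the document.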

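You can either use a different analyzer or modify your query. If you use a match query, it will match on the term "abc". If you want the phrase to match, you need a different analyzer. NGrams are one option.

Here is an example: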

PUT test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  }, 
  "mappings": {
    "_doc": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}

PUT test_index/_doc/1
{
  "content": "hi my name isxyz abc."
}

PUT test_index/_doc/2
{
  "content": "hey wassupxyz abc. how is life"
}

POST test_index/_doc/_search
{
  "query": {
    "match_phrase": {
      "content": "xyz abc"
    }
  }
}

The search returns both documents.

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "test_index",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "content": "hey wassupxyz abc. how is life"
        }
      },
      {
        "_index": "test_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "content": "hi my name isxyz abc."
        }
      }
    ]
  }
}
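
To see why the ngram analyzer makes both documents match, the following Python sketch mimics the tokenizer settings above (min_gram and max_gram of 3, token_chars of letters and digits). It is an illustration under simplifying assumptions (ASCII input only, one position per gram), not Elasticsearch's implementation. trigram_tokens and phrase_matches are hypothetical helper names.

```python
import re

def trigram_tokens(text, n=3):
    """Split on anything that is not a letter or digit, then emit every n-gram of each chunk."""
    grams = []
    for chunk in re.findall(r"[A-Za-z0-9]+", text):
        # Chunks shorter than n (e.g. "hi", "my") produce no grams at all.
        grams.extend(chunk[i:i + n] for i in range(len(chunk) - n + 1))
    return grams

def phrase_matches(doc_tokens, query_tokens):
    """True if query_tokens occur as consecutive tokens in doc_tokens."""
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))

docs = ["hi my name isxyz abc.", "hey wassupxyz abc. how is life"]
query = trigram_tokens("xyz abc")

print(query)                    # ['xyz', 'abc']
print(trigram_tokens(docs[0]))  # ['nam', 'ame', 'isx', 'sxy', 'xyz', 'abc']
print([phrase_matches(trigram_tokens(d), query) for d in docs])  # [True, True]
```

With trigrams, the tail of isxyz and wassupxyz becomes its own xyz token directly before abc, which is exactly the adjacency that match_phrase needs.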

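Edit: If you want to run a wildcard query, you can use the standard analyzer. This assumes content has a keyword sub-field, as default dynamic mapping creates for string fields; the explicit mapping shown earlier does not define one. The use case you described in the comments would look like this: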

PUT test_index/_doc/3
{
  "content": "RegionLasit Pant0Q00B000001KBQ1SAO00"
}

POST test_index/_doc/_search
{
  "query": {
    "wildcard": {
      "content.keyword": {
        "value": "*Lasit Pant*"
      }
    }
  }
}

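Basically, you are doing a substring search without using the nGram analyzer. Your query phrase is simply "*<my search terms>*". nGrams are still worth considering.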
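
As a rough illustration of the substring behaviour, the sketch below builds the wildcard query body and checks the pattern locally with Python's fnmatch module. This is only a stand-in: fnmatch also understands [...] character classes, which Elasticsearch wildcard patterns do not, and wildcard_query is a hypothetical helper, not a client API.

```python
from fnmatch import fnmatchcase

def wildcard_query(field, terms):
    """Build a wildcard query body that looks for `terms` anywhere in `field`."""
    return {"query": {"wildcard": {field: {"value": f"*{terms}*"}}}}

body = wildcard_query("content.keyword", "Lasit Pant")
pattern = body["query"]["wildcard"]["content.keyword"]["value"]

# The keyword sub-field holds the whole string as one term, so the pattern
# is tested against the complete value (case-sensitively).
value = "RegionLasit Pant0Q00B000001KBQ1SAO00"

print(pattern)                      # *Lasit Pant*
print(fnmatchcase(value, pattern))  # True
```

A pattern that starts with * makes Elasticsearch examine every term in the field, so this kind of query can be slow on large indices.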

Pratik Patel · 7 years ago

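# One reading of this answer: the legacy match query with "type": "phrase".
# That form was deprecated in Elasticsearch 5.0 and removed in 6.0; it is
# equivalent to the match_phrase query used in the question.
res = es.search(
    index='indice_1',
    body={
        "_source": ["content"],
        "query": {
            "match": {
                "content": {"query": "xyz abc", "type": "phrase"}
            }
        }
    },
    size=500,
    scroll='60s',
)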
