Lucene proximity search for a phrase with more than two words
Question
Lucene's manual clearly explains what a proximity search means for a phrase with two words, such as the "jakarta apache"~10 example in
http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Proximity Searches
However, I am wondering what exactly a search like "jakarta apache lucene"~10 does. Does it allow neighboring words to be at most 10 words apart, or does that limit apply to all pairs of words?
Thanks!
Recommended answer
The slop (proximity) works like an edit distance (see PhraseQuery.setSlop). So the terms can be reordered, or extra terms can appear between them. The slop is therefore a budget for the whole phrase rather than a limit on each pair of words: it is the maximum number of terms that may be added into the whole query. That is:
"jakarta apache lucene"~3
will match:
- "jakarta lucene apache" (distance: 2)
- "jakarta extra words here apache lucene" (distance: 3)
- "jakarta some words apache separated lucene" (distance: 3)
but not:
- "lucene jakarta apache" (distance: 4)
- "jakarta too many extra words here apache lucene" (distance: 5)
- "jakarta some words apache further separated lucene" (distance: 4)
Some people have been confused by this one:
"lucene jakarta apache" (distance: 4)
The simple explanation is that swapping two terms takes two edits, so:
- jakarta apache lucene (distance: 0)
- jakarta lucene apache (first swap, distance: 2)
- lucene jakarta apache (second swap, distance: 4)
The longer, but more accurate, explanation is that every edit allows a term to be moved by one position. The first move of a swap places the two terms on top of each other, at the same position; the second move separates them again in the opposite order. Keeping this in mind explains why any set of three terms can be rearranged into any order with a distance no greater than 4.
- jakarta apache lucene (distance: 0)
- jakarta [apache,lucene] (distance: 1)
- [jakarta,apache,lucene] (all transposed onto the same position, distance: 2)
- lucene [jakarta,apache] (distance: 3)
- lucene jakarta apache (distance: 4)
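For anyone assembling such queries in code, here is a minimal sketch that only formats the query-parser syntax used in this answer (a quoted phrase followed by ~ and the slop). The class name ProximitySyntax and the method phraseWithSlop are hypothetical and not part of Lucene. The sketch does no escaping and does not compute any distances; it simply rejects terms it cannot represent safely.

```java
import java.util.List;

/** Hypothetical helper (not part of Lucene): formats a phrase and a slop in query-parser syntax. */
final class ProximitySyntax {
    private ProximitySyntax() {}

    /** Builds a string of the form "term1 term2 ..."~slop, quotes included. */
    static String phraseWithSlop(List<String> terms, int slop) {
        if (terms.isEmpty()) {
            throw new IllegalArgumentException("at least one term is required");
        }
        if (slop < 0) {
            throw new IllegalArgumentException("slop must be non-negative");
        }
        for (String term : terms) {
            // Deliberately strict: this sketch does not attempt to escape anything.
            boolean hasWhitespace = term.chars().anyMatch(Character::isWhitespace);
            if (term.isEmpty() || term.contains("\"") || hasWhitespace) {
                throw new IllegalArgumentException("unsupported term: " + term);
            }
        }
        return "\"" + String.join(" ", terms) + "\"~" + slop;
    }
}
```

Calling ProximitySyntax.phraseWithSlop(List.of("jakarta", "apache", "lucene"), 3) returns the string "jakarta apache lucene"~3, which is the form one would pass to the query parser.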