Lucene and Special Characters
Problem description
I am using Lucene.Net 2.0 to index some fields from a database table. One of the fields is a 'Name' field which allows special characters. When I perform a search, it does not find my document that contains a term with special characters.
I index my field like this:
Directory DALDirectory = FSDirectory.GetDirectory(@"C:\Indexes\Name", false);
Analyzer analyzer = new StandardAnalyzer();
IndexWriter indexWriter = new IndexWriter(DALDirectory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
Document doc = new Document();
doc.Add(new Field("Name", "Test (Test)", Field.Store.YES, Field.Index.TOKENIZED));
indexWriter.AddDocument(doc);
indexWriter.Optimize();
indexWriter.Close();
To search, I do the following:
value = value.Trim().ToLower();
value = QueryParser.Escape(value);
Query searchQuery = new TermQuery(new Term(field, value));
Searcher searcher = new IndexSearcher(DALDirectory);
TopDocCollector collector = new TopDocCollector(searcher.MaxDoc());
searcher.Search(searchQuery, collector);
ScoreDoc[] hits = collector.TopDocs().scoreDocs;
If I perform a search for field as 'Name' and value as 'Test', it finds the document. If I perform the same search as 'Name' and value as 'Test (Test)', then it does not find the document.
Even stranger, if I remove the QueryParser.Escape line and search for a GUID (which, of course, contains hyphens), it finds documents where the GUID value matches, but performing the same search with the value 'Test (Test)' still yields no results.
I am unsure what I am doing wrong. I am using the QueryParser.Escape method to escape the special characters, and I am storing the field and searching as shown in the Lucene.Net examples.
Any ideas?
Recommended answer
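StandardAnalyzer strips out special characters such as parentheses during indexing and also lowercases the text, so 'Test (Test)' is indexed as the token 'test' rather than as one literal string. A TermQuery is not analyzed, so searching for the single term 'test (test)' cannot match anything in the index. You can pass in a list of explicit stopwords (excluding the ones you want in), but note that the stop word list only controls which words are dropped; it does not make the analyzer keep punctuation.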
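One way to act on this is to run the search text through the same analyzer that was used at index time, by building the query with QueryParser instead of a raw TermQuery. The following is only a minimal sketch under assumptions: the class and method names (NameSearch, Search) are hypothetical and not from the original post, and it assumes the Lucene.Net 2.x API implied by the question's code (the QueryParser(field, analyzer) constructor and TopDocCollector).

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical helper for illustration; not part of the original question.
public static class NameSearch
{
    public static ScoreDoc[] Search(Directory directory, string field, string value)
    {
        // Same analyzer as at index time, so query tokens match indexed tokens.
        Analyzer analyzer = new StandardAnalyzer();

        // Escape query-syntax characters such as ( and ) before parsing.
        string escaped = QueryParser.Escape(value.Trim());

        // QueryParser analyzes the text, so the punctuation is dropped and
        // the words are lowercased just as they were during indexing.
        QueryParser parser = new QueryParser(field, analyzer);
        Query query = parser.Parse(escaped);

        Searcher searcher = new IndexSearcher(directory);
        try
        {
            TopDocCollector collector = new TopDocCollector(searcher.MaxDoc());
            searcher.Search(query, collector);
            return collector.TopDocs().scoreDocs;
        }
        finally
        {
            searcher.Close();
        }
    }
}
```

If exact matching on the whole value including the parentheses is what is wanted, an alternative worth trying is to index the field with Field.Index.UN_TOKENIZED and keep the TermQuery. In that case the term is stored verbatim, so the search value would presumably have to match it character for character, without ToLower or QueryParser.Escape applied.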