Best way to test for existing string against a large list of comparables
Problem description
Suppose you have a list of acronyms that define a value (e.g. AB1, DE2, CC3) and you need to check a string value (e.g. "Happy:DE2|234") to see if an acronym is found in the string. For a short list of acronyms I would usually create a simple RegEx that used a separator (e.g. (AB1|DE2|CC3)) and just look for a match.
But how would I tackle this if there are over 30 acronyms to match against? Would it make sense to use the same technique (ugly), or is there a more efficient and elegant way to accomplish this task?
Keep in mind that the example acronym list and example string are not the actual data format I am working with, just a way to express my challenge.
BTW, I read a related SO question but didn't think it applied to what I was trying to accomplish.
EDIT: I forgot to include my need to capture the matched value, hence the choice to use regular expressions.
Personally, I don't think 30 is particularly large for a regex, so I wouldn't be too quick to rule it out. You can create the regex with a single line of code:
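using System;
using System.Text.RegularExpressions;

var acronyms = new[] { "AB", "BC", "CD", "ZZAB" };
// Build the alternation AB|BC|CD|ZZAB (apply Regex.Escape to each item if acronyms may contain regex metacharacters)
var regex = new Regex(string.Join("|", acronyms), RegexOptions.Compiled);
for (var match = regex.Match("ZZZABCDZZZ"); match.Success; match = match.NextMatch())
    Console.WriteLine(match.Value);
// prints ZZAB and CD (ZZAB matches at index 1, so its characters are consumed before AB at index 3 can be tried)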
So the code is relatively elegant and maintainable. If you know the upper bound for the number of acronyms, I would do some testing; who knows what kind of optimizations are already built into the regex engine. You'll also benefit for free from future regex engine optimizations. Unless you have reason to believe performance will be an issue, keep it simple.
On the other hand, regex has other limitations. For example, matches do not overlap by default, so with acronyms AB, BC and CD it will return only two of them (AB and CD) for "ABCD". So it's good at telling you there is an acronym, but you need to be careful about catching multiple matches.
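If overlapping matches matter, one possible workaround (not part of the original answer) is to wrap the alternation in a zero-width lookahead and read the capture group instead of the match value. The sketch below assumes a hypothetical helper named OverlappingAcronyms.FindAll; the name and shape are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class OverlappingAcronyms
{
    // Hypothetical helper: returns acronym occurrences, including overlapping ones.
    public static List<string> FindAll(IEnumerable<string> acronyms, string text)
    {
        // Escape each acronym so regex metacharacters are treated literally.
        var alternation = string.Join("|", acronyms.Select(a => Regex.Escape(a)));

        // The lookahead consumes no characters, so the engine retries the
        // alternation at every position; group 1 holds the acronym found there.
        var regex = new Regex("(?=(" + alternation + "))");

        return regex.Matches(text)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToList();
    }
}
```

Called with the acronyms AB, BC and CD and the text "ABCD", this should report all three. Note that it still reports at most one acronym per starting position (the first alternative that matches), so acronyms that share a prefix, such as AB and ABC, would need extra handling.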
When performance became an issue for me (> 10,000 items), I put the 'acronyms' in a HashSet and then searched each substring of the text (from min acronym length to max acronym length). This was OK for me because the source text was very short. I'd not heard of it before, but at first look the Aho-Corasick algorithm, mentioned in the question you reference, seems like a better general solution to this problem.
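A minimal sketch of the HashSet approach described above might look like the following. The class and method names (AcronymScanner.FindAll) are hypothetical, and ordinal, case-sensitive comparison is an assumption; the original answer does not show its code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AcronymScanner
{
    // Hypothetical helper: looks up every substring whose length lies between
    // the shortest and the longest acronym in a HashSet.
    public static List<string> FindAll(IEnumerable<string> acronyms, string text)
    {
        // Assumption: case-sensitive, ordinal comparison.
        var set = new HashSet<string>(acronyms, StringComparer.Ordinal);
        var found = new List<string>();
        if (set.Count == 0) return found;

        int minLen = set.Min(a => a.Length);
        int maxLen = set.Max(a => a.Length);

        for (int start = 0; start < text.Length; start++)
        {
            // Only try lengths that still fit inside the text.
            for (int len = minLen; len <= maxLen && start + len <= text.Length; len++)
            {
                var candidate = text.Substring(start, len);
                if (set.Contains(candidate)) found.Add(candidate);
            }
        }
        return found;
    }
}
```

With the question's example (acronyms AB1, DE2, CC3 and the text "Happy:DE2|234") this should return DE2. The work grows with the text length times the spread between the shortest and longest acronym, which is why it suits short source text; unlike the plain regex, it also reports overlapping hits.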