Minimal LINQ string token counter

string linq .net c#

Steve Townsend · Tech Community · 14 years ago

Is there a way to reduce this further, avoiding the separate String.Split call? The goal is a collection of {token, count} pairs.

string src = "for each character in the string, take the rest of the " +
    "string starting from that character " +
    "as a substring; count it if it starts with the target string";

string[] target = src.Split(new char[] { ' ' });

var results = target.GroupBy(t => new
{
    str = t,
    count = target.Count(sub => sub.Equals(t))
});

4 answers | last updated 7 years ago

Jeff Mercado 14 years ago

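As you have it now, it will work (to some extent), but it is terribly inefficient. As written, the result is an enumeration of groupings, not the (word, count) pairs you were probably thinking of.

That overload of GroupBy() takes a function that selects the key, so the count is effectively recomputed for every item in the collection. Leaving aside regular expressions and punctuation handling, it can be written like this: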

string src = "for each character in the string, take the rest of the " +
             "string starting from that character " +
             "as a substring; count it if it starts with the target string";

var results = src.Split()               // default split by whitespace
                 .GroupBy(str => str)   // group words by the value
                 .Select(g => new
                              {
                                  str = g.Key,      // the value
                                  count = g.Count() // the count of that value
                              });

// sort the results by the words that were counted
var sortedResults = results.OrderByDescending(p => p.str);
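To make the difference in result shape concrete, here is a minimal sketch. The short sample string is an assumption chosen for illustration and is not part of the original question.

```csharp
using System;
using System.Linq;

// Hypothetical short input, chosen so the counts are easy to check by eye.
string sample = "the cat and the dog";
string[] words = sample.Split(' ');

// Shape from the question: each element is an IGrouping whose Key is { str, count }.
// The key selector runs once per word and scans the whole array each time.
var grouped = words
    .GroupBy(t => new { str = t, count = words.Count(sub => sub.Equals(t)) })
    .ToList();

// Flat shape: one { str, count } element per distinct word, counted from the group itself.
var pairs = words
    .GroupBy(t => t)
    .Select(g => new { str = g.Key, count = g.Count() })
    .ToList();

foreach (var g in grouped)
    Console.WriteLine($"{g.Key.str}: {g.Key.count}"); // the data sits on the group's Key

foreach (var p in pairs)
    Console.WriteLine($"{p.str}: {p.count}");         // the data sits on the element
```

Both loops print the same words and counts; the difference is where the data lives and how much work is done to produce it.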

spender 14 years ago

Although it is 3-4 times slower, the Regex approach is arguably more accurate:

string src = "for each character in the string, take the rest of the " +
    "string starting from that character " +
    "as a substring; count it if it starts with the target string";

var regex=new Regex(@"\w+",RegexOptions.Compiled);
var sw=new Stopwatch();

for (int i = 0; i < 100000; i++)
{
    var dic=regex
        .Matches(src)
        .Cast<Match>()
        .Select(m=>m.Value)
        .GroupBy(s=>s)
        .ToDictionary(g=>g.Key,g=>g.Count());
    if(i==1000)sw.Start();
}
Console.WriteLine(sw.Elapsed);

sw.Reset();

for (int i = 0; i < 100000; i++)
{
    var dic=src
        .Split(' ')
        .GroupBy(s=>s)
        .ToDictionary(g=>g.Key,g=>g.Count());
    if(i==1000)sw.Start();
}
Console.WriteLine(sw.Elapsed);

The Split approach counts string and string, as two separate entries, whereas the Regex approach correctly tokenizes substring rather than substring; .
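The tokenization difference can be checked with a small sketch. The sample sentence here is a shortened, hypothetical variant of the question's string.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical shortened input with punctuation attached to two words.
string sample = "in the string, take a substring; the string";

// Splitting on spaces keeps punctuation attached to the token.
var bySplit = sample.Split(' ')
                    .GroupBy(s => s)
                    .ToDictionary(g => g.Key, g => g.Count());

// \w+ matches runs of word characters only, so punctuation is dropped.
var byRegex = Regex.Matches(sample, @"\w+")
                   .Cast<Match>()
                   .Select(m => m.Value)
                   .GroupBy(s => s)
                   .ToDictionary(g => g.Key, g => g.Count());

Console.WriteLine(bySplit.ContainsKey("string,"));    // the comma survives the split
Console.WriteLine(byRegex.ContainsKey("substring;")); // the regex never yields this token
Console.WriteLine(byRegex["string"]);                 // both occurrences land in one entry
```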


dahlbyk 14 years ago

Here is a LINQ version without ToDictionary(), which may add unnecessary overhead depending on your needs...

var dic = src.Split(' ').GroupBy(s => s, (str, g) => new { str, count = g.Count() });

// or in query form (after "into g" only g is in scope, so the word comes from g.Key)
var dic = from str in src.Split(' ')
          group str by str into g
          select new { str = g.Key, count = g.Count() };
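As a rough illustration of the trade-off, the sketch below enumerates the lazy result directly and only builds a dictionary when keyed lookup is needed. The input string and variable names are assumptions for the example.

```csharp
using System;
using System.Linq;

// Hypothetical input for illustration.
string text = "a b a c b a";

// Deferred sequence of { str, count }; nothing is computed until it is enumerated.
var counts = text.Split(' ')
                 .GroupBy(s => s, (str, g) => new { str, count = g.Count() });

// If the results are only iterated, no dictionary is required.
foreach (var c in counts)
    Console.WriteLine($"{c.str}: {c.count}");

// Materialize only when lookup by word is actually needed.
var lookup = counts.ToDictionary(c => c.str, c => c.count);
Console.WriteLine(lookup["a"]);
```

Note that because the sequence is deferred, enumerating it twice (as this sketch does) repeats the grouping work.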

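Community CDub 7 years ago

Getting rid of String.Split does not leave many options. One is Regex.Matches, as spender demonstrated, and another is Regex.Split (which does not give us anything new). Instead of grouping, you could use either of these approaches: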

var target = src.Split(new[] { ' ', ',', ';' }, StringSplitOptions.RemoveEmptyEntries);
var result = target.Distinct()
                   .Select(s => new { Word = s, Count = target.Count(w => w == s) });

// or dictionary approach
var result = target.Distinct()
                   .ToDictionary(s => s, s => target.Count(w => w == s));

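The call to Distinct is needed to avoid duplicate items. I went ahead and expanded the set of characters to split on, so that the tokens are actual words without punctuation. Using spender's benchmarking code, I found the first approach to be the fastest.

To order the results by count, the first approach extends easily: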

var result = target.Distinct()
                   .Select(s => new { Word = s, Count = target.Count(w => w == s) })
                   .OrderByDescending(o => o.Count);

// or in query form

var result = from s in target.Distinct()
             let count = target.Count(w => w == s)
             orderby count descending
             select new { Word = s, Count = count };

Edit: removed the tuple, since an anonymous type was close at hand.
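For completeness, the Regex.Split option mentioned above might look like the following sketch. The \W+ pattern, the empty-entry filter, and the sample string are assumptions for illustration; as noted, this offers nothing beyond String.Split with extra separators.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input for illustration.
string text = "the string, a substring; the string";

// Split on runs of non-word characters; drop empty entries that appear
// when the input starts or ends with a separator.
string[] tokens = Regex.Split(text, @"\W+")
                       .Where(s => s.Length > 0)
                       .ToArray();

// Same Distinct + Count approach as above, ordered by descending count.
var ordered = tokens.Distinct()
                    .Select(s => new { Word = s, Count = tokens.Count(w => w == s) })
                    .OrderByDescending(o => o.Count)
                    .ToList();

foreach (var o in ordered)
    Console.WriteLine($"{o.Word}: {o.Count}");
```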