
Minimizing a LINQ string token counter

  •  1
  • Steve Townsend  · Tech Community  · 14 years ago

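    Follow-up to an answer to an earlier question .

    Is there a way to reduce this further and avoid the String.Split call? The desired result is a collection of {token, count} pairs.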

    string src = "for each character in the string, take the rest of the " +
        "string starting from that character " +
        "as a substring; count it if it starts with the target string";
    
    string[] target = src.Split(new char[] { ' ' });
    
    var results = target.GroupBy(t => new
    {
        str = t,
        count = target.Count(sub => sub.Equals(t))
    });
    
    4 answers  |  last activity 7 years ago
        1
  •  4
  •   Jeff Mercado    14 years ago

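    As you have it now, it will work (to some extent), but it is very inefficient. As written, the result is an enumeration of groupings, not the (word, count) pairs you were probably expecting.

    That overload of GroupBy() takes a key-selector function, so the count is effectively recomputed for every item in the collection. Leaving aside regular expressions and punctuation handling, it should be written like this: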

    string src = "for each character in the string, take the rest of the " +
                 "string starting from that character " +
                 "as a substring; count it if it starts with the target string";
    
    var results = src.Split()               // default split by whitespace
                     .GroupBy(str => str)   // group words by the value
                     .Select(g => new
                                  {
                                      str = g.Key,      // the value
                                      count = g.Count() // the count of that value
                                  });
    
    // sort the results by the words that were counted
    var sortedResults = results.OrderByDescending(p => p.str);
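
To make the group-then-count pattern above easier to try out, here is a minimal sketch that wraps it in a helper. The names `TokenCounter` and `CountTokens` are hypothetical, and the sketch returns a dictionary only because an anonymous type cannot be returned from a method; it is one possible arrangement, not the answer's exact code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TokenCounter
{
    // Hypothetical helper: splits on whitespace, groups equal tokens once,
    // and counts each group, so every token is visited a single time.
    public static Dictionary<string, int> CountTokens(string src)
    {
        return src.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                  .GroupBy(token => token)
                  .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

Calling `TokenCounter.CountTokens("the cat and the hat")` should give a count of 2 for `the` and 1 for each of the other three words.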
    
        2
  •  3
  •   spender    14 years ago

    Although it is 3-4 times slower, the Regex approach is arguably more accurate:

    string src = "for each character in the string, take the rest of the " +
        "string starting from that character " +
        "as a substring; count it if it starts with the target string";
    
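    // requires: using System.Diagnostics; using System.Linq; using System.Text.RegularExpressions;
    var regex = new Regex(@"\w+", RegexOptions.Compiled);
    var sw = new Stopwatch();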
    
    for (int i = 0; i < 100000; i++)
    {
        var dic=regex
            .Matches(src)
            .Cast<Match>()
            .Select(m=>m.Value)
            .GroupBy(s=>s)
            .ToDictionary(g=>g.Key,g=>g.Count());
        if(i==1000)sw.Start();
    }
    Console.WriteLine(sw.Elapsed);
    
    sw.Reset();
    
    for (int i = 0; i < 100000; i++)
    {
        var dic=src
            .Split(' ')
            .GroupBy(s=>s)
            .ToDictionary(g=>g.Key,g=>g.Count());
        if(i==1000)sw.Start();
    }
    Console.WriteLine(sw.Elapsed);
    

    In particular, it does not count string and string, as two separate entries, and it correctly tokenizes substring rather than substring; .

    Edit
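
The tokenization difference can be seen side by side in a small sketch. The class and method names here (`TokenizerComparison`, `SplitTokens`, `RegexTokens`) are made up for illustration; only `String.Split` and `Regex.Matches` come from the thread.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenizerComparison
{
    // Splitting on a single space leaves punctuation attached to the words.
    public static string[] SplitTokens(string src)
    {
        return src.Split(' ');
    }

    // Matching runs of word characters drops commas, semicolons and spaces.
    public static string[] RegexTokens(string src)
    {
        return Regex.Matches(src, @"\w+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

For the input `"the string, a substring; the string"`, the split version yields `string,` and `string` as different tokens, while the regex version yields `string` twice.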

        3
  •  1
  •   dahlbyk    14 years ago

    Here is a LINQ version without ToDictionary() , which may add unnecessary overhead depending on your needs...

    var dic = src.Split(' ').GroupBy(s => s, (str, g) => new { str, count = g.Count() });
    

    // or in query form; after "into g" only g is in scope, so project g.Key
    var dic = from str in src.Split(' ')
              group str by str into g
              select new { str = g.Key, count = g.Count() };
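
As a rough illustration of the GroupBy overload that takes a result selector, the sketch below projects each group straight into a key/count pair without building a dictionary. `ResultSelectorExample` and `CountTokens` are hypothetical names, and `KeyValuePair` stands in for the anonymous type so the result can be returned from a method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ResultSelectorExample
{
    // Hypothetical helper: the second lambda receives each key together with
    // the elements that share it, and returns one result per group.
    public static List<KeyValuePair<string, int>> CountTokens(string src)
    {
        return src.Split(' ')
                  .GroupBy(s => s,
                           (key, items) => new KeyValuePair<string, int>(key, items.Count()))
                  .ToList();
    }
}
```

Groups come back in the order in which each key first appears in the source, so `"a b a c b a"` should produce the pairs for `a`, `b` and `c` in that order.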
    
        4
  •  1
  •   Community CDub    7 years ago

    Getting rid of String.Split leaves few options. One is Regex.Matches , as spender demonstrated , and another is Regex.Split (which doesn't give us anything new).

    var target = src.Split(new[] { ' ', ',', ';' }, StringSplitOptions.RemoveEmptyEntries);
    var result = target.Distinct()
                       .Select(s => new { Word = s, Count = target.Count(w => w == s) });
    
    // or dictionary approach
    var result = target.Distinct()
                       .ToDictionary(s => s, s => target.Count(w => w == s));
    

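    The Distinct call is needed to avoid duplicate entries. I went ahead and expanded the set of separator characters so the split yields the actual words without punctuation. Using spender's benchmarking code, I found the first approach to be the fastest. It can also be extended to order the results by count: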

    var result = target.Distinct()
                       .Select(s => new { Word = s, Count = target.Count(w => w == s) })
                       .OrderByDescending(o => o.Count);
    
    // or in query form
    
    var result = from s in target.Distinct()
                 let count = target.Count(w => w == s)
                 orderby count descending
                 select new { Word = s, Count = count };
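
For experimenting with the Distinct-and-count variant, here is a small self-contained sketch. The names `DistinctCounter` and `CountByFrequency` are hypothetical, and `KeyValuePair` replaces the anonymous type so the result can leave the method. Note that it rescans the token array once per distinct word, so the speed reported above may not carry over to much larger inputs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctCounter
{
    // Hypothetical helper: splits on spaces, commas and semicolons, then
    // counts each distinct word and sorts by descending count.
    public static List<KeyValuePair<string, int>> CountByFrequency(string src)
    {
        var separators = new[] { ' ', ',', ';' };
        var target = src.Split(separators, StringSplitOptions.RemoveEmptyEntries);

        return target.Distinct()
                     .Select(s => new KeyValuePair<string, int>(s, target.Count(w => w == s)))
                     .OrderByDescending(p => p.Value)
                     .ToList();
    }
}
```

With the input `"string, the substring; the string the"`, the most frequent entry should be `the` with a count of 3, followed by `string` with 2 and `substring` with 1.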
    

    Edit: removed the Tuple, since an anonymous type was close at hand.