代码之家 › 专栏 › 技术社区 › Matt Warren

用于向字典添加项的Linq方法

foreach performance linq .net

Matt Warren · 技术社区 · 15 年前

我正在尝试通过实现PeterNorvig的 spelling corrector 在C语言中。

第一部分涉及 file of words (大约100万)把它放进字典 key 是这个词和 value 是出现的次数。

我通常会这样做:

foreach (var word in allWords)                                                    
{           
    if (wordCount.ContainsKey(word))
        wordCount[word]++;
    else
        wordCount.Add(word, 1);
}

在哪里? allWords 是一个 IEnumerable<string>

在Linq,我现在这样做:

var wordCountLINQ = (from word in allWordsLINQ
                         group word by word
                         into groups
                         select groups).ToDictionary(g => g.Key, g => g.Count());

我比较了这两本字典 <key, value> 它们是相同的,所以产生了相同的结果。

这个 foreach 循环采取 3.82秒 Linq查询需要 4.49秒

我正在使用秒表类计时,并在释放模式下运行。我不认为表演不好,我只是想知道是否有原因造成这种差异。

我是以一种低效的方式进行LINQ查询,还是遗漏了一些东西?

更新: 以下是完整的基准代码示例:

public static void TestCode()
{
    //File can be downloaded from http://norvig.com/big.txt and consists of about a million words.
    const string fileName = @"path_to_file";
    var allWords = from Match m in Regex.Matches(File.ReadAllText(fileName).ToLower(), "[a-z]+", RegexOptions.Compiled)
                   select m.Value;

    var wordCount = new Dictionary<string, int>();
    var timer = new Stopwatch();            
    timer.Start();
    foreach (var word in allWords)                                                    
    {           
        if (wordCount.ContainsKey(word))
            wordCount[word]++;
        else
            wordCount.Add(word, 1);
    }
    timer.Stop();

    Console.WriteLine("foreach loop took {0:0.00} ms ({1:0.00} secs)\n",
            timer.ElapsedMilliseconds, timer.ElapsedMilliseconds / 1000.0);

    //Make LINQ use a different Enumerable (with the exactly the same values), 
    //if you don't it suddenly becomes way faster, which I assmume is a caching thing??
    var allWordsLINQ = from Match m in Regex.Matches(File.ReadAllText(fileName).ToLower(), "[a-z]+", RegexOptions.Compiled)
                   select m.Value;

    timer.Reset();
    timer.Start();
    var wordCountLINQ = (from word in allWordsLINQ
                            group word by word
                            into groups
                            select groups).ToDictionary(g => g.Key, g => g.Count());  
    timer.Stop();

    Console.WriteLine("LINQ took {0:0.00} ms ({1:0.00} secs)\n",
            timer.ElapsedMilliseconds, timer.ElapsedMilliseconds / 1000.0);                     
}

4 回复 | 直到 7 年前

disambiguator Ruben 12 年前

LINQ版本变慢的原因之一是,创建了两个字典而不是一个字典:

(内部)来自group by操作符;group by还存储每个单独的单词。您可以通过查看toarray()而不是count()来验证这一点。这是你实际不需要的大量开销。
toDictionary方法基本上是对实际Linq查询的foreach,查询结果将添加到新字典中。根据唯一单词的数量,这也需要一些时间。

Linq查询速度较慢的另一个原因是,Linq依赖于lambda表达式(达坦答案中的委托),调用委托与内联代码相比,会增加少量开销。

编辑: 请注意,对于某些LINQ方案(如LINQ to SQL,但不在内存中,如此处),重写查询将生成一个更优化的计划:

from word in allWordsLINQ 
group word by word into groups 
select new { Word = groups.Key, Count = groups.Count() }

但是请注意,这并不是给你一本字典,而是一系列单词及其计数。你可以用

(from word in allWordsLINQ 
 group word by word into groups 
 select new { Word = groups.Key, Count = groups.Count() })
.ToDictionary(g => g.Word, g => g.Count);

Dan Atkinson 12 年前

当我构建您的第二个示例,然后在Reflector的反汇编视图中打开它时,我得到以下信息:

Dictionary<string, int> wordCountLINQ = allWordsLINQ.GroupBy<string, string>(delegate (string word) {
    return word;
}).Select<IGrouping<string, string>, IGrouping<string, string>>(delegate (IGrouping<string, string> groups) {
    return groups;
}).ToDictionary<IGrouping<string, string>, string, int>(delegate (IGrouping<string, string> g) {
    return g.Key;
}, delegate (IGrouping<string, string> g) {
    return g.Count<string>();
});

可能这需要更长的时间,因为有更多的函数调用正在发生,并且经过了一百万个迭代的累积过程。

NetMage 7 年前

通过完全滥用LINQ,我能够让它保持一致,并且通常比foreach循环快一点,即使使用委托呼叫:

var wordCountLINQ = allWordsLINQ.Aggregate(new Dictionary<string, int>(), (wcld, w) => { wcld[w] = (wcld.ContainsKey(w) ? wcld[w] : 0) + 1; return wcld; })

甚至改变 foreach 使用类似的集合表达式并不能使其更快。

Romano Zumbé Ajay Jain 7 年前

可以使用lambda表达式解决问题:

var words = unitOfWork.DepartmentRepository.Get()
           .GroupBy(a=>a.word).Select(s    => new 
           {
             Word = s.Key,
             Count = s.Count()
           }).ToDictionary(d=>d.Word, d=>d.Count);