代码之家 › 专栏 › 技术社区 › Simon

在ruby中哪一个更快-哈希查找还是带有case语句的函数?

performance ruby

Simon · 技术社区 · 14 年前

在时间关键型脚本中,我们有几个地方可以将旧的id转换为字符串。现在,我们在函数中使用case语句,如下所示:

def get_name id
  case id
    when 1
      "one thing"
    when 3
      "other thing"
    else
      "default thing"
  end
end

我正在考虑用散列查找替换它,比如:

NAMES = {
  1 => "one thing",
  3 => "other thing",
}
NAMES.default = "default thing"

感觉用起来应该快一点 NAMES[id] 比 get_name(id) -但是是吗?

7 回复 | 直到 14 年前

Bill Dueber 14 年前

先说几点。一个是,像这样的低级语言结构,或多或少地做同样的事情,几乎从来不是任何实际应用程序的瓶颈,所以(通常)关注它们是愚蠢的。第二,正如前面已经提到的,如果你真的关心它,你应该基准它。Ruby的基准测试和概要工具当然不是编程生态系统中最先进的,但是它们完成了任务。

我的直觉是散列将更快,因为(我猜,又一次)case语句必须依次检查每个条件(使查找项O(n)而不是O(1))。但是让我们检查一下!

完整的基准测试代码位于 https://gist.github.com/25 基本上,它生成一个文件,定义适当的case/hash,然后使用它们。我继续把散列查找也放在方法调用中,这样开销就不会成为一个因素,但在现实生活中,没有理由把它卡在方法中。

这是我得到的。每种情况下,我都要做10000次查找。时间是用户时间(秒)

Case statement, 10 items  0.020000
Hash lookup, 10 items     0.010000

Case statement, 100 items  0.100000
Hash lookup, 100 items     0.010000

Case statement, 1000 items  0.990000
Hash lookup, 1000 items     0.010000

所以,它看起来非常像case语句是O(n)(这里没有震动)。还要注意,即使在case语句中,10K查找也仅仅是一秒钟,因此除非您正在执行这些查找的度量butload,否则最好将注意力集中在其余代码上。

MartinStettner 14 年前

因为这取决于许多因素(要转换多少不同的id,编译器编译 case when 我的建议是: 测量一下 :

编写一个小的测试例程并将10.000.000 id转换为字符串。对两个实现都执行几次并比较结果。如果你没有显著的差异,可以选择你更喜欢的(我认为,散列解决方案更优雅一些…)

zetetic 14 年前

$ ruby -v
ruby 1.9.2p0 (2010-08-18 revision 29036) [i686-linux]

$ cat hash_vs_case.rb 
require 'benchmark'

def get_from_case(id)
  case id
    when 1
      "one thing"
    when 3
      "other thing"
    else
      "default thing"
  end
end

NAMES = {
  1 => "one thing",
  3 => "other thing",
}
NAMES.default = "default thing"

def get_from_hash(arg)
  NAMES[arg]
end

n = 1000000
Benchmark.bm do |test|
  test.report("case  1") { n.times do; get_from_case(1); end }
  test.report("hash  1") { n.times do; get_from_hash(1); end}
  test.report("case 42") { n.times do; get_from_case(42); end }
  test.report("hash 42") { n.times do; get_from_hash(42); end}
end

$ ruby -w hash_vs_case.rb 
      user     system      total        real
case  1  0.330000   0.000000   0.330000 (  0.422209)
hash  1  0.220000   0.000000   0.220000 (  0.271300)
case 42  0.310000   0.000000   0.310000 (  0.390295)
hash 42  0.320000   0.010000   0.330000 (  0.402647)

下面是您想要升级的原因:

$ ruby -v
ruby 1.8.7 (2009-06-12 patchlevel 174) [i486-linux]

$ ruby -w hash_vs_case.rb 
      user     system      total        real
case  1  1.380000   0.870000   2.250000 (  2.738493)
hash  1  1.320000   0.850000   2.170000 (  2.642013)
case 42  1.500000   0.960000   2.460000 (  3.029923)
hash 42  1.890000   0.890000   2.780000 (  3.456969)

Amala 13 年前

下面是一个示例,它显示了符号查找的大小写速度。前面的例子是基于整数的键。

https://gist.github.com/02c8f8ca0cd4c9d9ceb2

Ben Lee 14 年前

为什么不使用 case 语句内联在代码的时间关键部分,而不是让它自己调用函数?这避免了调用堆栈的开销,也避免了散列查找的开销。

你也可以自己做一个基准。做这样的事:

t = Time.now
1_000_000.times { ...your code... }
secs = Time.now - t

用每个选项替换“你的代码”。

RKh 14 年前

正如马丁所说,这取决于你想检查多少身份证。您是从数据库中选择ID和相应的字符串,还是要硬编码它。如果只有几张支票,那你就可以结案了。但是如果需要,没有修改ID/String对的选项。

另一方面,如果要从数据库中提取大量id,可以使用Dictionary之类的工具来存储名称/值对。

当字典在内存中时,查找可能很快。

但归根结底,这完全取决于您是使用不断变化的ID/string对,还是只使用少数常量。

-1

Nakilon earlonrails 14 年前

case when 有 n个复杂性。
hash lookup 有 ln(n) 复杂性,但是使用附加对象(哈希)和调用其方法有其自身的惩罚。

所以,如果有很多情况(成千上万,数百万,…),哈希查找会更好。但在你的情况下,当你只有3个变种时, 案件发生时 当然会快得多。