Executive summary: preg_replace() ran faster than string comparisons. Why? Shouldn't regular expressions be slower?
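In a recent question about detecting any of an array of disallowed substrings within a given input, I suggested comparing the result of a preg_replace() call to the original input, since preg_replace() can take an array of patterns as input. Thus my method could be a single if, whereas the other solutions required one (or many) loops.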
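For illustration, here is a minimal sketch of that single-if idea. The names $disallowed, $patterns, and contains_disallowed() are hypothetical and not taken from the original answer. I am also assuming the substrings should be matched literally, which is why each one goes through preg_quote() before being wrapped in delimiters.

```php
<?php
// Hypothetical list of disallowed substrings (illustrative only).
$disallowed = array("loop", "efficiency", "explain");

// preg_replace() expects delimited patterns, so quote each substring
// and wrap it in slashes.
$patterns = array();
foreach ($disallowed as $s) {
    $patterns[] = "/" . preg_quote($s, "/") . "/";
}

// Hypothetical helper: true when at least one pattern occurs in $text.
function contains_disallowed($patterns, $text) {
    // If stripping every pattern changes the text, something matched.
    return $text !== preg_replace($patterns, "", $text);
}

$text = "Short sentence - matches loop";
if (contains_disallowed($patterns, $text)) {
    echo "rejected\n";
}
```

Note that this helper returns true on a match, which is the opposite sense of the preg_check() function used in my tests.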
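I'm not interested in debating my answer, because it really is less readable/maintainable than the loops. My answer is still at -1, and I'll accept that for the sake of readability/ease of maintenance, but the biggest fault pointed out with my method was a lack of efficiency. That got me curious, and led me to do some testing. My results were a bit surprising to me: with all other factors held equal, preg_replace() was faster than any of the other methods.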
Can you explain why this was the case?
My code for these tests can be found below, along with the results:
$input = "In a recent question about detecting any of an array of disallowed substrings within a given input, I suggested comparing the result of a `preg_replace()` call to the original input, since `preg_replace()` can take an array of patterns as input. Thus my method for this could be a single `if` whereas the other solutions required one (or many) loops. I'm not interested in debating my answer, because really it is less readable/maintainable than the loops. However, the biggest fault pointed out with my method was a lack of efficiency. That got me curious, and led me to do some testing. My results were a bit surprising to me: with all other factors held equal, `preg_replace()` was **faster** than any of the other methods. Can you explain why this was the case?";
$input2 = "Short sentence - no matches";
$input3 = "Word";
$input4 = "Short sentence - matches loop";
// Each *_check() function returns true when the input contains none of the
// rejected strings, so each counter tallies the inputs that pass.

// Test 1: str_check()
$start1 = microtime(true);
$rejectedStrs = array("loop", "efficiency", "explain");
$p_matches = 0;
for ($i = 0; $i < 10000; $i++) {
    if (str_check($rejectedStrs, $input)) $p_matches++;
    if (str_check($rejectedStrs, $input2)) $p_matches++;
    if (str_check($rejectedStrs, $input3)) $p_matches++;
    if (str_check($rejectedStrs, $input4)) $p_matches++;
}

// Test 2: loop_check()
$start2 = microtime(true);
$rejectedStrs = array("loop", "efficiency", "explain");
$l_matches = 0;
for ($i = 0; $i < 10000; $i++) {
    if (loop_check($rejectedStrs, $input)) $l_matches++;
    if (loop_check($rejectedStrs, $input2)) $l_matches++;
    if (loop_check($rejectedStrs, $input3)) $l_matches++;
    if (loop_check($rejectedStrs, $input4)) $l_matches++;
}

// Test 3: preg_check(), with the same strings written as regex patterns
$start3 = microtime(true);
$rejectedStrs = array("/loop/", "/efficiency/", "/explain/");
$s_matches = 0;
for ($i = 0; $i < 10000; $i++) {
    if (preg_check($rejectedStrs, $input)) $s_matches++;
    if (preg_check($rejectedStrs, $input2)) $s_matches++;
    if (preg_check($rejectedStrs, $input3)) $s_matches++;
    if (preg_check($rejectedStrs, $input4)) $s_matches++;
}
$end = microtime(true);

// Print the three pass counts, then start time, end time and elapsed seconds per test.
echo $p_matches." ".$l_matches." ".$s_matches."\n";
echo "str_match: ".$start1." ".$start2."= ".($start2-$start1)."\nloop_match: ".$start2." ".$start3."=".($start3-$start2)."\npreg_match: ".$start3." ".$end."=".($end-$start3);
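// True when removing every pattern leaves the input unchanged (nothing matched).
function preg_check($rejectedStrs, $input) {
    if ($input == preg_replace($rejectedStrs, "", $input))
        return true;
    return false;
}

// Splits the input into words; false as soon as any word starts with a bad word
// (case-insensitive).
function loop_check($badwords, $string) {
    foreach (str_word_count($string, 1) as $word) {
        foreach ($badwords as $bw) {
            if (stripos($word, $bw) === 0) {
                return false;
            }
        }
    }
    return true;
}

// False as soon as any bad word appears surrounded by spaces in the
// space-padded input (case-insensitive).
function str_check($badwords, $str) {
    foreach ($badwords as $word) {
        if (stripos(" $str ", " $word ") !== false) {
            return false;
        }
    }
    return true;
}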
Results:
20000 20000 20000
str_match: 1282270516.6934 1282270518.5881=1.894730091095
loop_match: 1282270518.5881 1282270523.0943=4.506185700348
preg_match: 1282270523.0943 1282270523.6191=0.52475500106812
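To put those timings on a per-call basis: each timed section makes 4 calls per iteration over 10,000 iterations, i.e. 40,000 calls. The sketch below just does that arithmetic on the elapsed times printed above. The variable names are illustrative, and no new measurements are involved.

```php
<?php
// Elapsed seconds copied from the results above.
$elapsed = array(
    "str_check"  => 1.894730091095,
    "loop_check" => 4.506185700348,
    "preg_check" => 0.52475500106812,
);

// Each timed section runs 10000 iterations with 4 calls per iteration.
$calls = 10000 * 4;

foreach ($elapsed as $name => $seconds) {
    $perCallUs = $seconds / $calls * 1e6;            // microseconds per call
    $ratio     = $seconds / $elapsed["preg_check"];  // relative to preg_check
    printf("%-10s %7.2f us/call  %.1fx preg_check\n", $name, $perCallUs, $ratio);
}
```

By these numbers, str_check() took roughly 3.6 times as long as preg_check(), and loop_check() roughly 8.6 times as long.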