
Why does this regex return 3 matches, not 5?

regex php

GSto · Tech Community · 14 years ago

I wrote a very simple preg_match_all call in PHP:

$fileName = 'A_DATED_FILE_091410.txt';
$matches = array();
preg_match_all('/[0-9][0-9]/',$fileName,$matches);
print_r($matches);

My expected output:

$matches = array(
    [0] => array(
        [0] => 09,
        [1] => 91,
        [2] => 14,
        [3] => 41,
        [4] => 10
    )
)

What I got instead:

$matches = array(
    [0] => array(
        [0] => 09,
        [1] => 14,
        [2] => 10
    )
)

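For this particular use case that is actually preferable, but I'm wondering why it doesn't match the other substrings. Also, is it possible for a regex to give me my expected output, and if so, what is it?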

4 answers | last activity 14 years ago

Daniel Vandersluis · 14 years ago

With a global regex (which is what preg_match_all uses), once a match is made, the regex engine continues searching the string from the end of the previous match.

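In your case, the regex engine starts at the beginning of the string and advances until it reaches the 0, since that is the first character that matches [0-9]. It then moves on to the 9 (which matches the second [0-9]) and takes 09 as a match. When the engine continues (since it hasn't reached the end of the string yet), it advances again, to the 1, and repeats the steps above. The 9 is never revisited as a starting point, so 91 (and likewise 41) can never be found.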
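To see where each search resumes, you can ask preg_match_all for the match offsets with the PREG_OFFSET_CAPTURE flag. A minimal sketch reusing the file name from the question; $pairs and $offsets are illustrative names only:

```php
<?php
$fileName = 'A_DATED_FILE_091410.txt';

// PREG_OFFSET_CAPTURE turns every match into [matched text, byte offset].
preg_match_all('/[0-9][0-9]/', $fileName, $matches, PREG_OFFSET_CAPTURE);

$pairs = array();
$offsets = array();
foreach ($matches[0] as [$text, $offset]) {
    $pairs[] = $text;
    $offsets[] = $offset;
}

// Each match starts two bytes after the previous one:
// the engine resumes exactly where the last match ended.
print_r($pairs);   // 09, 14, 10
print_r($offsets); // 13, 15, 17
```

The offsets jump by 2 because each match consumes two characters; positions 14 and 16 (the 9 and the 4) are never tried as starting points.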

See also: First Look at How a Regex Engine Works Internally.

If you need the overlapping matches, you can call preg_match in a loop and use an offset to control where each search starts:

$fileName = 'A_DATED_FILE_091410.txt';
$allSequences = array();
$matches = array();
$offset = 0;

while (preg_match('/[0-9][0-9]/', $fileName, $matches, PREG_OFFSET_CAPTURE, $offset))
{
  list($match, $offset) = $matches[0];
  $allSequences[] = $match;
  $offset++; // since the match is 2 digits, we'll start the next match after the first
}

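The PREG_OFFSET_CAPTURE flag makes preg_match also return the byte offset of each match, so $matches[0] holds an array of [matched string, offset].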
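For instance, a single call that starts searching at byte 14 (the 9) finds the overlapping pair that preg_match_all skipped. A small sketch; the start value 14 is simply read off the file name:

```php
<?php
$fileName = 'A_DATED_FILE_091410.txt';

// The fifth argument is the byte offset where the search begins.
preg_match('/[0-9][0-9]/', $fileName, $m, PREG_OFFSET_CAPTURE, 14);

// With PREG_OFFSET_CAPTURE, $m[0] is [matched text, offset of the match].
[$text, $at] = $m[0];
echo $text, ' at ', $at, "\n"; // 91 at 14
```

The loop above does the same thing repeatedly, bumping the offset by one after each match.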

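I have another solution that gets all five matches without using offsets, but I'm adding it here mostly out of curiosity; I probably wouldn't use it in production code (it's also a somewhat complex regex). You can use a lookbehind to look for a digit before the current position, and capture it: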

(?<=([0-9]))[0-9]

Let's break this regex down:

(?<=       # open a positive lookbehind
  (        # open a capturing group
    [0-9]  # match 0-9
  )        # close the capturing group
)          # close the lookbehind
[0-9]      # match 0-9
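The commented layout above can also be used verbatim: with PHP's x (extended) modifier, whitespace and # comments inside the pattern are ignored. A sketch, assuming a nowdoc string to keep the comments intact (note that the comments must not contain the / delimiter):

```php
<?php
$fileName = 'A_DATED_FILE_091410.txt';

// Nowdoc keeps the pattern literal; the x modifier ignores whitespace and # comments.
$pattern = <<<'REGEX'
/
(?<=       # open a positive lookbehind
  (        # open a capturing group
    [0-9]  # match 0-9
  )        # close the capturing group
)          # close the lookbehind
[0-9]      # match 0-9
/x
REGEX;

preg_match_all($pattern, $fileName, $extended);
preg_match_all('/(?<=([0-9]))[0-9]/', $fileName, $compact);

// Both spellings of the pattern produce identical results.
var_dump($extended === $compact); // bool(true)
```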

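Because lookarounds are zero-width and don't move the regex position, this regex matches 5 times. The engine advances until the 9, because that is the first position that satisfies the lookbehind assertion. Since 9 also matches the trailing [0-9], the engine takes 9 as the match (but because we capture inside the lookaround, it also captures the 0!). The engine then moves to the 1, which likewise matches [0-9] while the lookbehind captures the 9 as the first subgroup, and so on until the engine reaches the end of the string.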
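A closely related trick, not part of this answer, puts the whole pair inside a lookahead instead. The match itself is then empty, so the engine steps forward one character at a time, and capturing group 1 holds each two-digit window already in the right order. A sketch of that variant:

```php
<?php
$fileName = 'A_DATED_FILE_091410.txt';

// Matches the empty string at every position followed by two digits;
// the capturing group inside the lookahead records those two digits.
preg_match_all('/(?=([0-9][0-9]))/', $fileName, $matches);

// $matches[0] holds five empty strings; $matches[1] holds the pairs.
print_r($matches[1]); // 09, 91, 14, 41, 10
```

Because nothing is consumed, no reordering step is needed afterwards.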

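Running this through preg_match_all, we get an array like the following (using the PREG_SET_ORDER flag, which groups each match's capturing groups together with the full match):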

Array
(
    [0] => Array
        (
            [0] => 9
            [1] => 0
        )

    [1] => Array
        (
            [0] => 1
            [1] => 9
        )

    [2] => Array
        (
            [0] => 4
            [1] => 1
        )

    [3] => Array
        (
            [0] => 1
            [1] => 4
        )

    [4] => Array
        (
            [0] => 0
            [1] => 1
        )

)

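Note that the digits in each "match" are in the wrong order! That is because the capturing group in the lookbehind becomes backreference 1, while the whole match is backreference 0. We can reassemble them in the correct order, though: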

preg_match_all('/(?<=([0-9]))[0-9]/', $fileName, $matches, PREG_SET_ORDER);
$allSequences = array();
foreach ($matches as $match)
{
  $allSequences[] = $match[1] . $match[0];
}
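The same regrouping can be written with array_map over the PREG_SET_ORDER result. A sketch assuming PHP 7.4 or later for the arrow function; $sets and $pairs are illustrative names:

```php
<?php
$fileName = 'A_DATED_FILE_091410.txt';

preg_match_all('/(?<=([0-9]))[0-9]/', $fileName, $sets, PREG_SET_ORDER);

// Each $set is [full match, group 1]; group 1 is the digit *before* the match,
// so it goes first when rebuilding the pair.
$pairs = array_map(fn(array $set) => $set[1] . $set[0], $sets);

print_r($pairs); // 09, 91, 14, 41, 10
```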

Gumbo · 14 years ago

09 is matched within 091410, so the search resumes after it, in 1410.

Matthew · 14 years ago

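"Also, is it possible for a regex to give me the expected output, and if so, what is it?"

One way is to loop over preg_match and move the offset yourself: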

$i = 0;
while (preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, $i))
{
  // use $matches[0][0] (the matched text) here
  $i = $matches[0][1] + 1; /* the + 1 is needed in many cases, or the same match repeats forever */
}

Depending on the pattern, the offset you need may not be $matches[0][1] but something like $matches[1][1], and so on.
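Wrapped into a reusable helper, the loop might look like the following sketch. The function name overlapping_matches is hypothetical, and it assumes that restarting one byte after each match's start is the right step, as it is for /[0-9][0-9]/:

```php
<?php
// Hypothetical helper: collect every match of $pattern, allowing overlaps,
// by restarting the search one byte after the start of each match.
function overlapping_matches(string $pattern, string $subject): array
{
    $found = array();
    $i = 0;
    while (preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE, $i)) {
        $found[] = $m[0][0];   // matched text
        $i = $m[0][1] + 1;     // start of this match, plus one
    }
    return $found;
}

print_r(overlapping_matches('/[0-9][0-9]/', 'A_DATED_FILE_091410.txt'));
// 09, 91, 14, 41, 10
```

Since the next offset is always past the current match's start, the loop is guaranteed to terminate.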

For this particular case, I think it's much simpler to do it yourself:

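$s = 'A_DATED_FILE_091410.txt';
$found = array();
$l = strlen($s);
$prev_digit = false;
for ($i = 0; $i < $l; ++$i)
{
  if ($s[$i] >= '0' && $s[$i] <= '9')
  {
    // two digits in a row: record the pair ending at $i
    if ($prev_digit) { $found[] = $s[$i - 1] . $s[$i]; }
    $prev_digit = true;
  }
  else
    $prev_digit = false;
}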
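The same hand-rolled scan generalizes to windows of any width. A sketch in which the function name digit_windows and the $width parameter are hypothetical; it uses strspn so that no extension beyond core PHP is required:

```php
<?php
// Hypothetical helper: every run of $width consecutive digits, overlaps included.
function digit_windows(string $s, int $width = 2): array
{
    $found = array();
    $l = strlen($s);
    for ($i = 0; $i + $width <= $l; ++$i) {
        $window = substr($s, $i, $width);
        // strspn counts the leading characters drawn from the digit set.
        if (strspn($window, '0123456789') === $width) {
            $found[] = $window;
        }
    }
    return $found;
}

print_r(digit_windows('A_DATED_FILE_091410.txt'));    // 09, 91, 14, 41, 10
print_r(digit_windows('A_DATED_FILE_091410.txt', 3)); // 091, 914, 141, 410
```

This re-examines each character up to $width times, so the single-pass version above is preferable when only pairs are needed.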

Julien Roncaglia · 14 years ago

Just for fun, here's another way:

<?php
$fileName = 'A_DATED_FILE_091410.txt';
$matches = array();
preg_match_all('/(?<=([0-9]))[0-9]/',$fileName,$matches);
$result = array();
foreach($matches[1] as $i => $behind)
{
    $result[] = $behind . $matches[0][$i];
}
print_r($result);
?>