
How do I extract the nth match of a Perl regular expression?

  •  1
  • Zaid  · 14 years ago

    Is it possible to extract the nth match from a string of single-quoted items?

    use strict;
    use warnings;
    
    my $string1 = "'I want to' 'extract the word' 'Perl','from this string'";
    my $string2 = "'What about','getting','Perl','from','here','?'";
    
    sub extract_quoted { 
    
        my ($string, $index) = @_;
        my ($wanted) = $string =~ /some_regex_using _$index/;
        return $wanted;
    }
    
    extract_quoted ($string1, 3); # Should return 'Perl', with quotes
    extract_quoted ($string2, 3); # Should return 'Perl', with quotes
    
    7 answers  |  up to 14 years ago
        1
  •  1
  •   Robert P    14 years ago

    You can try:

    sub extract_quoted {
    
            my ($string, $index) = @_;
            while($string =~ /'(.*?)'/g) {
                    $index--;
                    return $1 if(! $index); # return $1 if index became 0. 
            }
            return; # not found - returns undef or () depending on context.
    }
    
        2
  •  2
  •   Community Mr_and_Mrs_D    7 years ago

    See the question "How do I save matched parts of a regex in Perl?" and this answer (the /g switch is the trick).

        3
  •  2
  •   msw    14 years ago

    The match operator with /g, evaluated in list context, returns a list of all the matches. So:

    my @matches = $string =~ /'(.*?)'/g;   # list context: every captured item
    my $wanted  = $matches[$index - 1];    # $index is 1-based
    

    is one way to get what you want.
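
    As a rough, self-contained sketch of the list-context approach applied to the strings from the question: the helper name `nth_quoted` is made up for illustration, and the capture group is widened to include the quotes, since the question asks for 'Perl' with quotes. It assumes the quoted items contain no embedded single quotes.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not from the thread):
# return the nth (1-based) single-quoted item, quotes included.
sub nth_quoted {
    my ($string, $n) = @_;
    return if $n < 1;                       # reject 0 and negative indices
    my @matches = $string =~ /('[^']*')/g;  # list context: all quoted items
    return $matches[$n - 1];                # undef when there are fewer than $n items
}

my $string1 = "'I want to' 'extract the word' 'Perl','from this string'";
my $string2 = "'What about','getting','Perl','from','here','?'";

print nth_quoted($string1, 3), "\n";   # prints 'Perl'
print nth_quoted($string2, 3), "\n";   # prints 'Perl'
```

    The guard on `$n` matters because a negative array index in Perl counts from the end of the list, so without it an index of 0 would silently return the last item.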

        4
  •  2
  •   Hasturkun    14 years ago

    This should work:

    sub extract_quoted { 
    
        my ($string, $index) = @_;
        my $wanted = ($string =~ /'(.*?)'/g)[$index - 1]; # $index is 1-based
        return $wanted;
    }
    
        5
  •  0
  •   Sinan Ünür    14 years ago
    use strict; use warnings;
    
    use Text::ParseWords;
    
    my $string1 = q{'I want to' 'extract the word' 'Perl','from this string'};
    my $string2 = q{'What about', 'getting','Perl','from','here','?'};
    
    print extract_wanted($_, 3), "\n" for ($string1, $string2);
    
    sub extract_wanted {
        my ($string, $index) = @_;
        my $wanted = (parse_line '(?:\s|,)+', 0, $string)[$index - 1];
        return unless defined $wanted;
        return $wanted;
    }
    

    Output:

    Perl
    Perl
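
    The output above has the quotes stripped, whereas the question asks for 'Perl' with quotes. A possible variation, assuming the documented behaviour of the `$keep` argument of `parse_line` (a true value leaves quote characters in the tokens), is sketched below. The name `extract_wanted_keep` is hypothetical.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical variant (not from the answer): nth token, 1-based, quotes kept.
sub extract_wanted_keep {
    my ($string, $index) = @_;
    return if $index < 1;    # reject 0 and negative indices
    # A true second argument asks parse_line to keep the quotes in each token.
    my @tokens = parse_line('(?:\s|,)+', 1, $string);
    return $tokens[$index - 1];
}

my $string1 = q{'I want to' 'extract the word' 'Perl','from this string'};
my $string2 = q{'What about', 'getting','Perl','from','here','?'};

print extract_wanted_keep($_, 3), "\n" for ($string1, $string2);
```

    With these two inputs the third token should come back as 'Perl', quotes included.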
        6
  •  -1
  •   Greg Bacon    14 years ago

    It looks ugly, but

    my $quoted = qr/'[^']+'/;  # ' fix Stackoverflow highlighting
    my %_extract_wanted_cache;
    sub extract_wanted_memo { 
      my($string, $index) = @_;
      $string =~ ($_extract_wanted_cache{$index} ||=
                    qr/^(?:.*?$quoted.*?){@{[$index-1]}}($quoted)/)
        ? $1
        : ();
    }
    

    a benchmark suggests it may be worth it:

    sub extract_wanted { 
      my($string, $index) = @_;
      $string =~ /^(?:.*?$quoted.*?){@{[$index-1]}}($quoted)/
        ? $1
        : ();
    }
    
    sub extract_wanted_gindex {
      my($string, $index) = @_;
      ($string =~ /$quoted/g)[$index-1];
    }
    
    use Benchmark;
    timethese -1 => {
      nocache => sub { extract_wanted        $string2, 3 },
      memoize => sub { extract_wanted_memo   $string2, 3 },
      index   => sub { extract_wanted_gindex $string2, 3 },
    
      nocache_fail => sub { extract_wanted        $string2, 100 },
      memoize_fail => sub { extract_wanted_memo   $string2, 100 },
      index_fail   => sub { extract_wanted_gindex $string2, 100 },
    }
    

    Results:

    Benchmark: running index, index_fail, memoize, memoize_fail, nocache, nocache_fail for at least 1 CPU seconds...
    
         index:   1 w/c secs (1.04 usr + 0.00 sys = 1.04 CPU) @183794.23/s (n=191146)
    index_fail:   1 w/c secs (1.03 usr + 0.00 sys = 1.03 CPU) @185578.64/s (n=191146)
       memoize:   1 w/c secs (1.00 usr + 0.00 sys = 1.00 CPU) @264664.00/s (n=264664)
    memoize_fail: 0 w/c secs (1.03 usr + 0.00 sys = 1.03 CPU) @835106.80/s (n=860160)
       nocache:   0 w/c secs (1.03 usr + 0.00 sys = 1.03 CPU) @196495.15/s (n=202390)
    nocache_fail: 2 w/c secs (1.03 usr + 0.00 sys = 1.03 CPU) @445390.29/s (n=458752)
        7
  •  -1
  •   ghostdog74    14 years ago
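    sub extract_quoted {
            my ($string, $index) = @_;
            my $c = 1;
            # After splitting on ', the quoted items sit at the odd indices.
            my @s = split /'/, $string;
            for (my $i = 1; $i <= $#s; $i += 2) {
                return $s[$i] if $c == $index;
                $c++;
            }
            return; # not found
    }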