
Perl regex syntax

  • Chase  · Tech Community  · 14 years ago

    This is probably a very simple task for anyone familiar with Perl and regex, but I am stumbling a bit.

    The steps I have laid out for this Perl script are:

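    1. Find the appropriate block of the SPSS file (with a regex) for further processing and formatting.
    2. Return the R syntax to the command line, or preferably to a file.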

    The SPSS file contains blocks like this:

    ...A bunch of nonsense I do not care about...
    ...
     Value Labels
    /gender
    1 "M"
    2 "F"
    /purpose
    1 "business"
    2 "vacation"
    3 "tiddlywinks"
    
    execute . 
    ...Resume nonsense...
    
    The desired R output looks like this:
    

    gender <- as.factor(gender
        , levels= c(1,2)
        , labels= c("M","F")
        )
    ...
    

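    Here is the Perl script I have written so far. I have successfully read each line into the appropriate array. I have the general flow needed for the final print function, but I need to figure out how to print only the appropriate @levels and @labels for each entry in @vars.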

    #!/usr/bin/perl
    
    #Need to change to read from argument in command line
    open(VARVAL, "append.txt");
    @lines = <VARVAL>;
    close(VARVAL);
    
    #Read through each line and put into a variable, a value, or a reject
    #I really only want to read in everything between "value labels" and "execute ."
    #That probably requires more regex...
    foreach  (@lines){
        if ($_ =~ /\//){        #Anything with a / is a variable, remove the / and push
            $_ =~ tr/\///d;
            push(@vars, $_)
        } elsif ($_ =~/\d/) {
            push(@vals, $_)    #Anything that has a number in the line is a value
            }
    }
    #Splitting each @vals array into levels or labels arrays
    foreach (@vals){
        @values = split(/\s+/, $_); #Splitting on a space, vulnerable...better to split on first non-digit character?
        foreach (@values) {
            if ($_ =~/\d/){
                push(@levels, $_);
            } else {
                push(@labels, $_)
            }
        }
    }
    
    #Get rid of newline
    #I should probably do this somewhere else?
    chomp(@vars);
    chomp(@levels);
    chomp(@labels);
    
    #Need to tell it when to stop adding in @levels & @labels. While loop? Hash lookup?
    #Need to get rid of final comma
    #Need to redirect output to a file
    foreach (@vars){
        print $_ ." <- as.factor(" . $_ . "\n\t, levels = c(" ;
             foreach (@levels){
                print $_ . ",";
             }
        print ")\n\t, labels = c(";
        foreach(@labels){
                print $_ . ",";
            }
        print ")\n\t)\n";
    }
    

    Finally, here is sample output from the script as it currently runs:

    gender <- as.factor(gender
        , levels = c(1,2,1,2,3,)
        , labels = c("M","F","biz","action","tiddlywinks",)
        )
    

    1 reply  |  as of 14 years ago
  •   mfontani    14 years ago

    #!/usr/bin/env perl
    use strict;
    use warnings;
    
    my @lines = <DATA>;
    
    my $current_label = '';
    my @ordered_labels;
    my %data;
    for my $line (@lines) {
        if ( $line =~ /^\/(.*)$/ ) { # starts with slash
            $current_label = $1;
            push @ordered_labels, $current_label;
            next;
        }
        if ( length $current_label ) {
            if ( $line =~ /^(\d+) "(.*)"$/ ) { # numeric code (one or more digits), then quoted label
                $data{$current_label}{$1} = $2;
                next;
            }
        }
    }
    
    for my $label ( @ordered_labels ) {
        # Sort the codes numerically once so levels and labels stay aligned
        my @codes = sort { $a <=> $b } keys %{ $data{$label} };
        print "$label <- as.factor($label\n";
        print "    , levels= c(";
        print join(',', @codes);
        print ")\n";
        print "    , labels= c(";
        print join(',', map { '"' . $data{$label}{$_} . '"' } @codes);
        print ")\n";
        print "    )\n";
    }
    
    __DATA__
    ...A bunch of nonsense I do not care about...
    ...
     Value Labels
    /gender
    1 "M"
    2 "F"
    /purpose
    1 "business"
    2 "vacation"
    3 "tiddlywinks"
    
    execute . 
    

    Output:

    gender <- as.factor(gender
        , levels= c(1,2)
        , labels= c("M","F")
        )
    purpose <- as.factor(purpose
        , levels= c(1,2,3)
        , labels= c("business","vacation","tiddlywinks")
        )
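
    The script above reads from __DATA__ and scans every line. The question also asks for three things it does not cover: reading the file named on the command line, looking only between "Value Labels" and "execute .", and writing to a file. A minimal sketch of those pieces might look like the following. The helper names parse_value_labels and to_r_syntax, and the file names in the comments, are illustrative assumptions and not part of the original answer. It also assumes that codes are integers and that each label is double-quoted on a single line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines between "Value Labels" and
# "execute ." and return (\@order, \%labels), where
# $labels{variable}{code} = label text.
sub parse_value_labels {
    my @lines    = @_;
    my $in_block = 0;
    my $current;
    my ( @order, %labels );
    for my $line (@lines) {
        if ( $line =~ /^\s*value\s+labels\b/i ) { $in_block = 1; next; }
        if ( $line =~ /^\s*execute\s*\./i ) { $in_block = 0; undef $current; next; }
        next unless $in_block;
        if ( $line =~ m{^\s*/(\S+)} ) {    # "/gender" starts a new variable
            $current = $1;
            push @order, $current unless exists $labels{$current};
            $labels{$current} ||= {};
        }
        elsif ( defined $current && $line =~ /^\s*(\d+)\s+"(.*)"\s*$/ ) {
            $labels{$current}{$1} = $2;    # code => label
        }
    }
    return ( \@order, \%labels );
}

# Hypothetical helper: build the R text in the layout the question asks for.
sub to_r_syntax {
    my ( $order, $labels ) = @_;
    my $out = '';
    for my $var (@$order) {
        my @levels = sort { $a <=> $b } keys %{ $labels->{$var} };
        my @quoted = map { '"' . $labels->{$var}{$_} . '"' } @levels;
        $out .= "$var <- as.factor($var\n";
        $out .= "    , levels= c(" . join( ',', @levels ) . ")\n";
        $out .= "    , labels= c(" . join( ',', @quoted ) . ")\n";
        $out .= "    )\n";
    }
    return $out;
}

# Command-line use (file names are examples only):
#   perl spss2r.pl append.txt factors.R
if (@ARGV) {
    my ( $in_file, $out_file ) = @ARGV;
    open my $in, '<', $in_file or die "Cannot read $in_file: $!";
    my @lines = <$in>;
    close $in;
    my $r_code = to_r_syntax( parse_value_labels(@lines) );
    if ( defined $out_file ) {
        open my $out, '>', $out_file or die "Cannot write $out_file: $!";
        print {$out} $r_code;
        close $out or die "Cannot close $out_file: $!";
    }
    else {
        print $r_code;    # no output file given: print to the command line
    }
}
```

    An explicit $in_block flag is used instead of Perl's range (flip-flop) operator so that no state carries over if the file ends before "execute .". Note that in R it is factor(), not as.factor(), that accepts levels and labels; the sketch keeps the as.factor form only to match the output requested in the question.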