
How do I write Perl that doesn't look like C?

coding-style perl

Leonard · 16 years ago

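My co-workers complain that my Perl looks too much like C, which is natural since I program in C most of the time and in Perl only a little. Here is my latest effort. I'm interested in Perl that is easy to understand. I'm a bit of a Perl critic and have little tolerance for cryptic Perl. But with readability in mind, how could the following code be more Perlish?

Its goal is to do traffic analysis and find which IP addresses fall within the ranges given in the file "ips". My effort: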

#!/usr/bin/perl -w

# Process the files named in the arguments, which will contain lists of IP addresses, and see if 
# any of them are in the ranges spelled out in the local file "ip", which has contents of the
# form start-dotted-quad-ip-address,end-dotted-quad-ip_address,stuff_to_be_ignored
use English;


open(IPS,"ips") or die "Can't open 'ips' $OS_ERROR";

# Increment a dotted-quad ip address
# Ignore the fact that part1 could get erroneously large.
sub increment {
    $ip = shift;

    my ($part_1, $part_2, $part_3, $part_4) = split (/\./, $ip);
    $part_4++;
    if ( $part_4 > 255 ) {
        $part_4 = 0;
        ($part_3++);
        if ( $part_3 > 255 ) {
            $part_3 = 0;
            ($part_2++);
            if ( $part_2 > 255 ) {
                $part_2 = 0;
                ($part_1++);
            }
        }
   }   
    return ("$part_1.$part_2.$part_3.$part_4");
}

# Compare two dotted-quad ip addresses.
sub is_less_than {
    $left = shift;
    $right = shift;

    my ($left_part_1, $left_part_2, $left_part_3, $left_part_4)     = split (/\./, $left);
    my ($right_part_1, $right_part_2, $right_part_3, $right_part_4) = split (/\./, $right);


    if  ($left_part_1 != $right_part_1 ) { 
        return ($left_part_1 < $right_part_1);
    }   
    if  ($left_part_2 != $right_part_2 ) { 
        return ($left_part_2 < $right_part_2);
    }   
    if  ($left_part_3 != $right_part_3 ) { 
        return ($left_part_3 < $right_part_3);
    }
    if  ($left_part_4 != $right_part_4 ) {
        return ($left_part_4 < $right_part_4);
    }
    return (false);  # They're equal
}

my %addresses;
# Parse all the ip addresses and record them in a hash.   
while (<IPS>) {
    my ($ip, $end_ip, $junk) = split /,/;
    while (is_less_than($ip, $end_ip) ) {
        $addresses{$ip}=1;
        $ip = increment($ip);
    }
}

# print IP addresses in any of the found ranges

foreach (@ARGV) {
    open(TRAFFIC, $_) or die "Can't open $_ $OS_ERROR";
    while (<TRAFFIC> ) {
        chomp;
        if (defined $addresses{$_}) {
            print "$_\n";
        }
    }
    close (TRAFFIC);

}

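15 answers | most recent 12 years ago

Mark Johnson 16 years ago

Sometimes the most Perlish thing to do is to turn to CPAN instead of writing any code at all.

Here is an example using Net::CIDR::Lite and Net::IP::Match::Regexp: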

#!/path/to/perl

use strict;
use warnings;

use English;
use IO::File;
use Net::CIDR::Lite;
use Net::IP::Match::Regexp qw(create_iprange_regexp match_ip);


my $cidr = Net::CIDR::Lite->new();

my $ips_fh = IO::File->new();

$ips_fh->open("ips") or die "Can't open 'ips': $OS_ERROR";

while (my $line = <$ips_fh>) {

    chomp $line;

    my ($start, $end) = split /,/, $line;

    my $range = join('-', $start, $end);

    $cidr->add_range($range);

}

$ips_fh->close();

my $regexp = create_iprange_regexp($cidr->list());

foreach my $traffic_fn (@ARGV) {

    my $traffic_fh = IO::File->new();

    $traffic_fh->open($traffic_fn) or die "Can't open '$traffic_fn': $OS_ERROR";

    while (my $ip_address = <$traffic_fh>) {

        chomp $ip_address;

        if (match_ip($ip_address, $regexp)) {
            print $ip_address, "\n";
        }     

    }

    $traffic_fh->close();

}

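Disclaimer: I just banged this out. It has had minimal testing and no benchmarking. Sanity checks, error handling and comments are omitted to keep the line count down. I didn't scrimp on the whitespace, though.

As for your code: there is no need to define your functions before you use them.

Kristof Provost 12 years ago

From years of seeing Perl code written by C programmers, here is some generic advice:

Use hashes. Use lists. USE HASHES! USE LISTS! Use list operations (map, grep, split, join), especially for small loops. Don't use fancy list algorithms; pop, splice, push, shift and unshift are cheaper. Don't use trees; hashes are cheaper. Hashes are cheap: make them, use them and throw them away! Use the iterator form of the for loop, not the 3-arg one. Don't call things $var1, $var2, $var3; use a list instead. Don't call things $var_foo, $var_bar, $var_baz; use a hash instead. Use $foo ||= "default". Don't use $_ if you have to type it.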
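A minimal sketch of a few of these points, using made-up sample data (the addresses and the %opt hash are assumptions for illustration, not part of the question's program):

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my @seen    = qw(10.0.0.1 10.0.0.2 10.0.0.9);
my %allowed = map { $_ => 1 } qw(10.0.0.1 10.0.0.9);   # lookup hash built from a list

# grep replaces a small loop that has an if inside it.
my @hits = grep { $allowed{$_} } @seen;

# The iterator form of for, with a named variable instead of $_.
for my $ip (@hits) {
    print "$ip\n";
}

# ||= supplies a default when the value is false (undef, 0 or "").
my %opt;
$opt{file} ||= 'ips';
print "reading ranges from $opt{file}\n";
```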

Don't use prototypes, IT'S A TRAP!!

Use regexes, not substr() or index(). Love regexes. Use the /x modifier to make them readable.
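As a rough illustration of /x, here is one way a dotted quad could be matched with a commented regex. parse_quad is a hypothetical helper, and it assumes the input has already been chomped:

```perl
use strict;
use warnings;

# Hypothetical helper: return the four octets of a dotted quad,
# or an empty list if the text does not look like one.
sub parse_quad {
    my ($text) = @_;
    my @parts = $text =~ m{
        \A
        (\d{1,3}) \.   # first octet
        (\d{1,3}) \.   # second octet
        (\d{1,3}) \.   # third octet
        (\d{1,3})      # fourth octet
        \z
    }x;
    return if grep { $_ > 255 } @parts;   # reject octets that are out of range
    return @parts;
}

my @octets = parse_quad('192.168.0.1');
print "octets: @octets\n";
```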

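Write statement if $foo when you want a block-less conditional. There is almost always a better way to write a nested condition: try recursion, try a loop, try a hash.

Declare variables when you need them, not at the top of the subroutine. use strict. use warnings, and fix them all. use diagnostics. Write tests. Write POD.

Use CPAN. Use CPAN! USE CPAN! Someone has probably already done it, and done it better.

Run perlcritic. Run it with --brutal just for kicks. Run perltidy. Think about why you do everything. Change your style.

Use the time you no longer spend fighting the language and debugging memory allocation to improve your code.

Ask questions. Take style commentary on your code graciously. Go to a Perl Mongers meeting. Go to perlmonks.org. Go to YAPC or a Perl Workshop. Your Perl knowledge will grow by leaps and bounds.

ashgromnies 16 years ago

Most of writing "Perlish" code is taking advantage of the built-in functions in Perl.

For instance, this: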

my ($part_1, $part_2, $part_3, $part_4) = split (/\./, $ip);
$part_4++;
if ( $part_4 > 255 ) {
    $part_4 = 0;
    ($part_3++);
    if ( $part_3 > 255 ) {
        $part_3 = 0;
        ($part_2++);
        if ( $part_2 > 255 ) {
            $part_2 = 0;
            ($part_1++);
        }
    }
}

I would rewrite as something like:

my @parts = split (/\./, $ip);

foreach my $part(reverse @parts){
  $part++;
  last unless ($part > 255 && !($part = 0));
}

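This does the same thing as the code posted above, but is a little cleaner.

Are you sure the code does what you need, though? To me it looks a little strange that you only move on to the previous part of the IP when the part after it is > 255.

Chris Lutz 16 years ago

Another example rewrite, from this: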
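For reference, the loop above can be wrapped into a complete sub. This is only a sketch: increment_ip is a made-up name, and it walks the octets by index so that the carry logic does not depend on how foreach aliases the elements of reverse @parts. Unlike the question's version, it lets the first octet wrap to 0 instead of growing past 255.

```perl
use strict;
use warnings;

# Increment a dotted-quad address, carrying into the previous octet on overflow.
# increment_ip is a hypothetical name, not from the original program.
sub increment_ip {
    my ($ip) = @_;
    my @parts = split /\./, $ip;

    # Walk the octets from last to first.
    for my $i (reverse 0 .. $#parts) {
        last if ++$parts[$i] <= 255;   # no overflow: done
        $parts[$i] = 0;                # overflow: reset and carry
    }
    return join '.', @parts;
}

print increment_ip('192.168.0.255'), "\n";   # prints 192.168.1.0
```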

sub is_less_than {
    my $left = shift; # I'm sure you just "forgot" to put the my() here...
    my $right = shift;

    my ($left_part_1, $left_part_2, $left_part_3, $left_part_4)     = split (/\./, $left);
    my ($right_part_1, $right_part_2, $right_part_3, $right_part_4) = split (/\./, $right);


    if  ($left_part_1 != $right_part_1 ) { 
        return ($left_part_1 < $right_part_1);
    }   
    if  ($left_part_2 != $right_part_2 ) { 
        return ($left_part_2 < $right_part_2);
    }   
    if  ($left_part_3 != $right_part_3 ) { 
        return ($left_part_3 < $right_part_3);
    }
    if  ($left_part_4 != $right_part_4 ) {
        return ($left_part_4 < $right_part_4);
    }
    return (false);  # They're equal
}

to this:

sub is_less_than {
    my @left = split(/\./, shift);
    my @right = split(/\./, shift);

    # one way to do it...
    for(0 .. 3) {
        if($left[$_] != $right[$_]) {
            return $left[$_] < $right[$_];
        }
    }

    # another way to do it - let's avoid so much indentation...
    for(0 .. 3) {
        return $left[$_] < $right[$_] if $left[$_] != $right[$_];
    }

    # yet another way to do it - classic Perl unreadable one-liner...
    $left[$_] == $right[$_] or return $left[$_] < $right[$_] for 0 .. 3;

    # just a note - that last one uses the short-circuit logic to condense
    # the if() statement to one line, so the for() can be added on the end.
    # Perl doesn't allow things like do_this() if(cond) for(0 .. 3); You
    # can only postfix one conditional. This is a workaround. Always use
    # 'and' or 'or' in these spots, because they have the lowest precedence.

    return 0 == 1; # false is not a keyword, or a boolean value.
    # though honestly, it wouldn't hurt to just return 0 or "" or undef()
}

Also, here:

my ($ip, $end_ip, $junk) = split /,/;

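$junk probably needs to be @junk to capture all the junk, or you can leave it off entirely: if you assign a list of unknown size to a two-element list, Perl silently discards all the extra values. So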

my($ip, $end_ip) = split /,/;
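A quick sketch of both forms, using a made-up input line:

```perl
use strict;
use warnings;

# Hypothetical input line in the "start,end,stuff_to_be_ignored" format.
my $line = "10.0.0.1,10.0.0.9,some,extra,fields\n";
chomp $line;

# Two targets: everything after the second field is dropped.
my ($start, $end) = split /,/, $line;

# A trailing array soaks up whatever is left.
my ($first, @rest) = split /,/, $line;

print "range: $start to $end\n";
print "leftover fields: ", scalar(@rest), "\n";
```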

And here:

foreach (@ARGV) {
    open(TRAFFIC, $_) or die "Can't open $_ $OS_ERROR";
    while (<TRAFFIC> ) {
        chomp;
        if (defined $addresses{$_}) {
            print "$_\n";
        }
    }
    close (TRAFFIC);
}

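Instead of TRAFFIC, use a variable to store the filehandle. Also, in general, you should use exists() to check whether a hash element exists, rather than defined(). The element might exist but be set to undef (this shouldn't happen in your program, but it is a good habit for when your programs get more complicated):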

foreach (@ARGV) {
    open(my $traffic, $_) or die "Can't open $_ $OS_ERROR";
    while (<$traffic> ) {
        chomp;
        print "$_\n" if exists $addresses{$_};
    }
    # $traffic goes out of scope, and implicitly closes
}

Of course, you could also use Perl's <> operator, which opens each element of @ARGV for reading and acts as a filehandle that iterates through them:

while(<>) {
    chomp;
    print "$_\n" if exists $addresses{$_};
}

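As has been noted before, try to avoid use-ing English unless you use English qw( -no_match_vars ); to avoid the significant performance penalty of those evil match_vars. And as hasn't been noted yet, but should be...

ALWAYS ALWAYS ALWAYS always use strict; and use warnings; or else Larry Wall will descend from heaven and break your code. I see you have -w. That is enough, because even off of Unix, Perl parses the shebang line, finds your -w, and turns warnings on like it should. However, you do need to use strict;. It will catch a lot of serious errors in your code, like not declaring variables with my or using false as a language keyword.

Making your code work under strict as well as warnings will result in MUCH cleaner code that doesn't break for reasons you can't figure out. Otherwise you will spend hours in the debugger and will probably end up turning on strict and warnings anyway, just to find out what the errors are. Drop them only if (and only if) your code is finished, you are releasing it, and it never generates any errors.

Brad Gilbert 16 years ago

Sure, this is one way to do it in Perl.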
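A small sketch of what strict buys you with the bareword false. It compiles one-line snippets with a string eval so the failure can be shown without aborting the script; the snippets and variable names are illustrative, not taken from the original program.

```perl
use strict;
use warnings;

# Under strict, the bareword is a compile-time error. The string eval
# inherits "use strict" from this file and traps the error in $@.
my $ok  = eval q{ my $result = false; 1 };
my $err = $@;
print "strict: $err" unless $ok;

# Without strict, the bareword quietly becomes the string "false",
# which is a true value in Perl.
my $loose = eval q{ no strict 'subs'; no warnings; my $result = false; $result };
print "no strict: got '$loose', which is ", ($loose ? 'true' : 'false'), "\n";
```

So in the question's is_less_than, return (false) actually returns a true value when the two addresses are equal.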

use strict;
use warnings;

my $new_ip;
{
  my @parts = split ('\.', $ip);

  foreach my $part(reverse @parts){
    $part++;

    if( $part > 255 ){
      $part = 0;
      next;
    }else{
      last;
    }
  }
  $new_ip = join '.', @parts;   # foreach aliases the elements, so @parts is already updated in place
}

And this is how I would actually implement it.

use NetAddr::IP;

my $new_ip = ''.(NetAddr::IP->new($ip,0) + 1) or die;

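Barry Brown 16 years ago

I can't say that this solution will make your program more Perlish, but it might simplify your algorithm.

Rather than treating an IP address as a dotted-quad, base-256 number, which needs the nested-if structure to implement the increment function, consider an IP address to be a 32-bit integer. Convert an IP of the form a.b.c.d into an integer with this (not tested):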

sub ip2int {
    my $ip = shift;
    if ($ip =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/) {
        return ($1 << 24) + ($2 << 16) + ($3 << 8) + $4;
    } else {
        return undef;
    }
}

Now it is easy to determine whether an IP lies between two endpoint IPs. Just do simple integer arithmetic and comparisons.

my $begin = "192.168.5.0";
my $end = "192.168.10.255";
my $target = "192.168.6.2";
if (ip2int($target) >= ip2int($begin) && ip2int($target) <= ip2int($end)) {
    print "$target is between $begin and $end\n";
} else {
    print "$target is not in range\n";
}
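Building on this idea, a range check no longer needs a hash with one entry per address. The sketch below is an illustration rather than part of the answer: int2ip, in_any_range and the sample ranges are hypothetical, and ip2int is written with pack/unpack as one possible alternative to the shifts.

```perl
use strict;
use warnings;

# Dotted quad to 32-bit integer; undef if there are not exactly four parts.
sub ip2int {
    my ($ip) = @_;
    my @octets = split /\./, $ip;
    return undef unless @octets == 4;   # explicit undef: callers always get one value
    return unpack 'N', pack 'C4', @octets;
}

# 32-bit integer back to dotted quad.
sub int2ip {
    my ($n) = @_;
    return join '.', unpack 'C4', pack 'N', $n;
}

# Hypothetical ranges, stored as [start, end] integer pairs.
my @ranges = (
    [ ip2int('192.168.5.0'), ip2int('192.168.10.255') ],
    [ ip2int('10.0.0.0'),    ip2int('10.0.0.15') ],
);

# Number of ranges that contain the address (0 means none).
sub in_any_range {
    my ($ip) = @_;
    my $n = ip2int($ip);
    return 0 unless defined $n;
    return scalar grep { $_->[0] <= $n && $n <= $_->[1] } @ranges;
}

for my $target ('192.168.6.2', '192.168.11.0') {
    printf "%s %s\n", $target, in_any_range($target) ? 'is in range' : 'is not in range';
}
```

Memory use then grows with the number of ranges, not with the number of addresses they cover.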

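dannysauer 16 years ago

Tell your co-workers that their Perl looks too much like line noise. Please don't obfuscate your code just for the sake of obfuscation. It is asinine development goals like that which give Perl such a bad reputation for being unreadable, when it is really bad programmers (apparently, like your co-workers) who write sloppy code. Nicely structured, indented and logical code is a good thing. C is a good thing.

Seriously, though, the best place to figure out how to write Perl is the O'Reilly book "Perl Best Practices" by Damian Conway. It tells you how he thinks you should do things, and he always gives good reasons for his position, as well as occasionally giving good reasons to disagree. I do disagree with him on some points, but his reasoning is sound. The odds that you work with anyone who knows Perl better than Mr. Conway are pretty slim, and having a printed book (or at least a Safari subscription) gives you more solid backing for your arguments. Pick up a copy of the Perl Cookbook while you're at it, as looking at code examples for solving common problems should get you on the right track. I hate to say "buy the book", but those are exceptionally good books that any Perl developer should read.

As for your specific code, you're using foreach and $_; it looks plenty Perlish to my eyes, which have been developing with Perl for quite a while. One note, though: I hate the English module. If you must use it, do it like use English qw( -no_match_vars );. The match_vars option slows down regexp parsing measurably, and the $PREMATCH / $POSTMATCH variables it provides aren't usually useful.

user80168 16 years ago

Just one piece of advice: use strict. The rest is hardly relevant.

Gurunandan Bhat 16 years ago

I know exactly how you feel. My first language was FORTRAN, and like a good FORTRAN programmer, I have written FORTRAN in every language since.

I have a really wonderful book, Effective Perl Programming, that I re-read every now and then, especially a chapter called "Idiomatic Perl". Here are a few things I use to keep my Perl looking like Perl: list operators like map and grep, slices and hash slices, and the quote operators.

Another thing that keeps my Perl from looking like FORTRAN/C is regularly reading module sources, especially those written by the masters.

Fortyrunner 16 years ago

You could use Acme::Bleach or Acme::Morse

Brad Gilbert 16 years ago

While this would work: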
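A brief sketch of those idioms on made-up data (the address and the %range keys are assumptions for illustration):

```perl
use strict;
use warnings;

my @octets = split /\./, '192.168.6.2';

# Array slice: pick several elements at once.
my ($net_a, $net_b) = @octets[0, 1];

# Hash slice plus the qw quote operator: set several keys in one statement.
my %range;
@range{qw(start end)} = qw(192.168.5.0 192.168.10.255);

# map and grep in place of small loops.
my @doubled = map  { $_ * 2 }   @octets;
my @large   = grep { $_ > 100 } @octets;

print "network prefix: $net_a.$net_b\n";
print "range: $range{start} to $range{end}\n";
print "large octets: @large\n";
```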

use strict;
use warnings;
use 5.010;

use NetAddr::IP;

my %addresses;
# Parse all the ip addresses and record them in a hash.
{
  open( my $ips_file, '<', 'ips') or die;

  local $_; # or my $_ on Perl 5.10 or later
  while( my $line = <$ips_file> ){
    my ($ip, $end_ip) = split ',', $line;
    next unless $ip and $end_ip;

    $ip     = NetAddr::IP->new( $ip, 0 ) or die;
    $end_ip = NetAddr::IP->new( $end_ip ) or die;
    while( $ip <= $end_ip ){
      $addresses{$ip->addr} = 1;
      $ip++;
    }
  }
  close $ips_file
}

# print IP addresses in any of the found ranges
use English;

for my $arg (@ARGV) {
  open(my $traffic, '<',$arg) or die "Can't open $arg $OS_ERROR";
  while( my $ip = <$traffic> ){
    chomp $ip;
    if( $addresses{$ip} ){
      say $ip
    }
  }
  close ($traffic);
}

If possible, I would use netmasks instead, because it becomes even simpler:

use Modern::Perl;
use NetAddr::IP;

my @addresses;
{
  open( my $file, '<', 'ips') or die;

  while( my $ip = <$file> ){
    chomp $ip;
    $ip =~ s(,.*){};   # keep only the first comma-separated field
    next unless $ip;
    $ip = NetAddr::IP->new( $ip ) or die;
    push @addresses, $ip
  }

  close $file
}


for my $filename (@ARGV) {
  open( my $traffic, '<', $filename )
    or die "Can't open $filename";

  while( my $ip = <$traffic> ) {
    chomp $ip;
    next unless $ip;

    $ip = NetAddr::IP->new($ip) or next; # skip line on error
    my @match;
    for my $cmp ( @addresses ){
      if( $ip->within($cmp) ){
        push @match, $cmp;
        #last;
      }
    }

    say "$ip => @match" if @match;

    say "# no match for $ip" unless @match;
  }
  close ($traffic);
}

Test ips file:

192.168.0.1/24
192.168.0.0
0:0:0:0:0:0:C0A8:0/128

Test traffic file:

192.168.1.0
192.168.0.0
192.168.0.5

Output:

# no match for 192.168.1.0/32
192.168.0.0/32 => 192.168.0.1/24 192.168.0.0/32 0:0:0:0:0:0:C0A8:0/128
192.168.0.5/32 => 192.168.0.1/24

Geo 16 years ago

Instead of doing this:


if  ($left_part_1 != $right_part_1 ) { 
    return ($left_part_1 < $right_part_1);
}

you could do this:


return $left_part_1 < $right_part_1 if($left_part_1 != $right_part_1);

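Also, you could use the Fatal module, to avoid checking things for errors by hand.

Tim Cooper 13 years ago

The only criterion I have for "how my code looks" is how easy it is to read and understand the purpose of the code (especially for programmers unfamiliar with Perl), not whether it follows a particular style.

If a Perl language feature makes some logic easier to understand, then I use it. If not, I don't, even if it could do the job in less code.

Your co-workers may think my code is extremely "un-Perlish", but I'll bet they understand exactly what the code is doing and could modify it to fix or extend it without any trouble:

My version: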
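A minimal sketch of that idea using autodie, the lexically scoped successor to Fatal that ships with Perl since 5.10.1 (swapped in here for Fatal itself). The file name is made up and assumed not to exist, so the example shows the failure path:

```perl
use strict;
use warnings;
use autodie;   # open, close and friends now die on failure

# Hypothetical path, assumed not to exist.
my $missing = '/nonexistent/dir/no-such-file.txt';

# The eval is only here so the script can report the error and carry on.
my $ok = eval {
    open(my $fh, '<', $missing);   # no "or die" needed
    close($fh);
    1;
};
print "open failed: $@" unless $ok;
```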

#******************************************************************************
# Load the allowable ranges into a hash
#******************************************************************************
my %ipRanges = loadIPAddressFile("../conf/ip.cfg");

#*****************************************************************************
# Get the IP to check on the command line
#*****************************************************************************
my ( $in_ip_address ) = @ARGV;

# Convert it to number for comparison
my $ipToCheckNum = 1 * sprintf("%03d%03d%03d%03d", split(/\./, $in_ip_address));

#*****************************************************************************
# Loop through the ranges and see if the number is in any of them
#*****************************************************************************
my $startIp;
my $endIp;
my $msg = "IP [$in_ip_address] is not in range.\n";

foreach $startIp (keys(%ipRanges))
   {
   $endIp = $ipRanges{$startIp};

   if ( $startIp <= $ipToCheckNum and $endIp >= $ipToCheckNum ) 
      {
      $msg = "IP [$in_ip_address] is in range [$startIp] to [$endIp]\n";
      }
   }

print $msg;

#******************************************************************************
# Function: loadIPAddressFile()
#   Author: Ron Savage
#     Date: 04/10/2009
# 
# Description:
# loads the allowable IP address ranges into a hash from the specified file.
# Hash key is the starting value of the range, value is the end of the range.
#******************************************************************************
sub loadIPAddressFile
   {
   my $ipFileHandle;
   my $startIP;
   my $endIP;
   my $startIPnum;
   my $endIPnum;
   my %rangeList;

   #***************************************************************************
   # Get the arguments sent
   #***************************************************************************
   my ( $ipFile ) = @_;

   if ( open($ipFileHandle, "< $ipFile") )
      {
      while (<$ipFileHandle>)
         {
         ( $startIP, $endIP ) = split(/\,/, $_ );

         # Convert them to numbers for comparison
         $startIPnum = 1 * sprintf("%03d%03d%03d%03d", split(/\./, $startIP));
         $endIPnum   = 1 * sprintf("%03d%03d%03d%03d", split(/\./, $endIP));

         $rangeList{$startIPnum} = $endIPnum;
         }

      close($ipFileHandle);
      }
   else
      {
      print "Couldn't open [$ipFile].\n";
      }

   return(%rangeList);
   }

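(Note: the extra "#" lines are in there to preserve my odd spacing, which normally gets mangled when posting code here.)

bubaker 16 years ago

Am I missing something... will any of the array versions above work? The modifications are performed on a variable local to the for loop. I think Brad Gilbert's Net::IP solution would be my choice. Chris Lutz pretty much cleaned up the rest the way I would have.

As an aside, some of the comments about readability strike me as curious. Are there fewer [vigorous] complaints about the readability of Erlang/Lisp syntax because there is only one way to write code in them?

runrig 16 years ago

This may be more C-like, but it is also simpler: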

use Socket qw(inet_aton inet_ntoa);

my $ip = ("192.156.255.255");

my $ip_1 = inet_ntoa(pack("N", unpack("N", inet_aton($ip))+1));
print "$ip $ip_1\n";

Update: I posted this before reading all of the code in the question. The code here only increments the IP address.
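The same Socket function could also cover the comparison. inet_aton returns the address as four bytes in network order, so ordinary string comparison of the packed values sorts addresses numerically. A sketch, with ip_in_range as a hypothetical name and no handling of malformed input (inet_aton returns undef for that):

```perl
use strict;
use warnings;
use Socket qw(inet_aton);

# True if $ip lies between $start and $end, inclusive.
# ip_in_range is a hypothetical helper; all three arguments are dotted quads.
sub ip_in_range {
    my ($ip, $start, $end) = map { inet_aton($_) } @_;
    return $start le $ip && $ip le $end;
}

for my $target ('192.168.6.2', '192.168.11.0') {
    my $verdict = ip_in_range($target, '192.168.5.0', '192.168.10.255') ? 'in' : 'out of';
    print "$target is $verdict range\n";
}
```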