
Parsing a CSV file using an array of hashes in Perl

hash file-io csv perl

Anupama G · 9 years ago

My CSV data is in the following format:

Sl.No, Label, Type1, Type2...
1, "label1", Y, N, N...
2, "label2", N, Y, Y...
...

where "Y" and "N" indicate whether the corresponding label should be printed to the file.

while ( <$fh> ) {    #Reading the CSV file

    $filter = $_;
    chomp $filter;
    $filter =~ tr/\r//d;

    if ( $. == 1 ) {
        @fieldNames = split ",", $filter;
    }
    else {
        @fields = split ",", $filter;
        $numCustomers = scalar(@fields) - 2;
        push @labels, $fields[2];

        for ( $i = 0; $i < $numCustomers; $i++ ) {

            for ( $j = 0; $j < scalar(@labels); $j++ ) {
                $customer[$i][$j] = $fields[ 2 + $i ];
            }

            $custFile = "customer" . $i . "_external.h";

            open( $fh1, ">", $custFile ) or die "Unable to create external header file for customer $i";
        }
    }
}

for ( $i = 0; $i < scalar(@labels); $i++ ) {

    for ( $j = 0; $j < $numCustomers; $j++ ) {

        $Hash{ $fieldNames[ 2 + $i ] }->{ $labels[$i] } = $customer[$j][$i];
        push @aoh, %Hash;    #Array of hashes
    }
}

my @headerLines = read_file($intFile);  # read the internal file, and copy only
                                        # those lines that are not marked with
                                        # "N" in the CSV file to the external file.

# iterate over elements of each hash and print the labels only if value is 'Y'

foreach my $headerLine (@headerLines) {

    chomp $headerLine;

    for $i ( 0 .. $#aoh ) {

        for my $cust1 ( sort keys %{ $aoh[$i] } ) {    #HERE

            for my $reqLabel1 ( keys %{ $aoh[$i]{$cust1} } ) {

                print "$cust1, $reqLabel1 : $aoh[$i]{$cust1}{$reqLabel1}\n";

                if ( $aoh[$i]{$cust1}{$reqLabel1} eq "Y" ) {

                    for ( $j = 0; $j < $numCustomers; $j++ ) {
                        $req[$j][$i] = $reqLabel1;
                    }
                }
                else {
                    for ( $j = 0; $j < $numCustomers; $j++ ) {
                        $nreq[$j][$i] = $reqLabel1;
                    }
                }
            }

        }

        if ( grep { $headerLine =~ /$_/ } @nreq ) {
            next;    #Don't print this line in the external file
        }
        else {
            print $fh1 $headerLine . "\n";    #print this line in the external file
        }
    }
}

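This fails with "Can't use string ("Type1") as a HASH ref", pointing at the line marked #HERE.

I have tried dumping the data structures at various points, but I am not sure where this comes from.

Any insights would be appreciated.

I have been advised that using Text::CSV would be a better solution. How would it reduce the need for nested data structures?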

2 answers

Sobrique · 9 years ago

Well, your problem gets a lot easier with Text::CSV. I'd suggest looking at a rewrite, or rethinking your approach.
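For illustration only, here is a minimal sketch of what such a rewrite might look like. It assumes Text::CSV is installed from CPAN, reads hypothetical sample data from an in-memory filehandle, and collects the wanted labels per type in a hash named %wanted (a name invented here, not taken from the question). Because each row comes back as a hash keyed by column name, the only structure left to maintain is that one hash of lists.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical sample data in the question's layout.
my $data = <<'CSV';
Sl.No, Label, Type1, Type2
1, "label1", Y, N
2, "label2", N, Y
CSV

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# allow_whitespace strips the blanks around each comma-separated field.
my $csv = Text::CSV->new( { binary => 1, allow_whitespace => 1 } )
    or die Text::CSV->error_diag;

# The first line supplies the column names; columns 2 onwards are the types.
my @names = @{ $csv->getline($fh) };
my @types = @names[ 2 .. $#names ];
$csv->column_names(@names);

my %wanted;    # type name => list of labels flagged 'Y'
while ( my $row = $csv->getline_hr($fh) ) {
    for my $type (@types) {
        push @{ $wanted{$type} }, $row->{Label} if $row->{$type} eq 'Y';
    }
}

print "$_: @{ $wanted{$_} }\n" for sort keys %wanted;
```

Filtering the header file then reduces to a lookup in %wanted for the type being written, rather than a walk over parallel arrays.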

But your actual problem is this line:

push @aoh, %Hash;                #Array of hashes

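This does not create an array of hashes at all. It takes every element out of %Hash (in no particular order, except that each key stays next to its value) and inserts them into @aoh as a flat list.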
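The flattening can be seen in a small sketch. The single-entry %Hash below is hypothetical data shaped like the question's structure (type name mapping to a label/flag hash). The eval reproduces the failure that strict refs reports at the #HERE line.

```perl
use strict;
use warnings;

# Hypothetical data shaped like the question's %Hash: type => { label => flag }
my %Hash = ( Type1 => { label1 => 'Y' } );

my @aoh;
push @aoh, %Hash;    # list context: the hash flattens to (key, value)

# @aoh now holds two elements: the string 'Type1' and the inner hash reference.
printf "elements: %d\n", scalar @aoh;
printf "first element is a %s\n",
    ref( $aoh[0] ) ? 'reference' : 'plain string';

# Dereferencing that plain string as a hash is what strict refs rejects.
my $ok = eval { my @keys = keys %{ $aoh[0] }; 1 };
print "error: $@" unless $ok;
```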

You probably want:

push @aoh, \%Hash;

or:

push @aoh, { %Hash };

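I'm not entirely sure which, because you are reusing %Hash, so you may end up with duplicates. That is best handled by adding use strict; use warnings; and lexically scoping your hashes properly.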
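The difference between the two forms, and why scoping matters, can be sketched as follows. The array names @by_ref, @by_copy and @scoped and the hash %row are made up for this example. Pushing \%Hash stores a reference to the one reused hash, pushing { %Hash } stores a snapshot, and a hash declared with my inside the loop is a fresh hash on every iteration.

```perl
use strict;
use warnings;

my ( @by_ref, @by_copy, @scoped );
my %Hash;    # one hash reused across iterations, as in the question

for my $label (qw(label1 label2)) {
    %Hash = ( $label => 'Y' );
    push @by_ref,  \%Hash;       # every element points at the same hash
    push @by_copy, { %Hash };    # every element is an independent snapshot

    my %row = ( $label => 'Y' ); # lexical: a new hash each time round
    push @scoped, \%row;
}

print join( ',', map { keys %$_ } @by_ref ),  "\n";    # label2,label2
print join( ',', map { keys %$_ } @by_copy ), "\n";    # label1,label2
print join( ',', map { keys %$_ } @scoped ),  "\n";    # label1,label2
```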

choroba · 9 years ago

I would just keep an array of open filehandles (as long as there aren't too many types) and print to them while reading the file line by line.

#!/usr/bin/perl
use warnings;
use strict;

chomp( my $header = <> );
my @names = split /, /, $header;

my @handles;
for my $type (@names[ 2 .. $#names ]) {
    open my $fh, '>', $type or die "$type: $!";
    push @handles, $fh;
}

while (<>) {
    chomp;
    my @fields = split /, /;
    for my $index (0 .. $#handles) {
        print { $handles[$index] } $fields[1], "\n" if 'Y' eq $fields[ $index + 2 ];
    }
}

I tested with the following input:

Sl.No, Label, Type1, Type2, Type3, Type4
1, "label1", Y, N, Y, N
2, "label2", N, Y, Y, N

If your input contains \r line endings, set binmode to :crlf.
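A minimal sketch of that, assuming the input is opened explicitly rather than read through <>; the file name crlf_demo.csv is hypothetical and the script creates it itself with Windows-style line endings. For a handle that is already open, binmode( $fh, ':crlf' ) applies the same layer.

```perl
use strict;
use warnings;

# Hypothetical demo file written with Windows-style line endings.
my $file = 'crlf_demo.csv';
open my $out, '>:raw', $file or die "$file: $!";
print {$out} "Sl.No, Label, Type1\r\n1, \"label1\", Y\r\n";
close $out;

# The :crlf layer translates "\r\n" to "\n" on input, so chomp is enough.
open my $in, '<:crlf', $file or die "$file: $!";
chomp( my @lines = <$in> );
close $in;
unlink $file;

print "no stray carriage returns\n" unless grep { /\r/ } @lines;
```

With the layer in place, the tr/\r//d step from the question is no longer needed.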