代码之家  ›  专栏  ›  技术社区  ›  ZyX

如何使用perl将非latin1数据发布到非UTF8站点?

  •  2
  • ZyX  · 技术社区  · 14 年前

    我想使用LWP::UserAgent在CP1251站点上发布俄语文本,并获得以下结果:

    # $text="Русский текст"; obtained from command line
    FIELD_NAME => $text                                # result: Г?в г'В?г'В?г'В?г?вєг?вёг?в? Г'В'Г?вчг?вєг'В?г'В'
    $text=Encode::decode_utf8($text);
    FIELD_NAME => $text                                # result: Р с?с?с?рєрёр? С'Рчрєс?с'
    FIELD_NAME => Encode::encode("cp1251", $text)     # result: Г?гіг+г+гЄгёгЏ ГІгҐгЄг+гІ
    FIELD_NAME => URI::Escape::uri_escape_utf8($text) # result: D0%a0%d1%83%d1%81%d1%81%d0%ba%d0%b8%d0%b9%20%d1%82%d0%b5%d0%ba%d1%81%d1%82
    

    我该怎么做?内容类型必须为 x-www-form-urlencoded . 你可以找到类似的形式 here ,但您可以使用&#…;对任何非拉丁字符进行转义。。。;试图从中逃脱 FIELD_NAME 结果 10561091108910891 10901077108210891 (每 & # ; 从绳子上剥下来)或 1056;усский текст 字段名称

    更新 :有人知道如何转换以下代码,以便使用LWP::UserAgent::post函数吗?

    my $url=shift;
    my $fields=shift;
    my $request=HTTP::Request->new(POST => absURL($url));
    $request->content_type('application/x-www-form-urlencoded');
    $request->content_encoding("UTF-8");
    $ua->prepare_request($request);
    my $content="";
    for my $k (keys %$fields) {
        $content.="&" if($content ne "");
        my $c=$fields->{$k};
        eval {$c=Encode::decode_utf8($c)};
        $c=Encode::encode("cp1251", $c, Encode::FB_HTMLCREF);
        $content.="$k=".URI::Escape::uri_escape($c);
    }
    $request->content($content);
    my $response=$ua->simple_request($request);
    

    这段代码实际上解决了这个问题,但我不想添加第三个请求包装函数(与 get post

    3 回复  |  直到 14 年前
        1
  •  0
  •   DVK    14 年前

    一种解决方法似乎是(我认为远不是最好的)使用recode系统命令,如果你有它的aviable。从 http://const.deribin.com/files/SignChanger.pl.txt

    my $boardEncoding="cp1251"; # encoding used by the board
    $vals{'Post'} = `fortune $forunePath | recode utf8..$boardEncoding`;
    $res = $ua->post($formURL,\%vals);
    

    另一种方法似乎是 http://mail2lj.nichego.net/lj.txt

    my        $formdata = $1 ;
    my        $hr = ljcomment_string2form($formdata) ;
    my        $req = new HTTP::Request('POST' => $ljcomment_action)
            or die "new HTTP::Request(): $!\n" ;
    
    $hr->{usertype} = 'user' ;
    $hr->{encoding} = $mh->mime_attr('content-type.charset') ||
                      "cp1251" ;
    $hr->{subject}  = decode_mimewords($mh->get('Subject'));
    $hr->{body} = $me->bodyhandle->as_string() ;
    
    $req->content_type('application/x-www-form-urlencoded');
    $req->content(href2string($hr)) ;
    
    my      $ljres = submit_request($req, "comment") ;
    
    if ($ljres->{'success'} eq "OK") {
        print STDERR "journal updated successfully\n" ;
    } else {
        print STDERR "error updating journal: $ljres->{errmsg}\n" ;
        send_bounce($ljres->{errmsg}, $me, $mh->mime_attr("content-type.charset")) ;
    }
    
        2
  •  0
  •   daxim Fayland Lam    14 年前

    使用 WWW::Mechanize ,它会自动处理编码(字符编码和表单编码),如果 form 元素的 accept-charset UTF-8 因此需要纠正。你好像遇到这种情况了。顺便说一下,示例站点的编码是KOI8-R,而不是Windows-1251。工作示例:

    use utf8;
    use WWW::Mechanize qw();
    my $message = 'Русский текст';
    my $mech = WWW::Mechanize->new(
        cookie_jar => {},
        agent => 'Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/533.9 SUSE/6.0.401.0-2.1 (KHTML, like Gecko)',
    );
    $mech->get('http://zhurnal.lib.ru/cgi-bin/comment?COMMENT=/z/zyx/index_4-1');
    $mech->current_form->accept_charset(scalar $mech->response->content_type_charset);
    $mech->submit_form(with_fields => { TEXT => $message });
    

    HTTP转储(仅限重要部分):

    POST /cgi-bin/comment HTTP/1.1
    Content-Length: 115
    Content-Type: application/x-www-form-urlencoded
    
    FILE=%2Fz%2Fzyx%2Findex_4-1&MSGID=&OPERATION=store_new&NAME=&EMAIL=&URL=&TEXT=%F2%D5%D3%D3%CB%C9%CA+%D4%C5%CB%D3%D
    
        3
  •  -1
  •   ZyX    14 年前

    这些函数解决了这个问题(第一个用于发布应用程序/x-www-form-urlencoded数据,第二个用于多部分/表单数据):

    #{{{2 postue
    sub postue($$;$) {
        my $url=shift;
        my $fields=shift;
        my $referer=shift;
        if(defined $referer and $referer eq "" and defined $fields->{"DIR"}) {
            $referer=absURL($url."?DIR=".$fields->{"DIR"}); }
        else {
            $referer=absURL($referer); }
        my $request=HTTP::Request->new(POST => absURL($url));
        $request->content_type('application/x-www-form-urlencoded');
        $request->content_encoding("UTF-8");
        $ua->prepare_request($request);
        my $content="";
        for my $k (keys %$fields) {
            $content.="&" if($content ne "");
            my $c=$fields->{$k};
            if(not ref $c) {
                $c=Encode::decode_utf8($c) unless Encode::is_utf8($c);
                $c=Encode::encode("cp1251", $c, Encode::FB_HTMLCREF);
                $c=URI::Escape::uri_escape($c);
            }
            elsif(ref $c eq "URI::URL") {
                $c=$c->canonical();
                $c=URI::Escape::uri_escape($c);
            }
            $content.="$k=$c";
        }
        $request->content($content);
        $request->referer($referer) if(defined $referer);
        my $i=0;
        print STDERR "Doing POST request to url $url".
            (($::o_verbose>2)?(" with fields:\n".
                    ::YAML::dump($fields)):("\n"))
            if($::o_verbose>1);
      REQUEST:
        my $response=$ua->simple_request($request);
        $i++;
        my $code=$response->code;
        if($i<=$o_maxtries and 500<=$code and $code<600) {
            print STDERR "Failed to request $url with code $code... retrying\n"
                if($::o_verbose>2);
            sleep $o_retryafter;
            goto REQUEST;
        }
        return $response;
    }
    #{{{2 postfd
    sub postfd($$;$) {
        my $url=absURL(shift);
        my $content=shift;
        my $referer=shift;
        $referer=absURL($referer) if(defined $referer);
        my $i=0;
        print STDERR "Doing POST request (form-data) to url $url".
            (($::o_verbose>2)?(" with fields:\n".
                    ::YAML::dump($content)):("\n"))
            if($::o_verbose>1);
        my $newcontent=[];
        while(my ($f, $c)=splice @$content, 0, 2) {
            if(not ref $c) {
                $c=Encode::decode_utf8($c) unless Encode::is_utf8($c);
                $c=Encode::encode("cp1251", $c, Encode::FB_HTMLCREF);
            }
            push @$newcontent, $f, $c;
        }
      POST:
        my $response=$ua->post($url, $newcontent,
                               Content_type => "form-data",
                               ((defined $referer)?(referer => $referer):()));
        $i++;
        my $code=$response->code;
        if($i<=$o_maxtries and 500<=$code and $code<600) {
            print STDERR "Failed to download $url with code $code... retrying\n"
                if($::o_verbose>2);
            sleep $o_retryafter;
            goto POST;
        }
        return $response;
    }