
What is the Python 3 equivalent of Perl's UserAgent and HTTP libraries?

  • Navdeep Singh  · 5 years ago

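    I am fairly new to both Perl and Python. I have to convert some old functions written in Perl to Python, and I am struggling to find Python modules that correspond to calls such as $self->{ua}->simple_request() and the like.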

    I have already looked at modules such as BeautifulSoup, which make it easy to parse data out of HTML pages.

    The Perl code is initialized as follows:

    sub new {
        my ($class, %args) = @_;
        $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;
        my $self = { # default args:
    #                 ip        => '10.10.10.10',
                    port        => 443,
            transparent => 0,
    #       logger      => 
            user_agent  => "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36",
    #       user_agent  => "mybrowser",
            ssl_ver         => '23',
                    %args,
                   };
    
        unlink "cookies.txt";
        $self->{ua} = LWP::UserAgent->new(keep_alive => 10);
        $self->{ua}->agent($self->{user_agent});
        Net::SSL::send_useragent_to_proxy(1);
        $self->{ua}->timeout(90 * 1);
    #   $self->{ua}->timeout(200 * 1);
        $ENV{'HTTPS_VERSION'} = $self->{ssl_ver};
        my $cookie_jar = HTTP::Cookies->new(
            file        => "cookies.txt",
            hide_cookie2    => 1,
    #           autosave    => 1,
        );
    
        $self->{ua}->cookie_jar($cookie_jar);
    
        # Set proxy
        if (! $self->{transparent}) {
            my $proxy = 'http://' . $self->{ip} . ':' . $self->{port};  # don't add .'/' !
            $self->{logger}->Log("Set UA proxy: $proxy", 4);
            $self->{ua}->proxy('http', $proxy);
            $self->{ua}->proxy('https', $proxy);
    #       $ua->proxy('https', $proxy);    # break authentication
            $ENV{'HTTPS_PROXY'} = $proxy;
            $self->{logger}->Log("Set HTTPS proxy: $ENV{'HTTPS_PROXY'}", 4);
            $self->{proxy} = $proxy;
        }
    
    =head
        my $context = new IO::Socket::SSL::SSL_Context(
              SSL_version => 'TLSv1',
              SSL_verify_mode => Net::SSLeay::VERIFY_NONE(),
              );
            IO::Socket::SSL::set_default_context($context);
    =cut
        @LWP::Protocol::http::EXTRA_SOCK_OPTS = (LocalAddr => $self->{init}->{client_ip},
                            SSL_version => $self->{ssl_ver},
                            SSL_cipher_list => $self->{ssl_cipher});
    
            bless $self, $class or die "Can't bless $class: $!";
            return $self;
    }
    
    

    That covers the initialization part, but the main problem arises where the modules are actually used, for example:

    my $form = HTML::Form->parse($res);
    if (condition){
          $post = $form->make_request;
    }
    $res = $self->{ua}->simple_request($post);
    $self->{ua}->no_proxy("10.x.x.x", "test.com", "10.x.x.x", "10.x.x.x", "10.x.x.x", "tests.com", "dummy.com");
    
    ...
    $req->authorization_basic($login,$password);
    $res = $self->{ua}->simple_request($req);
    
    
    ....
    
    $req = $self->GetCommonRequest( $url );
            $req->authorization_basic($login,$password);
            $req->header(Content_Type => 'application/x-www-form-urlencoded',
                Accept => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
                'Accept-Encoding' => 'gzip, deflate',
                Host => $host);
    ...
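
    For the HTML::Form->parse and make_request step above, Python's standard library has no direct counterpart. A rough sketch built on html.parser might look like the following. FirstFormParser and make_request are hypothetical names introduced here for illustration, and only the named <input> fields of the first form are collected, so select, textarea, and checkbox handling would still need to be added.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class FirstFormParser(HTMLParser):
    """Collect action, method and named <input> values of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "GET"
        self.fields = []
        self._state = "before"  # before -> inside -> done

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self._state == "before":
            self._state = "inside"
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "GET").upper()
        elif tag == "input" and self._state == "inside" and attrs.get("name"):
            self.fields.append((attrs["name"], attrs.get("value") or ""))

    def handle_endtag(self, tag):
        if tag == "form" and self._state == "inside":
            self._state = "done"


def make_request(html, base_url):
    """Build a urllib Request that submits the first form found in html."""
    parser = FirstFormParser()
    parser.feed(html)
    url = urljoin(base_url, parser.action)  # resolve a relative action
    query = urlencode(parser.fields)
    if parser.method == "POST":
        return Request(url, data=query.encode("ascii"), method="POST")
    # GET: append the fields as a query string (assumes action has none)
    return Request(url + "?" + query if query else url)
```

    The returned Request can then be passed to urlopen or to an opener's open method, much as $post is passed to simple_request in the Perl code.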
    
    

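    These are the places where the {ua} object is used, for example simple_request, no_proxy and authorization_basic. I cannot find Python equivalents for these.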

    I would be very grateful if someone could tell me the Python equivalents of these modules.
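
    As a hedged pointer for the three calls named in the question, here is a minimal sketch using only the Python 3 standard library. NoRedirect, basic_auth_header, set_no_proxy and simple_request are hypothetical helper names, not library functions. It assumes that "no redirect following" is the part of LWP's simple_request that matters here, and that setting the process-wide no_proxy environment variable is an acceptable stand-in for the per-agent $ua->no_proxy(...).

```python
import base64
import os
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Like LWP's simple_request: do not follow redirects."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx response


def basic_auth_header(login, password):
    """Value for the Authorization header, like $req->authorization_basic."""
    token = base64.b64encode(("%s:%s" % (login, password)).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def set_no_proxy(*hosts):
    """Rough analogue of $ua->no_proxy(...), via the no_proxy environment variable."""
    os.environ["no_proxy"] = ",".join(hosts)


def simple_request(opener, req, timeout=90):
    """Send req once; return the response even for 3xx/4xx/5xx statuses."""
    try:
        return opener.open(req, timeout=timeout)
    except urllib.error.HTTPError as err:
        return err  # HTTPError doubles as a response (code, headers, read())
```

    A possible usage: build the opener with urllib.request.build_opener(NoRedirect()), create req = urllib.request.Request(url), call req.add_header("Authorization", basic_auth_header(login, password)), and pass both to simple_request.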

  •   lenik  · 5 years ago

    Try something like this:

    import json
    import logging
    import traceback
    from http.client import BadStatusLine, IncompleteRead
    from urllib.error import HTTPError, URLError
    from urllib.request import Request, urlopen

    log = logging.getLogger(__name__)

    # url -- the URL you're trying to access
    # data -- some params you want to POST (None means GET)
    # access_token -- bearer token for the Authorization header, if any
    def call_api(url, data=None, access_token=None):
        try:
            headers = {
                'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11',
                'accept'    : 'application/json',
            }
            if access_token is not None:
                headers['Authorization'] = 'Bearer %s' % access_token

            if data is None:   # GET method
                req = Request(url, None, headers)
            else:  # POST method
                headers['Content-Type'] = 'application/json'
                data = json.dumps(data).encode('utf-8')
                req = Request(url, data, headers)

            result = urlopen(req).read()

            print(result)
            return json.loads(result)

        except HTTPError as e:
            # The server answered with an error status; its body may still be JSON.
            log.error('HTTP error: %s', e.code)
            result = e.read()
            print(result)
            return json.loads(result)
        except URLError as e:
            log.error('unable to reach a server: %s', e.reason)
        except BadStatusLine:
            log.error('Bad Status Line')
        except IncompleteRead as e:
            log.error('IncompleteRead: %s', e)
        except Exception as e:
            log.error('%s: %s', e, url)
            log.error(traceback.format_exc())
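
    The answer above sends a single request with urlopen. The Perl constructor in the question also configures a cookie jar, a proxy, a default user agent, and relaxed SSL verification, so a sketch of the same setup with the standard library may be useful. make_opener is a hypothetical helper name, the defaults are copied from the Perl code, and urllib has no equivalent of keep_alive, so that option is simply dropped here.

```python
import http.cookiejar
import ssl
import urllib.request

# Defaults copied from the Perl constructor in the question.
USER_AGENT = ("Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36")
TIMEOUT_SECONDS = 90  # pass as opener.open(req, timeout=TIMEOUT_SECONDS)


def make_opener(ip=None, port=443, transparent=False,
                user_agent=USER_AGENT, cookie_file="cookies.txt"):
    """Return (opener, cookie_jar) configured roughly like the LWP::UserAgent."""
    # HTTP::Cookies->new(file => "cookies.txt") -> a file-backed cookie jar.
    cookie_jar = http.cookiejar.LWPCookieJar(cookie_file)

    # PERL_LWP_SSL_VERIFY_HOSTNAME = 0 -> skip certificate checks.
    # This is insecure and only mirrors the original behaviour.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    handlers = [
        urllib.request.HTTPCookieProcessor(cookie_jar),
        urllib.request.HTTPSHandler(context=context),
    ]
    if transparent:
        # An empty mapping disables proxies, including environment ones.
        handlers.append(urllib.request.ProxyHandler({}))
    else:
        proxy = "http://%s:%s" % (ip, port)
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy, "https": proxy}))

    opener = urllib.request.build_opener(*handlers)
    # $ua->agent(...) -> default User-Agent header for every request.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, cookie_jar
```

    Requests are then sent with opener.open(req, timeout=TIMEOUT_SECONDS), and cookie_jar.save() writes the cookies to disk when persistence is wanted. The third-party requests library offers a Session object that bundles the same settings, if installing a dependency is acceptable.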