
Converting a URL for a web crawler

  •  2
  • user260223  · Tech Community  · 14 years ago

    How can I do this in Python, i.e. convert URL 1 into URL 2?

    URL 1: www.odevsitesi.com/ara.asp?kelime=doğanın dengesinin bozulması

    URL 2: www.odevsitesi.com/ara.asp?kelime=do%F0an%FDn%20dengesinin%20bozulmas%FD

    2 answers  |  last activity 14 years ago
        1
  •  5
  •   Alex Martelli  · 14 years ago

    You need to encode the URL properly (in your case as iso-8859-9), split it into its parts, apply urllib.quote to the query part, and then put it back together. I.e.:

    >>> import urlparse
    >>> import urllib
    >>> x = u'http://www.odevsitesi.com/ara.asp?kelime=doğanın dengesinin bozulması' 
    >>> y = x.encode('iso-8859-9')
    >>> # just to show what the split of y looks like (we can also handle it as a tuple):
    >>> urlparse.urlsplit(y)
    SplitResult(scheme='http', netloc='www.odevsitesi.com', path='/ara.asp', query='kelime=do\xf0an\xfdn dengesinin bozulmas\xfd', fragment='')
    >>> z = urlparse.urlsplit(y)
    >>> # quote the query, but keep '=' and '&' literal so the key=value structure survives
    >>> quoted = z[:3] + (urllib.quote(z.query, safe='=&'), z.fragment)
    >>> # now just to show you what the 'quoted' tuple looks like:
    >>> quoted
    ('http', 'www.odevsitesi.com', '/ara.asp', 'kelime=do%F0an%FDn%20dengesinin%20bozulmas%FD', '')
    >>> # and finally putting it back together:
    >>> urlparse.urlunsplit(quoted)
    'http://www.odevsitesi.com/ara.asp?kelime=do%F0an%FDn%20dengesinin%20bozulmas%FD'
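
    The session above is Python 2. In Python 3, urlsplit, urlunsplit and quote all live in urllib.parse, and quote takes an encoding argument, so the separate encode step can be folded in. What follows is a minimal sketch, assuming Python 3.7+ and a URL with an explicit scheme (URL 1 as written has none, so urlsplit would treat the whole string as a path). quote_query is a hypothetical helper name. It passes safe='=&' so that '=' and '&' stay literal; with the default safe='/', '=' would be escaped to %3D.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_query(url, encoding="iso-8859-9"):
    """Percent-encode only the query part of `url` in a single-byte charset.

    quote_query is a hypothetical helper; iso-8859-9 (Turkish) is the
    encoding the target site appears to expect.
    """
    parts = urlsplit(url)
    # Encode the query text to `encoding` bytes and percent-escape them,
    # leaving '=' and '&' untouched so key=value pairs stay parseable.
    query = quote(parts.query, safe="=&", encoding=encoding)
    # SplitResult is a namedtuple, so _replace swaps in the quoted query.
    return urlunsplit(parts._replace(query=query))


url = "http://www.odevsitesi.com/ara.asp?kelime=doğanın dengesinin bozulması"
print(quote_query(url))
# http://www.odevsitesi.com/ara.asp?kelime=do%F0an%FDn%20dengesinin%20bozulmas%FD
```

    If the query may contain characters that iso-8859-9 cannot represent, quote will raise UnicodeEncodeError unless you also pass an errors argument.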
    
        2
  •  4
  •   FogleBird  · 14 years ago

    The urllib.quote documentation:

    http://docs.python.org/library/urllib.html#urllib.quote

    quote('/~connolly/') yields '/%7econnolly/'.
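
    The safe argument decides which characters survive unescaped. By default only '/' is added to letters, digits and '_.-', so the '=' in a query string gets escaped to %3D. A small Python 3 sketch of the difference, assuming Python 3.7+, where '~' also counts as unreserved, so the documentation example above no longer changes '~' there:

```python
from urllib.parse import quote

query = "kelime=doğanın dengesinin bozulması"

# Default safe="/": '=' is escaped along with the non-ASCII bytes.
default_quoted = quote(query, encoding="iso-8859-9")
# Listing '=' and '&' as safe keeps the key=value structure readable.
safe_quoted = quote(query, safe="=&", encoding="iso-8859-9")
# Since Python 3.7, '~' is unreserved and is no longer escaped.
tilde_quoted = quote("/~connolly/")

print(default_quoted)  # kelime%3Ddo%F0an%FDn%20dengesinin%20bozulmas%FD
print(safe_quoted)     # kelime=do%F0an%FDn%20dengesinin%20bozulmas%FD
print(tilde_quoted)    # /~connolly/
```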