
Python - Scraping with BeautifulSoup and urllib

  •  3
  • Edgaras  · Tech Community  · 6 years ago

    I'm trying to read a website, but unfortunately something is going wrong.

    import bs4 as bs
    import urllib.request
    
    sauce = urllib.request.urlopen('https://csgoempire.com/withdraw').read()
    soup = bs.BeautifulSoup(sauce,'lxml')
    
    print(soup.find_all('p'))
    

    Error:

    Traceback (most recent call last):
      File "F:/Informatika/Python3X/GamblinSitesBot/GamblingSitesBot.py", line 4, in <module>
        sauce = urllib.request.urlopen('https://csgoempire.com/').read()
      File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 223, in urlopen
        return opener.open(url, data, timeout)
      File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 532, in open
        response = meth(req, response)
      File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 642, in http_response
        'http', request, response, code, msg, hdrs)
      File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 570, in error
        return self._call_chain(*args)
      File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 504, in _call_chain
        result = func(*args)
      File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 650, in http_error_default
        raise HTTPError(req.full_url, code, msg, hdrs, fp)
    urllib.error.HTTPError: HTTP Error 403: Forbidden
    
    Process finished with exit code 1
    

    Also, this code works with other websites, such as google.com.

    1 answer  |  last active 6 years ago
        1
  •  5
  •   toheedNiaz    6 years ago

    You can achieve the same thing with the requests library. It works well.

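    import bs4 as bs
    import requests
    
    # Fetch the page with requests instead of urllib.request.urlopen
    sauce = requests.get('https://csgoempire.com/withdraw')
    # .content holds the raw response bytes; 'html.parser' is the parser built into Python
    soup = bs.BeautifulSoup(sauce.content,'html.parser')
    print(soup.find_all('p'))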
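
A possible explanation for the 403, offered as a guess rather than something confirmed in this thread: some sites reject the default `Python-urllib/3.x` User-Agent that `urllib.request.urlopen` sends, while requests identifies itself differently. If that is the cause here, staying with urllib and sending an explicit User-Agent header may also work. The sketch below is a minimal illustration under that assumption; `USER_AGENT`, `build_request` and `fetch` are hypothetical names introduced for this example, and the header value is an arbitrary placeholder.

```python
import urllib.error
import urllib.request

# Hypothetical placeholder value; any identifier the site accepts would do.
USER_AGENT = "Mozilla/5.0 (compatible; example-scraper)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, opener=urllib.request.urlopen):
    """Return the response body as bytes, or None on an HTTP error status."""
    try:
        with opener(build_request(url)) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # Report the status instead of letting the traceback end the script.
        print(f"{url} -> HTTP {err.code} {err.reason}")
        return None
```

The bytes returned by `fetch('https://csgoempire.com/withdraw')` can be passed to `bs.BeautifulSoup(..., 'html.parser')` in the same way as `sauce.content` above. Whether this gets past the 403 depends on how the site filters requests; it may check more than the User-Agent.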