
Unable to connect to Tor using requests, although the same works with selenium

tor python-requests web-scraping python-3.x python

SIM · Tech Community · 6 years ago

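I've written two scripts in Python, one using selenium and the other using requests, to connect to http://check.torproject.org through Tor and read the text "Congratulations. This browser is configured to use Tor", so that I can be sure I'm doing things correctly.

When I use the script below, I get the text without any problem: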

from selenium import webdriver
import os

torexe = os.popen(r"C:\Users\WCS\Desktop\Tor Browser\Browser\TorBrowser\Tor\tor.exe")

options = webdriver.ChromeOptions()
options.add_argument('--proxy-server=socks5://localhost:9050')
driver = webdriver.Chrome(chrome_options=options)

driver.get("http://check.torproject.org")
item = driver.find_element_by_css_selector("h1.not").text
print(item)

driver.quit()

However, when I try the same thing using requests, I get the error AttributeError: 'NoneType' object has no attribute 'text':

import requests
from bs4 import BeautifulSoup
import os

torexe = os.popen(r"C:\Users\WCS\Desktop\Tor Browser\Browser\TorBrowser\Tor\tor.exe")

with requests.Session() as s:
    s.proxies['http'] = 'socks5://localhost:9050'
    res = s.get("http://check.torproject.org")
    soup = BeautifulSoup(res.text,"lxml")
    item = soup.select_one("h1.not").text
    print(item)

How can I get the same text from that site using requests?

When I use print(soup.title.text), I get the text "Sorry. You are not using Tor.", which indicates that the request is not going through Tor.

1 answer | as of 6 years ago

drew010 · 6 years ago

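check.torproject.org forces HTTPS, so when requests follows the redirect to https://check.torproject.org, you are no longer using the SOCKS proxy, because it was configured only for the http protocol.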
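To illustrate why the redirect bypasses the proxy, here is a simplified model of a per-scheme proxy lookup. The function `pick_proxy` is hypothetical and is not the requests implementation; it only assumes that the proxy is chosen by the scheme of the URL being fetched, as described above.

```python
from urllib.parse import urlparse


def pick_proxy(url, proxies):
    """Hypothetical, simplified lookup: choose a proxy by the URL's scheme.

    Returns None when no proxy is configured for that scheme, which means
    the request would be sent directly rather than through the proxy.
    """
    scheme = urlparse(url).scheme
    return proxies.get(scheme)


# Proxy mapping as configured in the question: only the "http" key is set.
question_proxies = {"http": "socks5://localhost:9050"}

# The initial request is plain HTTP, so a proxy is found.
print(pick_proxy("http://check.torproject.org", question_proxies))

# After the redirect the URL is HTTPS, and no proxy matches.
print(pick_proxy("https://check.torproject.org", question_proxies))
```

With only an `http` entry, the second lookup finds nothing, which matches the "Sorry. You are not using Tor." title seen in the question.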

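Update your proxies to set the proxy for both http and https. Also, to resolve DNS names through Tor rather than locally, use the socks5h scheme.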

s.proxies['http']  = 'socks5h://localhost:9050'
s.proxies['https'] = 'socks5h://localhost:9050'

This should make your test work correctly.
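Combining this answer with the script from the question, a minimal sketch might look like the following. It assumes that Tor is already running and listening on localhost:9050 (as in the question), and that requests was installed with SOCKS support (for example via the `requests[socks]` extra). The helper name `tor_proxies` is hypothetical and introduced only for illustration. The sketch reads the page title rather than `h1.not`, because the title is present whether or not the check succeeds.

```python
def tor_proxies(host="localhost", port=9050):
    """Return a proxies mapping that routes both schemes through a SOCKS proxy.

    The socks5h scheme asks the proxy to resolve hostnames, so DNS lookups
    also go through Tor. The default host and port are assumptions taken
    from the question.
    """
    proxy = f"socks5h://{host}:{port}"
    return {"http": proxy, "https": proxy}


def main():
    # Third-party imports live here so tor_proxies() can be used on its own.
    import requests
    from bs4 import BeautifulSoup

    with requests.Session() as s:
        # Set the proxy for both http and https, as the answer suggests.
        s.proxies.update(tor_proxies())
        res = s.get("https://check.torproject.org", timeout=30)
        soup = BeautifulSoup(res.text, "lxml")
        # The title reports whether the check page saw a Tor connection.
        print(soup.title.text.strip())


if __name__ == "__main__":
    main()
```

If Tor is listening on a different port, pass it explicitly, for example `tor_proxies(port=9150)`.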
