代码之家 › 专栏 › 技术社区 › David Copperfield

这个selenium代码可以用scrapy重新创建吗?

scrapy python-3.x python

David Copperfield · 技术社区 · 2 年前

我很想更好地了解scratch的功能。这里有一个非常简单的硒代码,它可以与网站交互,填充一些框,单击一些元素并下载一个文件。这个代码可以用scrapy复制吗?,因此,代码是使用做完全相同事情的scrapy编写的。

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options=Options()
options.add_argument("--window-size=1920,1080")

driver=webdriver.Chrome(options=options)
   
driver.get("https://www.ons.gov.uk/")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "q"))).send_keys("Education and childcare")
click_button=driver.find_element_by_xpath('//*[@id="nav-search-submit"]').click()
click_button=driver.find_element_by_xpath('//*[@id="results"]/div[1]/div[2]/div[1]/h3/a/span').click()
click_button=driver.find_element_by_xpath('//*[@id="main"]/div[2]/div[1]/section/div/div[1]/div/div[2]/h3/a/span').click()
click_button=driver.find_element_by_xpath('//*[@id="main"]/div[2]/div/div[1]/div[2]/p[2]/a').click()

0 回复 | 直到 2 年前

Md. Fazlul Hoque 2 年前

"selenium code be recreated using scrapy" 与 SeleniuRequest 哪个是 superfast 比一般硒含量高。你需要一个充满斗志的项目。它以无头模式工作,但总是为每一步获取屏幕截图。

脚本:

import scrapy
from scrapy_selenium import SeleniumRequest
from selenium import webdriver

from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC




class TestSpider(scrapy.Spider):
    name = 'test'

    def start_requests(self):
        yield SeleniumRequest(
            url='https://www.ons.gov.uk',
            callback=self.parse,
            wait_time = 3,
            screenshot = True
        )

    def parse(self, response):
        driver = response.meta['driver']
        driver.save_screenshot('screenshot.png')

        WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "q"))).send_keys("Education and childcare")
        driver.save_screenshot('screenshot_1.png')
        click_button=driver.find_element_by_xpath('//*[@id="nav-search-submit"]').click()
        driver.save_screenshot('screenshot_2.png')
        click_button=driver.find_element_by_xpath('//*[@id="results"]/div[1]/div[2]/div[1]/h3/a/span').click()
        click_button=driver.find_element_by_xpath('//*[@id="main"]/div[2]/div[1]/section/div/div[1]/div/div[2]/h3/a/span').click()
        click_button=driver.find_element_by_xpath('//*[@id="main"]/div[2]/div/div[1]/div[2]/p[2]/a').click()

Screenshot

settings.py文件:

您必须在settings.py文件中添加以下选项

# Middleware

DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800
}


# Selenium
from shutil import which
SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('chromedriver')
SELENIUM_DRIVER_ARGUMENTS = ['--headless']

SeleniumRequest

输出:

'downloader/response_status_count/200'

screenshot of the project looks like

How to download pdf using scrapy

screenshot