
Can this Selenium code be recreated with Scrapy?

  • David Copperfield  · Tech community  · 2 years ago

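    I would like to better understand what Scrapy can do. Here is a very simple Selenium script that interacts with a website: it fills in a search box, clicks a few elements, and downloads a file. Can this be replicated with Scrapy, that is, with code written in Scrapy that does exactly the same thing?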

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    options=Options()
    options.add_argument("--window-size=1920,1080")
    
    driver=webdriver.Chrome(options=options)
       
    driver.get("https://www.ons.gov.uk/")
    # Wait for the search box, then type the query
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "q"))).send_keys("Education and childcare")
    # Submit the search (find_element_by_xpath was removed in Selenium 4)
    driver.find_element(By.XPATH, '//*[@id="nav-search-submit"]').click()
    # Follow the links that lead to the file download
    driver.find_element(By.XPATH, '//*[@id="results"]/div[1]/div[2]/div[1]/h3/a/span').click()
    driver.find_element(By.XPATH, '//*[@id="main"]/div[2]/div[1]/section/div/div[1]/div/div[2]/h3/a/span').click()
    driver.find_element(By.XPATH, '//*[@id="main"]/div[2]/div/div[1]/div[2]/p[2]/a').click()
    
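  •   Md. Fazlul Hoque    2 years ago

    Yes, this Selenium code can be recreated in Scrapy by using SeleniumRequest (from the scrapy-selenium package), which I find much faster than plain Selenium. You need a Scrapy project for this. It works in headless mode, but still takes a screenshot at each step.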

    Script:

    import scrapy
    from scrapy_selenium import SeleniumRequest
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait


    class TestSpider(scrapy.Spider):
        name = 'test'

        def start_requests(self):
            # Load the start page through the Selenium middleware
            yield SeleniumRequest(
                url='https://www.ons.gov.uk',
                callback=self.parse,
                wait_time=3,
                screenshot=True,
            )

        def parse(self, response):
            # The middleware exposes the live browser via response.meta
            driver = response.meta['driver']
            driver.save_screenshot('screenshot.png')

            # Type the query into the search box once it is clickable
            WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "q"))).send_keys("Education and childcare")
            driver.save_screenshot('screenshot_1.png')
            # Submit the search (find_element_by_xpath was removed in Selenium 4)
            driver.find_element(By.XPATH, '//*[@id="nav-search-submit"]').click()
            driver.save_screenshot('screenshot_2.png')
            # Follow the links that lead to the file download
            driver.find_element(By.XPATH, '//*[@id="results"]/div[1]/div[2]/div[1]/h3/a/span').click()
            driver.find_element(By.XPATH, '//*[@id="main"]/div[2]/div[1]/section/div/div[1]/div/div[2]/h3/a/span').click()
            driver.find_element(By.XPATH, '//*[@id="main"]/div[2]/div/div[1]/div[2]/p[2]/a').click()
        
    


    settings.py file:

    You must add the following options to your settings.py file:

    # Middleware
    
    DOWNLOADER_MIDDLEWARES = {
        'scrapy_selenium.SeleniumMiddleware': 800
    }
    
    
    # Selenium
    from shutil import which
    SELENIUM_DRIVER_NAME = 'chrome'
    SELENIUM_DRIVER_EXECUTABLE_PATH = which('chromedriver')
    SELENIUM_DRIVER_ARGUMENTS = ['--headless']
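
If you would rather not edit the project-wide settings.py, the same options can be collected in a dictionary and attached to a single spider. This is only a sketch: SELENIUM_SETTINGS is a name made up for illustration, and it assumes the scrapy-selenium middleware picks these keys up from the crawler settings in the same way it reads them from settings.py.

```python
from shutil import which

# Hypothetical name: the same options as in settings.py, gathered in one dict.
SELENIUM_SETTINGS = {
    "DOWNLOADER_MIDDLEWARES": {
        "scrapy_selenium.SeleniumMiddleware": 800,
    },
    "SELENIUM_DRIVER_NAME": "chrome",
    # which() returns None when chromedriver is not on PATH.
    "SELENIUM_DRIVER_EXECUTABLE_PATH": which("chromedriver"),
    "SELENIUM_DRIVER_ARGUMENTS": ["--headless"],
}
```

Setting custom_settings = SELENIUM_SETTINGS as a class attribute on TestSpider should then apply these options to that spider only.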
    


    Output:

    'downloader/response_status_count/200'
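
For comparison, the first two browser steps (typing into the q box and clicking submit) may not need a browser at all. The sketch below builds the URL that such a search form would typically request. The parameter name q comes from the Selenium code above. The /search path and the assumption that the form submits a plain GET request are guesses that should be checked against the live site.

```python
from urllib.parse import urlencode

# Assumption: the search form submits a GET request to this path.
SEARCH_PAGE = "https://www.ons.gov.uk/search"


def build_search_url(query, search_page=SEARCH_PAGE):
    """Return the URL a browser would request after submitting the search box."""
    # "q" is the name of the input that the Selenium code types into.
    return f"{search_page}?{urlencode({'q': query})}"
```

In a spider, start_requests could yield scrapy.Request(build_search_url("Education and childcare"), callback=self.parse), and the later clicks could become response.follow calls on the matching links. This only works if the pages are served as plain HTML. If the links are rendered by JavaScript, the SeleniumRequest approach is still needed.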
    


    How to download pdf using scrapy
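
For the download step itself, a minimal sketch of saving a response body to disk is shown here. The helper names filename_from_url and save_body, the downloads directory, and the download.bin fallback name are all made up for illustration. Nothing here is specific to the ONS site.

```python
import os
from urllib.parse import urlsplit


def filename_from_url(url, default="download.bin"):
    """Pick a local file name from the last path segment of the URL."""
    name = os.path.basename(urlsplit(url).path)
    # Fall back to a default when the URL path ends with a slash.
    return name or default


def save_body(url, body, out_dir="downloads"):
    """Write the raw response bytes to out_dir and return the file path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, filename_from_url(url))
    with open(path, "wb") as fh:
        fh.write(body)
    return path
```

Inside a Scrapy callback this could be called as save_body(response.url, response.body), since url and body are standard attributes of a Scrapy response. For larger crawls, Scrapy's built-in FilesPipeline may be a better fit.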
