How to click the first link to PDF content

web-scraping selenium python-3.x python

Ricky · 6 years ago

I'm new to Selenium and Python, and I want to get the URL of the first link that points to a PDF.

driver = webdriver.Chrome(executable_path='/Users/mac/Downloads/chromedriver')
driver.get("https://google.com/search?query=" + searchList[i])
driver.find_element_by_css_selector("span.sFZIhb.b.w.xsm").click()
url = driver.current_url
print(url)

2 answers | latest 6 years ago

Moshe Slavin 6 years ago

Based on @Infern0's XPath, here is a snippet that collects all the PDF URLs for each search term and clicks the first one:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Written against the Selenium 3 API (find_elements_by_xpath, chrome_options).
options = Options()
# options.add_argument("--headless")
options.add_argument("--incognito")
searchList = ["pdf example", "pdf file"]
urls = []
for word in searchList:
    # Raw string so the backslashes in the Windows path are not read as escapes.
    driver = webdriver.Chrome(r"C:\workspace\TalSolutionQA\general_func_class\chromedriver.exe", chrome_options=options)
    driver.get("https://google.com/search?query=" + word)
    all_urls = driver.find_elements_by_xpath("//a[contains(@href, '.pdf')]")
    # Read the hrefs once, before the click navigates away from the results page.
    hrefs = [link.get_attribute("href") for link in all_urls]
    urls.append(hrefs)
    print(f'the urls:{hrefs}')
    if all_urls:  # avoid an IndexError when no PDF link was found
        all_urls[0].click()
    driver.quit()

print(urls)

Welcome to Selenium, there's a lot of fun ahead!
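
Since the question only asks for the URL, the click is not strictly required: the `href` values already hold it. One caveat is that `contains(@href, '.pdf')` also matches links where `.pdf` merely appears somewhere in the address, for example in a query string. Below is a minimal sketch of a stricter filter. `is_pdf_link`, `first_pdf_url`, and `collect_hrefs` are hypothetical helper names, not part of Selenium, and `collect_hrefs` assumes Selenium 4, where `find_elements(By.XPATH, ...)` replaces `find_elements_by_xpath`. The sample URLs are made up.

```python
from urllib.parse import urlparse


def is_pdf_link(href):
    """Return True when the path part of the URL ends with .pdf."""
    if not href:
        return False
    return urlparse(href).path.lower().endswith(".pdf")


def first_pdf_url(hrefs):
    """Return the first href that points to a PDF, or None if there is none."""
    return next((h for h in hrefs if is_pdf_link(h)), None)


def collect_hrefs(driver):
    """Assumes Selenium 4: read the href of every <a> on the current page."""
    # Imported lazily so the pure helpers above work without Selenium installed.
    from selenium.webdriver.common.by import By
    return [a.get_attribute("href") for a in driver.find_elements(By.XPATH, "//a[@href]")]


# Made-up hrefs standing in for what collect_hrefs(driver) might return.
sample = [
    "https://example.com/search?q=file.pdf",           # .pdf only in the query string
    None,                                              # an element without a usable href
    "https://example.com/docs/Report.PDF?download=1",  # real PDF path, upper-case extension
    "https://example.com/other.pdf",
]
print(first_pdf_url(sample))  # https://example.com/docs/Report.PDF?download=1
```

With a live driver, `first_pdf_url(collect_hrefs(driver))` would give the URL without leaving the results page. Whether the search results expose direct `.pdf` addresses rather than redirect links is not something this sketch checks.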

Infern0 6 years ago

Get the first link whose URL contains .pdf and click it.

driver.find_element_by_xpath("(//a[contains(@href, '.pdf')])[1]").click()
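
The parentheses matter in XPath. `//a[...][1]` selects every matching `<a>` that is the first one under its own parent, so it can match several elements, while `(//a[...])[1]` takes the first match in the whole document. `find_element_by_xpath` returns the first match in either case, so the difference mostly shows up with `find_elements_by_xpath`. The sketch below illustrates the per-parent behaviour using the standard library's `xml.etree.ElementTree`. That module supports only a subset of XPath (no `contains()` and no parenthesized expressions), so the `.pdf` filter is left out, and the markup is made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up markup: two containers, each holding links.
page = ET.fromstring(
    "<body>"
    "<div><a href='one.pdf'/><a href='two.pdf'/></div>"
    "<div><a href='three.pdf'/></div>"
    "</body>"
)

# Position predicate without parentheses: the first <a> under each parent.
per_parent = [a.get("href") for a in page.findall(".//a[1]")]

# Equivalent of (//a)[1]: collect every <a> in document order, then take the first.
first_overall = page.findall(".//a")[0].get("href")

print(per_parent)     # ['one.pdf', 'three.pdf']
print(first_overall)  # one.pdf
```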
