
Get all links from a page

html-parsing beautifulsoup web-scraping python

user1922364 · 7 years ago

I am using BeautifulSoup to get all the links from a page. My code is:

import requests
from bs4 import BeautifulSoup


url = 'http://www.acontecaeventos.com.br/marketing-promocional-sao-paulo'
r = requests.get(url)
html_content = r.text
soup = BeautifulSoup(html_content, 'lxml')

soup.find_all('href')

[]

How can I get a list of all the href links on that page?

3 answers | until 7 years ago

Anonta 7 years ago

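You are telling the find_all method to look for href tags, not attributes. href is an attribute of the <a> tag, so search for the <a> tags instead: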

links = soup.find_all('a')

Later you can access their href attributes like this:

link = links[0]          # get the first link in the entire page
url  = link['href']      # get value of the href attribute
url  = link.get('href')  # or like this
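To see the difference without fetching a page, here is a minimal sketch on a made-up HTML snippet. The markup and variable names are assumptions for illustration, and the built-in 'html.parser' is used in place of 'lxml' so that no extra parser is required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, not taken from the page in the question.
html = '<p><a href="/one">One</a> <a name="top">No href</a></p>'
soup = BeautifulSoup(html, 'html.parser')

no_match = soup.find_all('href')   # looks for <href> tags, so nothing matches
links = soup.find_all('a')         # looks for <a> tags

first_href = links[0]['href']      # value of the href attribute
missing = links[1].get('href')     # None, because this <a> has no href
# links[1]['href'] would raise KeyError instead of returning None
```

The practical difference between the two access styles is that link['href'] fails loudly on an <a> tag without an href, while link.get('href') quietly returns None.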

Nrzonline Tamer Tas 7 years ago

Replace your last line:

links = soup.find_all('a')

with this line:

links = [a.get('href') for a in soup.find_all('a', href=True)]

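It scrapes every a tag that has an href attribute and puts each href value into the links list.

If you want to know more about the for loop between the [], read about list comprehensions.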
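As a rough illustration, the list comprehension can be written out as an ordinary loop. The HTML string below is invented for the example, and 'html.parser' is assumed in place of 'lxml'.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the middle <a> has no href attribute.
html = '<a href="/one">One</a><a name="top">No href</a><a href="/two">Two</a>'
soup = BeautifulSoup(html, 'html.parser')

# List comprehension from the answer: href=True keeps only <a> tags with an href.
links = [a.get('href') for a in soup.find_all('a', href=True)]

# The same thing as an explicit for loop.
links_loop = []
for a in soup.find_all('a', href=True):
    links_loop.append(a.get('href'))
```

Both variables end up holding the same list of strings; the anchor without an href is skipped by the href=True filter rather than showing up as None.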

Oliver Oliver 5 years ago

To get every href regardless of which tag it is on, use:

href_tags = soup.find_all(href=True)   
hrefs = [tag.get('href') for tag in href_tags]
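A small sketch of what "regardless of tag" means in practice, on invented markup (the snippet and the pairs variable are assumptions for illustration, with 'html.parser' assumed in place of 'lxml'):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: href appears on a <link> and an <a>; <img> only has src.
html = (
    '<head><link rel="stylesheet" href="style.css"></head>'
    '<body><a href="/one">One</a><img src="x.png"></body>'
)
soup = BeautifulSoup(html, 'html.parser')

# No tag name is given, so any tag carrying an href attribute matches.
href_tags = soup.find_all(href=True)
pairs = [(tag.name, tag.get('href')) for tag in href_tags]
```

Here pairs contains the <link> stylesheet as well as the <a> link, which may or may not be what you want when you only care about clickable links.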
