
Extract numbers from a website using BeautifulSoup?

extract beautifulsoup python

wen tian · Tech Community · 7 years ago

I have the following Python code:

from bs4 import BeautifulSoup
div = '<div class="hm"><span class="xg1">查看:</span> 15660<span class="pipe">|</span><span class="xg1">回复:</span> 435</div>'
soup = BeautifulSoup(div, "lxml")
hm = soup.find("div", {"class": "hm"})
print(hm)

In this case, I need the two numbers as output:

15660
435

I want to use BeautifulSoup to extract the numbers from a website, but I don't know how to do it.

1 answer | as of 7 years ago

cs95 abhishek58g · 7 years ago

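Call soup.find_all with a regular expression so that only text nodes containing a number are returned, then strip the surrounding whitespace: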

>>> import re
>>> list(map(str.strip, soup.find_all(text=re.compile(r'\b\d+\b'))))

or

>>> [x.strip() for x in soup.find_all(text=re.compile(r'\b\d+\b'))]
['15660', '435']
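
For anyone who wants to run this outside the REPL, here is a minimal self-contained sketch of the same idea. It makes a few assumptions that are not in the question: the labels "Views:" and "Replies:" are ASCII placeholders for the original label text, the parser is the built-in "html.parser" rather than "lxml" (so no extra install is needed), and it uses the string= argument, which newer Beautiful Soup releases offer as the replacement for text=.

```python
import re
from bs4 import BeautifulSoup

# Placeholder markup modelled on the question; the label text is an assumption.
html = (
    '<div class="hm"><span class="xg1">Views:</span> 15660'
    '<span class="pipe">|</span><span class="xg1">Replies:</span> 435</div>'
)

# "html.parser" ships with Python; the question used "lxml".
soup = BeautifulSoup(html, "html.parser")

# Keep only the text nodes in which the pattern finds a whole number.
matches = soup.find_all(string=re.compile(r"\b\d+\b"))

# Each match is the full text node (for example " 15660"), so strip the whitespace.
numbers = [m.strip() for m in matches]
print(numbers)
```

Note that find_all returns the whole text node, not just the digits, so this works here only because each matching node contains nothing but a number and whitespace.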

If you need integers rather than strings, call int inside the list comprehension:

>>> [int(x.strip()) for x in soup.find_all(text=re.compile(r'\b\d+\b'))]
[15660, 435]
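
One caveat: int(x.strip()) raises ValueError if a matching text node holds anything besides the number (say " 15660 total"). A possible alternative, sketched below and not part of the original answer, is to take the text of the div and pull the digits out with re.findall. The helper name extract_ints, the sample markup, and the use of "html.parser" are all assumptions made for illustration.

```python
import re
from bs4 import BeautifulSoup


def extract_ints(html, css_class="hm"):
    """Return every whole number found in the text of the first matching div.

    Hypothetical helper for illustration; not from the original answer.
    """
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_=css_class)
    if div is None:
        # No such div on the page: nothing to extract.
        return []
    # Join text nodes with a space so adjacent numbers cannot run together.
    text = div.get_text(" ")
    # findall returns only the digit runs, so extra words around them are ignored.
    return [int(n) for n in re.findall(r"\b\d+\b", text)]


# Sample markup with extra words next to a number (an assumption for the demo).
sample = (
    '<div class="hm"><span class="xg1">Views:</span> 15660 total'
    '<span class="pipe">|</span><span class="xg1">Replies:</span> 435</div>'
)
result = extract_ints(sample)
print(result)
```

The trade-off is that this picks up every number inside the div, including any that appear in the labels themselves, so it suits markup where the labels are purely textual.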
