
Python BeautifulSoup: excluding other tags in a select statement

css-selectors beautifulsoup python-3.x

Kevin Rodgers Jr. · 2 years ago

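I'm having trouble selecting text with BeautifulSoup. I'm trying to get the text from <span class="data"> elements only, but I keep getting results from other elements as well. For example, in the code below, the words I want are "PlayStation 3" and "Game Boy Advance", but not "PC". Can you help?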

The soup:

<span class="data">
                  PlayStation 3
                 </span>,
 <span class="data">
                  Game Boy Advance
                 </span>,
 <span class="data">
                  Dec 8, 2022
                 </span>,
 <span class="data">
 <a href="/game/pc">
                   PC
                  </a>

P.S. I tried the code below:

consoles = soup.select('span.data')
for console in consoles:
    print(console.get_text(strip = True))

Output:

PlayStation 3
Game Boy Advance
Dec 8, 2022
PC

Thanks

1 answer | 2 years ago

Andrej Kesely · 2 years ago

This example selects every <span class="data"> that does not contain any other tag:

from bs4 import BeautifulSoup

html_doc = """\
<span class="data">
                  PlayStation 3
                 </span>,
 <span class="data">
                  Game Boy Advance
                 </span>,
 <span class="data">
                  Dec 8, 2022
                 </span>,
 <span class="data">
 <a href="/game/pc">
                   PC
                  </a>
"""

soup = BeautifulSoup(html_doc, "html.parser")

for span in soup.select("span.data:not(:has(*))"):
    print(span.get_text(strip=True))

Prints:

PlayStation 3
Game Boy Advance
Dec 8, 2022
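
If you would rather not rely on the :has() pseudo-class, a rough equivalent in plain Python is to keep only the spans in which find(True) finds no descendant tag. This is a minimal sketch, not the only way to do it: the helper name leaf_span_texts is made up for illustration, and the markup is the same sample as above with the whitespace trimmed.

```python
from bs4 import BeautifulSoup

# Same sample markup as above; the last <span> is left unclosed, as in the question.
html_doc = """\
<span class="data">
  PlayStation 3
</span>,
<span class="data">
  Game Boy Advance
</span>,
<span class="data">
  Dec 8, 2022
</span>,
<span class="data">
<a href="/game/pc">
  PC
</a>
"""


def leaf_span_texts(html):
    """Return the text of every span.data that has no child tags.

    Hypothetical helper, named here only for illustration.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [
        span.get_text(strip=True)
        for span in soup.find_all("span", class_="data")
        # find(True) returns the first descendant tag, or None when the
        # span holds only text nodes.
        if span.find(True) is None
    ]


print(leaf_span_texts(html_doc))
```

The reason "PC" showed up in the original attempt is that get_text() collects the text of all descendants, so the <a> inside the last span contributes its text too. Both approaches skip that span entirely because it has a child tag.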
