
How do I get soup.find_all to work in BeautifulSoup?

beautifulsoup web-scraping python-3.x

-1

Nathan123 · 6 years ago

I am trying to use BeautifulSoup to scrape information from a page that lists lawyers' names.

#importing libraries
from urllib.request import urlopen 
from bs4 import BeautifulSoup
import requests

Here is an example of how each lawyer's name is nested in the HTML markup:

 </a>
          <div class="person-info search-person-info people-search-person-info">
           <div class="col person-name-position">
            <a href="https://www.foxrothschild.com/richard-s-caputo/">
             Richard S. Caputo
            </a>

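I tried to extract each lawyer's name with the script below, using 'a' as the tag and "col person-name-position" as the class. It doesn't seem to work, though: it prints an empty list instead.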

page=requests.get("https://www.foxrothschild.com/people/?search%5Bname%5D=&search%5Bkeyword%5D=&search%5Boffice%5D=&search%5Bpeople-position%5D=&search%5Bpeople-bar-admission%5D=&search%5Bpeople-language%5D=&search%5Bpeople-school%5D=Villanova+University+School+of+Law&search%5Bpractice-area%5D=") #insert page here
soup=BeautifulSoup(page.content,'html.parser')
#print(soup.prettify())
find_name=soup.find_all('a',class_='col person-name-position')
print(find_name)

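2 answers | up to 6 years ago

Learning is a mess · 6 years ago

class="col person-name-position" belongs to a div element, not to the a tag, so you need to search for the div and then take the a inside it: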

find_name=soup.find_all('div',class_='col person-name-position')
for entry in find_name:
    a_element = entry.find("a")
    #...
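
One way the elided part of the loop above could be filled in is sketched here. It runs against an inline copy of the HTML excerpt from the question rather than the live site, so `SAMPLE_HTML` (including the closing `</div>` tags, which the excerpt does not show) and the helper name `extract_names` are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Inline copy of the excerpt from the question; the closing </div> tags
# are assumed, since the original excerpt is cut off before them.
SAMPLE_HTML = """
<div class="person-info search-person-info people-search-person-info">
  <div class="col person-name-position">
    <a href="https://www.foxrothschild.com/richard-s-caputo/">
      Richard S. Caputo
    </a>
  </div>
</div>
"""


def extract_names(html):
    """Return the text of the first <a> inside each name/position div."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for entry in soup.find_all("div", class_="col person-name-position"):
        a_element = entry.find("a")
        if a_element is not None:  # skip divs that contain no link
            names.append(a_element.get_text(strip=True))
    return names


names = extract_names(SAMPLE_HTML)
print(names)
```

With the live page, `page.content` from the question's `requests.get(...)` call would be passed in place of `SAMPLE_HTML`, assuming the site still serves the same markup.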

FanMan · 6 years ago

You need to change your soup.find_all call to look for div, because the class is on the div, not on the a.

page=requests.get("https://www.foxrothschild.com/people/?search%5Bname%5D=&search%5Bkeyword%5D=&search%5Boffice%5D=&search%5Bpeople-position%5D=&search%5Bpeople-bar-admission%5D=&search%5Bpeople-language%5D=&search%5Bpeople-school%5D=Villanova+University+School+of+Law&search%5Bpractice-area%5D=") #insert page here
soup=BeautifulSoup(page.content,'html.parser')
#print(soup.prettify())
find_name=soup.find_all('div',class_='col person-name-position')
print(find_name)
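
As an alternative to find_all, a CSS selector can go straight from the div to the link. The sketch below is only an illustration: it uses an inline copy of the question's HTML excerpt (`SAMPLE_HTML`, with assumed closing `</div>` tags) instead of the live page, and the variable names `wrong`, `links` and `lawyers` are made up for this example. It also shows why the original call came back empty.

```python
from bs4 import BeautifulSoup

# Inline copy of the excerpt from the question; the closing </div> tags
# are assumed, since the original excerpt is cut off before them.
SAMPLE_HTML = """
<div class="person-info search-person-info people-search-person-info">
  <div class="col person-name-position">
    <a href="https://www.foxrothschild.com/richard-s-caputo/">
      Richard S. Caputo
    </a>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The question's call: no <a> tag carries this class, so nothing matches.
wrong = soup.find_all("a", class_="col person-name-position")

# Select <a> tags that are direct children of the name/position div.
links = soup.select("div.person-name-position > a")

# Collect (name, profile URL) pairs from the matched links.
lawyers = [(a.get_text(strip=True), a.get("href")) for a in links]

print(wrong)
print(lawyers)
```

Matching on the single class `person-name-position` keeps the selector working even if the order of the classes in the attribute changes, whereas passing the full string "col person-name-position" to `class_` matches the attribute value exactly.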
