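To get the content of that table you need some kind of browser simulator, because that part of the response is generated dynamically. The data in the first table, by contrast, can be accessed easily without one. I used Selenium in this case: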
from bs4 import BeautifulSoup
from selenium import webdriver

# Load the page in a real browser so the dynamically generated table is present
driver = webdriver.Chrome()
page_url = "https://www.pro-football-reference.com/teams/mia/2000_roster.htm"
driver.get(page_url)
soup = BeautifulSoup(driver.page_source, "lxml")

# The second ".table_outer_container" holds the table we want
table = soup.select(".table_outer_container")[1]
for items in table.select("tr"):
    player = items.select("[data-stat='player']")[0].text
    gs = items.select("[data-stat='gs']")[0].text
    print(player, gs)

driver.quit()
Output (partial):
Player GS
Trace Armstrong* 0
John Bock 1
Tim Bowens 15
Lorenzo Bromell 0
Autry Denson 0
Mark Dixon 15
Kevin Donnalley 16
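If for some reason you run into an error with the code above (for example, an IndexError on a row that has no matching cell), here is an option that avoids it: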
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
page_url = "https://www.pro-football-reference.com/teams/mia/2000_roster.htm"
driver.get(page_url)
soup = BeautifulSoup(driver.page_source, "lxml")

table = soup.select(".table_outer_container")[1]
for items in table.select("tr"):
    # Fall back to an empty string when a row has no matching cell
    player = items.select("[data-stat='player']")[0].text if items.select("[data-stat='player']") else ""
    gs = items.select("[data-stat='gs']")[0].text if items.select("[data-stat='gs']") else ""
    print(player, gs)

driver.quit()
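
The guard used in the loop above can also be factored into a small helper so each selector is evaluated only once. The following is just a sketch of that idea, not part of the original answer: `cell_text`, `extract_rows` and `SAMPLE_HTML` are hypothetical names, the sample markup is made up for illustration (it is not copied from the real page), and it uses the built-in "html.parser" so it runs without a browser or lxml.

```python
from bs4 import BeautifulSoup

# Made-up sample markup for illustration only; the real page is more complex.
SAMPLE_HTML = """
<table>
  <tr><th data-stat="header_group">Games</th></tr>
  <tr><th data-stat="player">Player</th><th data-stat="gs">GS</th></tr>
  <tr><td data-stat="player">Tim Bowens</td><td data-stat="gs">15</td></tr>
</table>
"""


def cell_text(row, stat):
    """Return the text of the first cell in `row` with the given data-stat, or ""."""
    # select_one returns None instead of raising when nothing matches
    cell = row.select_one(f"[data-stat='{stat}']")
    return cell.get_text(strip=True) if cell else ""


def extract_rows(html):
    """Collect (player, gs) pairs from every table row in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("tr"):
        player = cell_text(tr, "player")
        gs = cell_text(tr, "gs")
        # Skip rows that carry neither cell (e.g. grouping header rows)
        if player or gs:
            rows.append((player, gs))
    return rows


if __name__ == "__main__":
    for player, gs in extract_rows(SAMPLE_HTML):
        print(player, gs)
```

With Selenium you would pass `driver.page_source` to `extract_rows` instead of the sample string. Unlike the loop above, this sketch drops rows that have neither cell rather than printing an empty line for them; remove the `if player or gs` check if you want to keep them.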