BeautifulSoup: only the first table is returned

  • chitown88  · 7 years ago

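    I've been working with BeautifulSoup recently. I'm trying to scrape data from https://www.pro-football-reference.com/teams/mia/2000_roster.htm . Specifically, I want each player's name and "gs" (games started).

    However, when I run my code, it only returns data from the first ("Starters") table. I'm actually not interested in that first table at all; I want the second table, titled "Roster".

    Here is the code I'm working with. As I said, I don't really want or need anything other than the player names and games started, but I'm using this to practice and learn BeautifulSoup.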

    import pandas as pd
    import requests
    import bs4
    
    alpha = requests.get('https://www.pro-football-reference.com/teams/mia/2000_roster.htm')
    
    beta = bs4.BeautifulSoup(alpha.text,'lxml')
    
    
    gama = beta.findAll('th',{'data-stat':'pos'})
    position = [th.text for th in gama]
    position = position[1:]
    position = list(filter(None, position))
    
    gama = beta.findAll('td',{'data-stat':'player'})
    player = [td.text for td in gama]
    player = player[1:]
    while 'Defensive Starters' in player: player.remove('Defensive Starters')
    while 'Special Teams Starters' in player: player.remove('Special Teams Starters')
    
    gama = beta.findAll('td',{'data-stat':'age'})
    age = [td.text for td in gama]
    age = list(filter(None, age))
    
    gama = beta.findAll('td',{'data-stat':'gs'})
    gs = [td.text for td in gama]
    gs = list(filter(None, gs))
    
    target = pd.DataFrame(
    
    {
    'player_name':player,
    'position':position,
    'gs':gs,
    'age':age
    })
    

    Does anyone know where I'm going wrong? Or is there another way to do this?

    1 answer

  • SIM  · 7 years ago

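    To get the content of that table you need a browser simulator, because that part of the page is generated dynamically and so is not present in the HTML that requests receives. The data in the first table, by contrast, can be accessed easily without any browser simulator. I used Selenium in this case: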

    from bs4 import BeautifulSoup
    from selenium import webdriver
    
    driver = webdriver.Chrome()
    page_url = "https://www.pro-football-reference.com/teams/mia/2000_roster.htm"
    driver.get(page_url)
    soup = BeautifulSoup(driver.page_source, "lxml")
    table = soup.select(".table_outer_container")[1]
    for items in table.select("tr"):
        player = items.select("[data-stat='player']")[0].text
        gs = items.select("[data-stat='gs']")[0].text
        print(player,gs)
    
    driver.quit()
    

    Player  GS
    Trace Armstrong* 0
    John Bock 1
    Tim Bowens 15
    Lorenzo Bromell 0
    Autry Denson 0
    Mark Dixon 15
    Kevin Donnalley 16
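
    A table can be visible in the browser yet missing from a static parse when its markup is delivered inside an HTML comment and only uncommented by a script. Whether this particular page does that is an assumption here, not something confirmed in this thread; the sample HTML, the table ids, and the helper name tables_including_commented below are all hypothetical. If the assumption holds, a sketch along these lines might reach the table without a browser:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in for a page whose second table sits inside an HTML comment.
SAMPLE_HTML = """
<div class="table_outer_container">
  <table id="starters"><tr><td data-stat="player">(starter row)</td></tr></table>
</div>
<!--
<div class="table_outer_container">
  <table id="roster">
    <tr><td data-stat="player">John Bock</td><td data-stat="gs">1</td></tr>
    <tr><td data-stat="player">Tim Bowens</td><td data-stat="gs">15</td></tr>
  </table>
</div>
-->
"""


def tables_including_commented(html):
    """Return regular tables followed by tables found inside HTML comments."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table")
    # Comment nodes are strings, so match them with a string filter.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        if "<table" in comment:
            # Re-parse the comment body as its own small document.
            inner = BeautifulSoup(str(comment), "html.parser")
            tables.extend(inner.find_all("table"))
    return tables


roster = tables_including_commented(SAMPLE_HTML)[1]
rows = [
    (tr.select_one("[data-stat='player']").text,
     tr.select_one("[data-stat='gs']").text)
    for tr in roster.select("tr")
]
print(rows)
```

    On a real page you would pass the text of the requests response instead of SAMPLE_HTML. If the comments contain no table, the assumption does not hold and the Selenium route above is the one to use.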
    

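    If for some reason the loop above raises an error on some row (for example, an index error because the row has no matching cell), try this version instead; it should not raise such an error: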

    from bs4 import BeautifulSoup
    from selenium import webdriver
    
    driver = webdriver.Chrome()
    page_url = "https://www.pro-football-reference.com/teams/mia/2000_roster.htm"
    driver.get(page_url)
    soup = BeautifulSoup(driver.page_source, "lxml")
    table = soup.select(".table_outer_container")[1]
    for items in table.select("tr"):
        player = items.select("[data-stat='player']")[0].text if items.select("[data-stat='player']") else ""
        gs = items.select("[data-stat='gs']")[0].text if items.select("[data-stat='gs']") else ""
        print(player,gs)
    
    driver.quit()
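
    The guard above runs each selector twice per row. A slightly tidier variant, offered only as a sketch, uses select_one, which returns None instead of an empty list when nothing matches. The helper name cell_text and the sample rows are hypothetical; the player values are taken from the output shown earlier.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a header row, two data rows, and one row with no data-stat cells.
SAMPLE_ROWS = """
<table>
  <tr><th data-stat="player">Player</th><th data-stat="gs">GS</th></tr>
  <tr><td data-stat="player">John Bock</td><td data-stat="gs">1</td></tr>
  <tr><td colspan="2">row without data-stat cells</td></tr>
  <tr><td data-stat="player">Tim Bowens</td><td data-stat="gs">15</td></tr>
</table>
"""


def cell_text(row, stat):
    """Return the text of the first cell with the given data-stat, or ""."""
    cell = row.select_one(f"[data-stat='{stat}']")
    return cell.get_text(strip=True) if cell is not None else ""


soup = BeautifulSoup(SAMPLE_ROWS, "html.parser")
pairs = [(cell_text(tr, "player"), cell_text(tr, "gs")) for tr in soup.select("tr")]
# Drop rows that had no player cell at all.
pairs = [pair for pair in pairs if pair[0]]
print(pairs)
```

    The same helper works unchanged on rows taken from driver.page_source, and the resulting list of pairs can be passed to pandas.DataFrame with columns such as player_name and gs.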