
Scraping some data from a football website using Selenium and Python

  •  0
  •  Joksova  ·  2 years ago

    I'm trying to write a Python program that extracts some data using Selenium. First I have to dismiss two alerts, then click the "Show all matches" button, and finally I need to click each "stats" button (there are several and they all share the same class name) and extract specific rows from the table that opens.

    the stats buttons

    I need to extract the 4 values highlighted in blue for each game.

    the table that I need to extract data from

    I have done the first two steps, but now I'm stuck on the last one: I have to click each "stats" button, extract the 4 values from each table, then close the window and move on to the next match.

    Here is my code:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    import time
    
    # point Selenium at the local chromedriver and open the matches page
    s = Service("C:/Users/dhias/OneDrive/Bureau/stgg/chromedriver.exe")
    driver = webdriver.Chrome(service=s)
    driver.get("https://www.soccerstats.com/matches.asp?matchday=1#")
    driver.maximize_window()
    time.sleep(1)
    # dismiss the two pop-ups mentioned above
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[mode='primary']"))).click()
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.ID, "steady-floating-button"))).click()
    # expand the full list of matches
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[text()='Show all matches']"))).click()
    
    

    I tried clicking each "stats" button (they all have the same class name), but it doesn't work:

    for element in driver.find_elements(By.XPATH,"//a[@class='myButton' and text()='stats']"):
        WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.XPATH,"//a[@class='myButton' and text()='stats']"))).click()
    

    Website link: soccerstats website

    2 Answers  |  2 years ago
        1
  •  1
  •   flaxon    2 years ago

    Save the links to an array first and then visit them one by one, because after you click one you are no longer on the page that contains the links.

    stat_links = []
    #get all urls
    for element in driver.find_elements(By.XPATH, "//a[@class='myButton' and text()='stats']"):
        stat_links.append(element.get_attribute('href'))
        
    for link in stat_links:
        driver.get(link)
        # do your stuff
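
    To flesh out the "# do your stuff" part, here is a minimal sketch of reading the rows of the "Goal statistics" table on each match page. It reuses the WebDriverWait / By / EC imports from the question's code, and the XPath and row handling are assumptions about the page layout, so adjust them to wherever the four highlighted values actually sit:

    # hedged sketch only: assumes the four values sit in the table that follows
    # the "Goal statistics" heading (see the next answer); adjust as needed
    for link in stat_links:
        driver.get(link)
        try:
            goals_table = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located(
                    (By.XPATH, "//h2[text()='Goal statistics']/following::table[2]")
                )
            )
            for row in goals_table.find_elements(By.TAG_NAME, "tr"):
                cells = [td.text for td in row.find_elements(By.TAG_NAME, "td")]
                print(cells)  # pick the four highlighted values out of these rows
        except Exception:
            print(f"no goal statistics found for {link}")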
    
        2
  •  0
  •   chitown88    2 years ago

    Are you sure you need to use Selenium? You can pull those tables quite easily with pandas and requests.

    import requests
    import pandas as pd
    from bs4 import BeautifulSoup
    
    url = 'https://www.soccerstats.com/matches.asp?matchday=1#'
    response = requests.get(url)
    
    soup = BeautifulSoup(response.text, 'html.parser')
    links = soup.find_all('a', text='stats')
    
    filtered_links = []
    for link in links:
        if 'pmatch' in link['href']:
            filtered_links.append(link['href'])
    
    tables = {}
    for count, link in enumerate(filtered_links, start=1):
        try:
            html = requests.get('https://www.soccerstats.com/' + link).text
            soup = BeautifulSoup(html, 'html.parser')
            
            goalsTable = soup.find('h2', text='Goal statistics')
            
            teams = goalsTable.find_next('table')
            teamsStr = teams.find_all('td')[0].text + ' ' + teams.find_all('td')[-1].text
            
            goalsTable = teams.find_next('table')
            df = pd.read_html(str(goalsTable))[0]
            
            print(f'{count} of {len(filtered_links)}: {teamsStr}')
            tables[teamsStr] = df
            
        except Exception:
            # teamsStr may not be set if the 'Goal statistics' heading is missing,
            # so report the link instead
            print(f'{count} of {len(filtered_links)}: {link} !! NO GOALS STATISTICS !!')
    

    Output:

    (screenshot of the output)
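
    If only the four highlighted values are needed, they can be picked out of each stored DataFrame afterwards. The .iloc positions below are placeholders, since the exact rows and columns depend on the real layout of the goal statistics table:

    # hypothetical post-processing: adjust the .iloc positions to the real table
    for teams, df in tables.items():
        values = df.iloc[:4, 0].tolist()  # e.g. first column of the first four rows
        print(teams, values)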