
Scraping some data from a football website using Selenium and Python

  •  0
  •  Joksova  ·  2 years ago

    I'm trying to write a Python program that extracts some data using Selenium. First I have to dismiss two alerts, then click the "Show all matches" button, and finally I need to click each "stats" button (there are several and they all share the same class name) and extract specific rows from the table that opens.

    the stats buttons

    I need to extract the 4 values highlighted in blue for each game.

    the table that I need to extract data from

    I have done the first two steps, but now I'm stuck on the last one: I have to click each "stats" button, extract the 4 values from each table, then close the window and move on to the next match.

    Here is my code:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    import time
    
    # point Selenium at the local chromedriver and open the matches page
    s = Service("C:/Users/dhias/OneDrive/Bureau/stgg/chromedriver.exe")
    driver = webdriver.Chrome(service=s)
    driver.get("https://www.soccerstats.com/matches.asp?matchday=1#")
    driver.maximize_window()
    time.sleep(1)
    # dismiss the two pop-ups mentioned above
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[mode='primary']"))).click()
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.ID, "steady-floating-button"))).click()
    # expand the full list of matches
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[text()='Show all matches']"))).click()
    
    

    I tried clicking each "stats" button (they all have the same class name), but it doesn't work:

    for element in driver.find_elements(By.XPATH,"//a[@class='myButton' and text()='stats']"):
        WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.XPATH,"//a[@class='myButton' and text()='stats']"))).click()
    

    Website link: soccerstats website

    2 Answers  |  2 years ago
        1
  •  1
  •   flaxon    2 years ago

    Save the links to an array first and then visit them one by one, because after you click one you are no longer on the page that contains the links.

    stat_links = []
    #get all urls
    for element in driver.find_elements(By.XPATH, "//a[@class='myButton' and text()='stats']"):
        stat_links.append(element.get_attribute('href'))
        
    for link in stat_links:
        driver.get(link)
        # do your stuff
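
    To flesh out the "# do your stuff" part, here is a minimal sketch of reading the rows of the "Goal statistics" table on each match page. It reuses the WebDriverWait / By / EC imports from the question's code, and the XPath and row handling are assumptions about the page layout, so adjust them to wherever the four highlighted values actually sit:

    # hedged sketch only: assumes the four values sit in the table that follows
    # the "Goal statistics" heading (see the next answer); adjust as needed
    for link in stat_links:
        driver.get(link)
        try:
            goals_table = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located(
                    (By.XPATH, "//h2[text()='Goal statistics']/following::table[2]")
                )
            )
            for row in goals_table.find_elements(By.TAG_NAME, "tr"):
                cells = [td.text for td in row.find_elements(By.TAG_NAME, "td")]
                print(cells)  # pick the four highlighted values out of these rows
        except Exception:
            print(f"no goal statistics found for {link}")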
    
        2
  •  0
  •   chitown88    2 years ago

    Are you sure you need to use Selenium? You can pull those tables quite easily with pandas and requests.

    import requests
    import pandas as pd
    from bs4 import BeautifulSoup
    
    url = 'https://www.soccerstats.com/matches.asp?matchday=1#'
    response = requests.get(url)
    
    soup = BeautifulSoup(response.text, 'html.parser')
    links = soup.find_all('a', text='stats')
    
    filtered_links = []
    for link in links:
        if 'pmatch' in link['href']:
            filtered_links.append(link['href'])
    
    tables = {}
    for count, link in enumerate(filtered_links, start=1):
        try:
            html = requests.get('https://www.soccerstats.com/' + link).text
            soup = BeautifulSoup(html, 'html.parser')
            
            goalsTable = soup.find('h2', text='Goal statistics')
            
            teams = goalsTable.find_next('table')
            teamsStr = teams.find_all('td')[0].text + ' ' + teams.find_all('td')[-1].text
            
            goalsTable = teams.find_next('table')
            df = pd.read_html(str(goalsTable))[0]
            
            print(f'{count} of {len(filtered_links)}: {teamsStr}')
            tables[teamsStr] = df
            
        except Exception:
            # teamsStr may not be set if the 'Goal statistics' heading is missing,
            # so report the link instead
            print(f'{count} of {len(filtered_links)}: {link} !! NO GOALS STATISTICS !!')
    

    Output:

    (screenshot of the output)
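
    If only the four highlighted values are needed, they can be picked out of each stored DataFrame afterwards. The .iloc positions below are placeholders, since the exact rows and columns depend on the real layout of the goal statistics table:

    # hypothetical post-processing: adjust the .iloc positions to the real table
    for teams, df in tables.items():
        values = df.iloc[:4, 0].tolist()  # e.g. first column of the first four rows
        print(teams, values)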