
Trouble retrieving artist names from the Billboard Top 100 website with Beautiful Soup

  • syntacticsugar247  ·  2 years ago

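    I'm trying to use the Python package BeautifulSoup to retrieve the most popular songs from this URL. When I grab the span containing the artist's name, I get the right span, but when I call ".text" on that span, it doesn't return the text between the span tags.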

    Here is my code:

    import requests
    from bs4 import BeautifulSoup
    
    r = requests.get('https://www.billboard.com/charts/hot-100/')
    soup = BeautifulSoup(r.content, 'html.parser')
    result = soup.find_all('div', class_='o-chart-results-list-row-container')
    for res in result:
        songName = res.find('h3').text.strip()
        artist = res.find('span',class_='c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 u-max-width-230@tablet-only').text
        print("song: "+songName)
        print("artist: "+ str(artist))
        print("___________________________________________________")
    

    Currently it prints the following for each song:

    song: Waiting On A Miracle
    artist: <span class="c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 u-max-width-230@tablet-only">
    
            Stephanie Beatriz
    </span>
    ___________________________________________________
    

    How can I extract just the artist's name?

    1 answer  |  last active 2 years ago
  •   chitown88    2 years ago

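    It won't match if the class string is off by even one character. I would keep it simple: once you have the song title, the artist follows in the next <span> tag. So get the <h3> tag as you did for the song, then use .find_next() to get the artist: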

    import requests
    from bs4 import BeautifulSoup
    
    r = requests.get('https://www.billboard.com/charts/hot-100/')
    soup = BeautifulSoup(r.content, 'html.parser')
    result = soup.find_all('div', class_='o-chart-results-list-row-container')
    for res in result:
        songName = res.find('h3').text.strip()
        artist = res.find('h3').find_next('span').text.strip()
        print("song: "+songName)
        print("artist: "+ str(artist))
        print("___________________________________________________")
    

    Output:

    song: Heat Waves
    artist: Glass Animals
    ___________________________________________________
    song: Stay
    artist: The Kid LAROI & Justin Bieber
    ___________________________________________________
    song: Super Gremlin
    artist: Kodak Black
    ___________________________________________________
    song: abcdefu
    artist: GAYLE
    ___________________________________________________
    song: Ghost
    artist: Justin Bieber
    ___________________________________________________
    song: We Don't Talk About Bruno
    artist: Carolina Gaitan, Mauro Castillo, Adassa, Rhenzy Feliz, Diane Guerrero, Stephanie Beatriz & Encanto Cast
    ___________________________________________________
    song: Enemy
    artist: Imagine Dragons X JID
    ___________________________________________________
    
    ....
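
    The class-matching behaviour described above can be checked offline, without hitting the site. The following is a minimal sketch that assumes a simplified, hypothetical version of one chart row (the HTML string and the shortened class list are made up for illustration and are not Billboard's actual markup). It shows that a multi-class string passed to class_ has to match the whole class attribute, that a single class name is enough to match, and that find_next() reaches the same span by position:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for one chart row.
# The real page carries many more classes on each tag.
HTML = """
<div class="o-chart-results-list-row-container">
  <h3 class="c-title">
    Heat Waves
  </h3>
  <span class="c-label a-no-trucate a-font-primary-s">
    Glass Animals
  </span>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
row = soup.find("div", class_="o-chart-results-list-row-container")

# A string with several classes is compared against the whole class
# attribute, so leaving out even one class gives no match (None).
partial = row.find("span", class_="c-label a-no-trucate")

# A single class name matches any tag that carries that class.
by_single_class = row.find("span", class_="c-label")

# find_next() returns the first matching tag after the <h3> in document order.
title_tag = row.find("h3")
by_position = title_tag.find_next("span")

# get_text(strip=True) drops the surrounding newlines and indentation.
song = title_tag.get_text(strip=True)
artist = by_position.get_text(strip=True)

print(f"song: {song}")
print(f"artist: {artist}")
```

    Matching on position or on one stable class name is less brittle than copying the full class string, since any change to the utility classes on the page would otherwise break the lookup.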