代码之家  ›  专栏  ›  技术社区  ›  Filipe Ferminiano

用靓汤获取数据产品元素

  •  -1
  • Filipe Ferminiano  · 技术社区  · 6 年前

    我想从一个有漂亮汤的网站上获取数据。我有这部分代码,我想在数据产品元素中获得JSON部分。 我该怎么做?

    此代码:

    soup_catalog.find('a',class_="product-li")
    

    返回以下内容:

    <a class="product-li" data-product='{"product":"0431772", "basketId":"043177500", "type":"product", "category":"ga", "subCategory":"gpes", "webVideoUrl": "None", "brand":"konami", "title_url": "pes-2018-para-ps3-konami", "title": "PES 2018 para PS3", "reference": "Konami", "stockTypes": {"043177500": "F"}, "price": "89.9"}' href="https://www.magazineluiza.com.br/pes-2018-para-ps3-konami/p/0431772/ga/gpes/" itemprop="url" title="PES 2018 para PS3">\n<span class="js-wishlist-action wishlist__simple-text">\n<i class="wishlist__favorite-icon js-add-wishlist"></i>\n</span>\n<div class="alignment-image">\n<img alt="PES 2018 para PS3 - Konami" class="product-image" data-original="https://c.mlcdn.com.br//pes-2018-para-ps3-konami/v/210x210/043177500.jpg" height="210" src="https://d25zlb44gqlazw.cloudfront.net/static/img/default/white1x1-e0a7e4ed.gif" title="PES 2018 para PS3 - Konami" width="210"/>\n</div>\n<noscript>\n<img alt="PES 2018 para PS3 - Konami" height="210" itemprop="image" src="https://c.mlcdn.com.br//pes-2018-para-ps3-konami/v/210x210/043177500.jpg" title="PES 2018 para PS3 - Konami" width="210"/>\n</noscript>\n<span class="product-content-other-informations">\n<span class="rating-container">\n<span class="rateing sprite-stars star-medium" itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating">\n<em class="sprite-stars" style="width:90.0%"></em>\n<meta content="4.5" itemprop="ratingValue">\n<meta content="78" itemprop="reviewCount">\n</meta></meta></span>\n</span>\n</span>\n<h3 class="productTitle" itemprop="name">PES 2018 para PS3 - Konami</h3>\n<meta content="0431772" itemprop="productID">\n<meta content="None" itemprop="description">\n<p itemscope="" itemtype="http://schema.org/Brand"><meta content="konami" itemprop="name"/></p>\n<span class="productPrice" itemprop="offers" itemscope="" itemtype="http://schema.org/Offer">\n<span class="priceContent color-green none-product-showcase">desconto de R$ 79,10</span>\n<meta content="BRL" itemprop="priceCurrency">\n<meta content="89,90" itemprop="price">\n<span class="originalPrice">de R$ 169,00</span>\n<span class="price">\n                        por R$ 89,90\n                    </span>\n<meta content="InStock" itemprop="availability"/>\n</meta></meta></span>\n</meta></meta></a>
    

    然后我试着:

    soup_catalog.find('a',class_="product-li").find('data-product')
    

    但数据产品未被退回。 我该怎么做?

    2 回复  |  直到 6 年前
        1
  •  1
  •   Rakesh    6 年前

    这应该会有帮助

    from bs4 import BeautifulSoup
    
    s = """<a class="product-li" data-product='{"product":"0431772", "basketId":"043177500", "type":"product", "category":"ga", "subCategory":"gpes", "webVideoUrl": "None", "brand":"konami", "title_url": "pes-2018-para-ps3-konami", "title": "PES 2018 para PS3", "reference": "Konami", "stockTypes": {"043177500": "F"}, "price": "89.9"}' href="https://www.magazineluiza.com.br/pes-2018-para-ps3-konami/p/0431772/ga/gpes/" itemprop="url" title="PES 2018 para PS3">\n<span class="js-wishlist-action wishlist__simple-text">\n<i class="wishlist__favorite-icon js-add-wishlist"></i>\n</span>\n<div class="alignment-image">\n<img alt="PES 2018 para PS3 - Konami" class="product-image" data-original="https://c.mlcdn.com.br//pes-2018-para-ps3-konami/v/210x210/043177500.jpg" height="210" src="https://d25zlb44gqlazw.cloudfront.net/static/img/default/white1x1-e0a7e4ed.gif" title="PES 2018 para PS3 - Konami" width="210"/>\n</div>\n<noscript>\n<img alt="PES 2018 para PS3 - Konami" height="210" itemprop="image" src="https://c.mlcdn.com.br//pes-2018-para-ps3-konami/v/210x210/043177500.jpg" title="PES 2018 para PS3 - Konami" width="210"/>\n</noscript>\n<span class="product-content-other-informations">\n<span class="rating-container">\n<span class="rateing sprite-stars star-medium" itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating">\n<em class="sprite-stars" style="width:90.0%"></em>\n<meta content="4.5" itemprop="ratingValue">\n<meta content="78" itemprop="reviewCount">\n</meta></meta></span>\n</span>\n</span>\n<h3 class="productTitle" itemprop="name">PES 2018 para PS3 - Konami</h3>\n<meta content="0431772" itemprop="productID">\n<meta content="None" itemprop="description">\n<p itemscope="" itemtype="http://schema.org/Brand"><meta content="konami" itemprop="name"/></p>\n<span class="productPrice" itemprop="offers" itemscope="" itemtype="http://schema.org/Offer">\n<span class="priceContent color-green none-product-showcase">desconto de R$ 79,10</span>\n<meta content="BRL" itemprop="priceCurrency">\n<meta content="89,90" itemprop="price">\n<span class="originalPrice">de R$ 169,00</span>\n<span class="price">\n                        por R$ 89,90\n                    </span>\n<meta content="InStock" itemprop="availability"/>\n</meta></meta></span>\n</meta></meta></a>"""
    soup = BeautifulSoup(s, "html.parser")
    i = soup.find("a",class_="product-li")
    print(i["data-product"])
    

    输出:

    {"product":"0431772", "basketId":"043177500", "type":"product", "category":"ga", "subCategory":"gpes", "webVideoUrl": "None", "brand":"konami", "title_url": "pes-2018-para-ps3-konami", "title": "PES 2018 para PS3", "reference": "Konami", "stockTypes": {"043177500": "F"}, "price": "89.9"}
    
        2
  •  1
  •   Igor S    6 年前

    您可以从标记属性中获取数据,如下所示:

    soup_catalog.find('a',class_='product-li').get('data-provider')