
How to create a list of the items under Select Category (US), which are rendered differently through HtmlUnitDriver and the HtmlUnit headless browser?

  •  0
  • undetected Selenium  · Tech community  · 6 years ago

    How can I create a list of the items under "Select Category (US)" from amzscout, which are rendered differently through HtmlUnitDriver and the HtmlUnit headless browser?

    Using the Selenium / Firefox / GeckoDriver combination, I can create the list and print it as follows:

    • Code trial:

      System.setProperty("webdriver.gecko.driver", "C:/Utility/BrowserDrivers/geckodriver.exe");
      WebDriver driver = new FirefoxDriver();
      driver.get("https://amzscout.net/sales-estimator");
      List<WebElement> elements = new WebDriverWait(driver, 10).until(ExpectedConditions.visibilityOfAllElementsLocatedBy(By.cssSelector("span.cat-pick_name-in")));
      for (WebElement ele:elements)
          System.out.println(ele.getAttribute("innerHTML"));
      driver.quit();
      
    • Console output:

      Appliances
      Arts, Crafts &amp; Sewing
      Automotive
      .
      .
      .
      

    However, using HtmlUnitDriver, the HTML appears to be rendered differently, as follows:

    The complete HTML is on pastebin.

    The relevant part of the HTML is:

    <script type="application/ld+json">
      //<![CDATA[
    
      {
        "@context": "http://schema.org/",
        "@type": "Product",
        "name": "AMZScout Sales Estimator",
        "image": "",
        "brand": "AMZScout",
        "aggregateRating": {
          "@type": "AggregateRating",
          "ratingValue": "4.7",
          "bestRating": "5",
          "worstRating": "1",
          "ratingCount": "231"
        }
      }
    
      //]]>
    </script>
    <script type="text/javascript" src="/js/common.js">
    </script>
    <script type="text/javascript">
      //<![CDATA[
    
      const DATA = {
          COM: [
            ["Appliances", "s-cat-icon-appliances"],
            ["Arts, Crafts & Sewing", "s-cat-icon-craft"],
            ["Automotive", "s-cat-icon-automotive"],
            ["Baby", "s-cat-icon-baby"],
            ["Beauty & Personal Care", "s-cat-icon-beauty"],
            ["Books", "s-cat-icon-books"],
            ["Camera & Photo", "s-cat-icon-camera"],
            ["Cell Phones & Accessories", "s-cat-icon-phone"],
            ["Clothing, Shoes & Jewelry", "s-cat-icon-clothing"],
            ["Computers & Accessories", "s-cat-icon-computers"],
            ["Electronics", "s-cat-icon-electronics"],
            ["Grocery & Gourmet Food", "s-cat-icon-food"],
            ["Health & Household", "s-cat-icon-health"],
            ["Home and Garden", "s-cat-icon-home"],
            ["Home & Kitchen", "s-cat-icon-kitchen"],
            ["Industrial & Scientific", "s-cat-icon-gear"],
            ["Jewelry", "s-cat-icon-jewelry"],
            ["Kindle Store", "s-cat-icon-kindle"],
            ["Kitchen & Dining", "s-cat-icon-dining"],
            ["Musical Instruments", "s-cat-icon-musical-instruments"],
            ["Office Products", "s-cat-icon-office"],
            ["Patio, Lawn & Garden", "s-cat-icon-lawn"],
            ["Pet Supplies", "s-cat-icon-pet-food"],
            ["Shoes", "s-cat-icon-shoes"],
            ["Software", "s-cat-icon-software"],
            ["Sports & Outdoors", "s-cat-icon-sports"],
            ["Tools & Home Improvement", "s-cat-icon-repairs"],
            ["Toys & Games", "s-cat-icon-toys"],
            ["Watches", "s-cat-icon-watches"],
            ["Video Games", "s-cat-icon-joystick"]
          ],
          CO_UK: [
              ["Baby", "s-cat-icon-baby"],

    This data is referenced in:

    $(function () { var rankInput = $('.cat-rank_input'); function toggleRank(e) { var cats = $('.cat-pick'); var rank = $('.cat-rank'); var list = rank.find('.cat-pick_list'); var $el = $(e.currentTarget).clone(); $el.on('click', toggleRank).css('cursor',
    'pointer'); list.empty(); list.append($el); category = $el.find('.cat-pick_name-in').text(); rankInput.val(''); cats.toggle(); rank.toggle(); if ($(window).width() >= 768) { var catsHeight = cats.height(); rank.height(catsHeight); } if (rank.is(':visible'))
    { val.text('?'); setTimeout(function () {rankInput.focus()}, 0); } } function selectDomain(d) { const data = DATA[d]; const list = $('.cat-pick .cat-pick_list'); list.empty(); data.filter(function (d) {return d[1] != ''}).forEach(function (d) { var el
    = $('
    <div class="cat-pick_i"><span class="cat-pick_link"><span class="cat-pick_ico"><span></span></span><span class="cat-pick_name"><span class="cat-pick_name-in"></span></span>
      </span>
    </div>'); el.find('.cat-pick_ico span').addClass(d[1]); el.find('.cat-pick_name-in').text(d[0]); el.on('click', toggleRank); list.append(el); }); domain = d; } rankInput.on('change', function () {rank = rankInput.val()}); rankInput.on('keyup', function(e) {e.keyCode
    == 13 && (rank = rankInput.val()) && getEstSales()}); $('.cat-rank_another-link').on('click', toggleRank); $('#domain').on('change', function (e) {selectDomain(e.target.value);}); selectDomain(domain); });

    Can anyone help me out?

    1 answer  |  as of 6 years ago
        1
  •  2
  •   RBRi    6 years ago

    As you have already found out, the items you are looking for are created by JavaScript. This means you have to enable JavaScript support in HtmlUnit.

    The second point is to wait, in some way, for the JavaScript to finish. You are using "visibilityOfAllElementsLocatedBy", and its documentation states:

    An expectation for checking that all elements present on the web page that match the locator are visible.

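    This is true even if no elements match your selector, or if only some of them do because the JavaScript is still creating new ones. Because of this, I changed the wait condition a bit so that it really waits until the elements have been created.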
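
    To illustrate the difference between the two kinds of wait condition, here is a minimal simulation in plain Java. It does not use Selenium: FakePage, waitUntil, anyPresent and moreThan are hypothetical names made up for this sketch, and the "page" simply gains one element per poll to mimic a script that is still appending items.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class WaitConditionSketch {

    /** Hypothetical stand-in for a page whose script adds one element per poll. */
    static class FakePage {
        private final List<String> elements = new ArrayList<>();
        private final int total;

        FakePage(int total) {
            this.total = total;
        }

        /** Returns a snapshot of the elements created so far. */
        List<String> findElements() {
            if (elements.size() < total) {
                elements.add("item " + (elements.size() + 1));
            }
            return new ArrayList<>(elements);
        }
    }

    /** Polls until the condition yields a non-null value (a simplified explicit wait). */
    static <T> T waitUntil(FakePage page, Function<List<String>, T> condition, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            T result = condition.apply(page.findElements());
            if (result != null) {
                return result;
            }
        }
        throw new IllegalStateException("condition not met after " + maxPolls + " polls");
    }

    /** Satisfied as soon as any element exists, even if more are still coming. */
    static List<String> anyPresent(List<String> found) {
        return found.isEmpty() ? null : found;
    }

    /** Satisfied only once more than the given number of elements exist. */
    static Function<List<String>, List<String>> moreThan(int threshold) {
        return found -> found.size() > threshold ? found : null;
    }

    public static void main(String[] args) {
        List<String> early = waitUntil(new FakePage(30), WaitConditionSketch::anyPresent, 100);
        List<String> complete = waitUntil(new FakePage(30), moreThan(29), 100);
        System.out.println("any-present condition returned " + early.size() + " element(s)");
        System.out.println("count-based condition returned " + complete.size() + " element(s)");
    }
}
```

    In this simulation the first condition returns after a single poll with an incomplete list, while the count-based one keeps polling until all 30 items exist. A real browser creates elements on its own schedule, so the sketch only shows the idea.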

    My final source code looks like this, and it creates the list you expect:

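    import java.util.List;

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.htmlunit.HtmlUnitDriver;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;

    // The class name is chosen for illustration only.
    public class CategoryList {
        public static void main(String[] args) {
            String url = "https://amzscout.net/sales-estimator";

            // true enables JavaScript support in HtmlUnit
            WebDriver driver = new HtmlUnitDriver(true);
            try {
                driver.get(url);

                // Wait until the script has created the elements. The US list
                // (DATA.COM) quoted in the question has 30 entries, hence "more than 29".
                // Selenium 3 style timeout in seconds; Selenium 4 takes a Duration instead.
                List<WebElement> elements =
                        new WebDriverWait(driver, 10)
                            .until(ExpectedConditions
                                .numberOfElementsToBeMoreThan(
                                        By.cssSelector("span.cat-pick_name-in"), 29));

                System.out.println();
                for (WebElement ele : elements) {
                    System.out.println(ele.getAttribute("innerHTML"));
                }
            } finally {
                // Always close the driver, even if the wait times out
                driver.quit();
            }
        }
    }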
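
    The threshold of 29 in the code above lines up with the 30 entries of the US list (DATA.COM) quoted in the question: the page's selectDomain function creates one span.cat-pick_name-in for every entry whose icon class is not empty. The following sketch mirrors that filter in plain Java. The class and method names are hypothetical, and the last sample entry is invented purely to show the filter dropping something.

```java
import java.util.ArrayList;
import java.util.List;

public class CategoryCountSketch {

    /** Mirrors the filter in selectDomain: keep entries whose icon class is not empty. */
    static List<String> visibleNames(String[][] entries) {
        List<String> names = new ArrayList<>();
        for (String[] entry : entries) {
            if (!entry[1].isEmpty()) {
                names.add(entry[0]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // The first three US entries quoted in the question, plus one
        // hypothetical entry with an empty icon class.
        String[][] sample = {
            {"Appliances", "s-cat-icon-appliances"},
            {"Arts, Crafts & Sewing", "s-cat-icon-craft"},
            {"Automotive", "s-cat-icon-automotive"},
            {"Hypothetical hidden category", ""}
        };
        List<String> names = visibleNames(sample);
        System.out.println(names.size() + " names: " + names);
    }
}
```

    If the site changes the number of categories, a fixed threshold like 29 would need adjusting, so treat it as tied to the page as quoted here.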
    

    Hope that helps.