
Python XPath: try an XPath, but fill in a given value when it fails

  •  2
  • Lisadk  · Tech Community  · 7 years ago

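    I am scraping reviews from a website. In the end I need several lists (for example usernames and dates), and the values for each review go into a dict, so that the result looks like this: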

    reviews:[{'username':'Harry','date':'april'},
             {'username':'Rob','date':'may'}]
    

    reviews = []

    for i in range(len(username)):
        reviews.append({'username': username[i].strip(),
                        'date': date[i].strip()})
    

    try:
        names = tree.xpath..
    except:
        "no name"
    

    Edit: example HTML for the two review types (mobile vs. non-mobile). Mobile review:

    <div class="rating reviewItemInline">
      <span class="ui_bubble_rating bubble_50"></span>
      <span class="ratingDate relativeDate">Reviewed 6 days ago</span>
      <a class="viaMobile">via mobile</a>
    </div>
    

    Non-mobile review:

    <div class="rating reviewItemInline">
      <span class="ui_bubble_rating bubble_50"></span>
      <span class="ratingDate relativeDate">Reviewed 6 days ago</span>
    </div>
    
    2 Answers  |  as of 7 years ago
        1
  •  1
  •   Andersson    7 years ago

    Instead of try / except, simply build two lists that contain all the required elements, like this:

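    import lxml.html
    # "source code here" is a placeholder for the real page source
    html = lxml.html.fromstring("source code here")
    # One element per review
    reviews = html.xpath('//div[@class="rating reviewItemInline"]')
    dates = [i.xpath('./span[@class="ratingDate relativeDate"]')[0].text for i in reviews]
    # xpath() returns an empty list when nothing matches, so fall back to "no"
    mobile = [i.xpath('./a')[0].text if i.xpath('./a') else "no" for i in reviews]
    output = [{'date': i, 'via mobile': j} for i, j in zip(dates, mobile)]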
    

    output should look like this:

    [{'date': 'Reviewed 6 days ago', 'via mobile': 'via mobile'}, {'date': 'Reviewed 6 days ago', 'via mobile': 'no'}]
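
    For anyone who wants to run this end to end, here is a minimal sketch under a few assumptions. It swaps lxml for the standard library's xml.etree.ElementTree so that it runs without extra installs, and it wraps the two sample snippets from the question in a hypothetical <root> element so they parse as one document. The function name parse_reviews and the 'no date' default are illustrative only. It uses findtext() with a default instead of the conditional expression; lxml elements offer findall() and findtext() as well, so the same idea should carry over.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <root> element
# so that both reviews parse as a single well-formed document.
SAMPLE = """
<root>
  <div class="rating reviewItemInline">
    <span class="ui_bubble_rating bubble_50"></span>
    <span class="ratingDate relativeDate">Reviewed 6 days ago</span>
    <a class="viaMobile">via mobile</a>
  </div>
  <div class="rating reviewItemInline">
    <span class="ui_bubble_rating bubble_50"></span>
    <span class="ratingDate relativeDate">Reviewed 6 days ago</span>
  </div>
</root>
"""


def parse_reviews(markup):
    """Build one dict per review, using a default when an element is missing."""
    root = ET.fromstring(markup)
    reviews = []
    for div in root.findall('.//div[@class="rating reviewItemInline"]'):
        reviews.append({
            # findtext() returns the default when no element matches the path
            'date': div.findtext('./span[@class="ratingDate relativeDate"]',
                                 default='no date'),
            'via mobile': div.findtext('./a[@class="viaMobile"]', default='no'),
        })
    return reviews


output = parse_reviews(SAMPLE)
print(output)
```

    Building each dict inside a single loop keeps the fields of one review together, so a missing element in one review cannot shift the values of the reviews that follow it.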
    
        2
  •  0
  •   eLRuLL    7 years ago

    review_elems = tree_html.xpath('//div[@class="rating reviewItemInline"]')
    
    reviews = []   
    
    for review_elem in review_elems:
        review = {}
        username = review_elem.xpath('.//a[@class="viaMobile"]')
        if username:
            review['username'] = username[0].text
        else:
            review['username'] = 'no name'
    
        # keep filling review with more fields
        reviews.append(review)
    
    print(reviews)
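
    When many fields need the same "first match or default" treatment, the if/else can be folded into a small helper. The sketch below is only an illustration: first_text is a hypothetical name, the 'no date' default is made up, and it uses the standard library's xml.etree.ElementTree with findall() so that it runs as written. With lxml you would most likely call review_elem.xpath(path) in the same place. The point it demonstrates is that a query with no match yields an empty list rather than raising, which is why the try / except in the question never reaches its except branch.

```python
import xml.etree.ElementTree as ET


def first_text(elem, path, default):
    """Return the stripped text of the first node matching path, else default."""
    matches = elem.findall(path)  # empty list (no exception) when nothing matches
    if matches and matches[0].text:
        return matches[0].text.strip()
    return default


# A single non-mobile review, adapted from the question's sample HTML
review_elem = ET.fromstring(
    '<div class="rating reviewItemInline">'
    '<span class="ratingDate relativeDate"> Reviewed 6 days ago </span>'
    '</div>'
)

review = {
    'date': first_text(review_elem, './span[@class="ratingDate relativeDate"]', 'no date'),
    # Same selector as in the answer above; there is no such <a> here
    'username': first_text(review_elem, './/a[@class="viaMobile"]', 'no name'),
}
print(review)
```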