
Python Scrapy: extracting the value of an aria-label attribute

scrapy python

Ken J · 6 years ago

I'm new to Scrapy, and I'm trying to scrape a page where the value I need is in an aria-label attribute on an element with a class:

<body>
  <div class="item-price" aria-label="$1.99">
    .....
  </div>
</body>

I'm trying to extract the label with the following parse method in my spider:

def parse(self, response):
   price = circular_item.css("div.item-price > aria-label::text").extract()
   yield price

Running the spider produces this error:

2018-09-02 18:34:03 [scrapy.core.scraper] ERROR: Spider must return Request, BaseItem, dict or None, got 'list' in <GET https://example.com/test.html>

2 answers

gangabass · 6 years ago

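There are a few errors in your code: aria-label is an attribute of the div, not a child element, so it has to be selected as an attribute; circular_item is never defined inside parse; and the callback has to yield a dict (or an item or a request), not the list returned by extract(). Try this: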

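def parse(self, response):
    item = {}
    # @aria-label selects the attribute on the div; extract_first() returns a single string (or None)
    item["price"] = response.xpath('//div[@class="item-price"]/@aria-label').extract_first()
    # yield a dict, which Scrapy accepts as an item
    yield item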

Thomas Strub · 6 years ago

If you want to use a CSS selector instead of XPath:

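def parse(self, response):
    # ::attr(aria-label) selects the attribute value; the item must be a dict, so give it a key
    item = {"price": response.css('div.item-price::attr(aria-label)').extract_first()}
    yield item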
