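Both requests your spider makes receive a 404 Not Found response. By default, Scrapy ignores responses with such a status (anything outside the 2xx range) and does not call your callback for them. To have your self.parse callback called for such responses, you have to add 404 to the list of handled status codes using the handle_httpstatus_list meta key (see the Scrapy documentation on HttpErrorMiddleware).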
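To make the rule above concrete, here is a rough model of the decision in plain Python. It is a simplified sketch, not Scrapy's actual source: the function name callback_is_called and its parameters are made up for illustration, and it only covers the two places a status list can come from (the request's meta and a spider-level attribute).

```python
def callback_is_called(status, request_meta=None, spider_statuses=()):
    """Simplified model: is a response with this status passed to the callback?

    Hypothetical helper for illustration only; not part of Scrapy.
    """
    request_meta = request_meta or {}
    # Successful (2xx) responses always reach the callback.
    if 200 <= status < 300:
        return True
    # A per-request list, when present, is used instead of the spider-level one.
    if "handle_httpstatus_list" in request_meta:
        allowed = request_meta["handle_httpstatus_list"]
    else:
        allowed = spider_statuses
    return status in allowed


print(callback_is_called(404))
print(callback_is_called(404, {"handle_httpstatus_list": [404]}))
```

Under this model the first call reports that the 404 response is dropped, and the second reports that it is delivered to the callback.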
You could change your start_requests method so that each request tells Scrapy to also handle 404 responses:
import sys

import scrapy


class TutorialSpider(scrapy.Spider):
    name = "tutorial"

    def start_requests(self):
        urls = [
            'https://example.com/page/1',
            'https://example.com/page/2',
        ]
        for url in urls:
            print(f'{self.name} spider')
            print(f'url is {url}')
            yield scrapy.Request(
                url=url,
                callback=self.parse,
                # Pass 404 responses to the callback instead of dropping them.
                meta={'handle_httpstatus_list': [404]},
            )

    def parse(self, response):
        print(response.url)
        self.log(response.url)
        # sys must be imported at the top of the file for this call to work.
        sys.stdout.write('hello')