
How can I mark a scrape that failed with a 503 as a scraping error?

  • Aminah Nuraini  · 6 years ago

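    While crawling, I got a 503 status. The request is retried, but after the retries run out it is simply ignored. I want it to be marked as an error rather than ignored. How can I do that?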

    I would prefer to put this in settings.py so that it applies to all of my spiders. handle_httpstatus_list seems to affect only a single spider.

    2 replies
  • Aminah Nuraini  · 6 years ago

    settings.py

    import logging
    
    from scrapy.downloadermiddlewares.retry import RetryMiddleware
    
    logger = logging.getLogger(__name__)
    
    class Retry500Middleware(RetryMiddleware):
        """RetryMiddleware variant that logs exhausted retries at ERROR level."""
    
        def _retry(self, request, reason, spider):
            retries = request.meta.get('retry_times', 0) + 1
    
            if retries <= self.max_retry_times:
                logger.debug("Retrying %(request)s (failed %(retries)d times): %(reason)s",
                             {'request': request, 'retries': retries, 'reason': reason},
                             extra={'spider': spider})
                retryreq = request.copy()
                retryreq.meta['retry_times'] = retries
                retryreq.dont_filter = True
                retryreq.priority = request.priority + self.priority_adjust
                return retryreq
            else:
                # This is the change: the give-up message used to be logged with
                # `logger.debug`; it is now logged with `logger.error`.
                logger.error("Gave up retrying %(request)s (failed %(retries)d times): %(reason)s",
                             {'request': request, 'retries': retries, 'reason': reason},
                             extra={'spider': spider})
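
For the subclass to apply to every spider, it presumably has to replace the built-in retry middleware in the project settings. The following is a minimal sketch of that wiring, not part of the original answer. It assumes the class is importable from a hypothetical module path `myproject.middlewares`; 550 is the slot the stock `RetryMiddleware` occupies in Scrapy's `DOWNLOADER_MIDDLEWARES_BASE`.

```python
# settings.py (sketch; "myproject.middlewares" is a hypothetical module path,
# replace it with wherever Retry500Middleware actually lives)
DOWNLOADER_MIDDLEWARES = {
    # Disable the stock retry middleware so requests are not retried twice.
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": None,
    # Enable the subclass in the slot the stock middleware normally uses.
    "myproject.middlewares.Retry500Middleware": 550,
}
```

Because this lives in settings.py rather than on a spider class, it is picked up project-wide, which is what the question asks for.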