HTTP responses such as 404, 410, and 500 are normal crawl results when pages disappear or an upstream application fails. This guide shows how to let a Scrapy spider receive a specific non-2xx response and branch on response.status inside the callback, instead of having the response filtered out before parsing ever starts.
Current Scrapy releases filter non-2xx responses through HttpErrorMiddleware before the spider callback runs. To handle one of those responses in normal spider code, allow only the status codes you need via the spider attribute handle_httpstatus_list, the per-request meta key of the same name, or the project-wide setting HTTPERROR_ALLOWED_CODES.
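For the project-wide route, a minimal settings.py sketch might look like this; the allowed codes here are only an example:

# settings.py
# Allow 404 responses through to every spider in the project.
HTTPERROR_ALLOWED_CODES = [404]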
Keep the allowlist narrow and check the status code immediately so error pages do not flow into normal item parsing by accident. Reserve the handle_httpstatus_all meta key and the HTTPERROR_ALLOW_ALL setting for short diagnostics or dedicated error-capture spiders, and use an errback for connection failures or timeouts, because those failures never produce an HTTP response object.
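An errback receives a twisted Failure rather than a Response. Here is a minimal sketch, assuming a placeholder URL; the failure types checked are common examples, not an exhaustive list:

import scrapy
from twisted.internet.error import DNSLookupError, TCPTimedOutError, TimeoutError

class ErrbackDemoSpider(scrapy.Spider):
    name = "errback_demo"

    def start_requests(self):
        # Connection failures never build a Response object, so they
        # bypass parse() and arrive at the errback as a twisted Failure.
        yield scrapy.Request(
            "http://127.0.0.1:18090/",
            callback=self.parse,
            errback=self.on_error,
        )

    def parse(self, response):
        self.logger.info("Got HTTP %s for %s", response.status, response.url)

    def on_error(self, failure):
        if failure.check(DNSLookupError, TimeoutError, TCPTimedOutError):
            self.logger.warning("No HTTP response for %s: %s",
                                failure.request.url, failure.value)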
Related: How to set request headers in Scrapy
Related: How to configure retries in Scrapy
$ vi http_errors_demo.py
Use scrapy runspider with a single file during quick tests, or place the same spider code inside a full project and run it with scrapy crawl <spider_name>.
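Assuming the file has been moved into a project's spiders/ directory, the project form of the command would be:

$ scrapy crawl http_errors_demo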
import scrapy


class HttpErrorsDemoSpider(scrapy.Spider):
    name = "http_errors_demo"
    start_urls = [
        "http://127.0.0.1:18090/",
        "http://127.0.0.1:18090/missing.html",
    ]
    # Let 404 responses reach parse() instead of being dropped by HttpErrorMiddleware.
    handle_httpstatus_list = [404]

    def parse(self, response):
        if response.status == 404:
            self.logger.info("Handled HTTP %s for %s", response.status, response.url)
            return
        yield {
            "status": response.status,
            "title": response.css("h1::text").get(),
            "url": response.url,
        }
Setting handle_httpstatus_list on the spider passes only the listed status codes into the callback; all other non-2xx responses still follow the normal HttpErrorMiddleware path. To scope the allowlist to a single request instead, set the meta key of the same name when building the Request:
yield scrapy.Request(
    "http://127.0.0.1:18090/missing.html",
    callback=self.parse_missing,
    meta={"handle_httpstatus_list": [404]},
)
Only this request lets a 404 through; every other request in the spider keeps the default filtering behavior.
Related: How to use request meta in Scrapy
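The parse_missing callback referenced by that request is not defined above; a minimal sketch, assuming it only records the miss as an item, could be:

def parse_missing(self, response):
    # Runs only because this request allowed 404 through its meta key;
    # any other error status on this request is still filtered as usual.
    self.logger.info("Server reported a missing page: %s", response.url)
    yield {"status": response.status, "url": response.url}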
$ scrapy runspider http_errors_demo.py -s LOG_LEVEL=INFO -s HTTPCACHE_ENABLED=False
2026-04-22 10:28:30 [scrapy.core.engine] INFO: Spider opened
2026-04-22 10:28:31 [http_errors_demo] INFO: Handled HTTP 404 for http://127.0.0.1:18090/missing.html
##### snipped #####
2026-04-22 10:28:31 [scrapy.core.engine] INFO: Spider closed (finished)
Disable or clear the HTTP cache (HTTPCACHE_ENABLED) while testing status handling so cached responses do not hide changes in target behavior or middleware rules.
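If a project-level cache already holds stale entries, delete it to force fresh downloads. This assumes the default HTTPCACHE_DIR inside the project's .scrapy data directory:

$ rm -rf .scrapy/httpcache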
2026-04-22 10:28:31 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_count': 2,
'downloader/response_status_count/200': 1,
'downloader/response_status_count/404': 1,
'item_scraped_count': 1,
##### snipped #####
}
The 404 counter shows the response was downloaded, and the earlier log line confirms it reached the callback rather than being discarded, while item_scraped_count: 1 shows the normal page still produced an item.
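If you want the spider itself to flag responses that were still filtered out, inspect the stats when it closes. A short sketch using the httperror/response_ignored_count key that HttpErrorMiddleware records:

def closed(self, reason):
    # A non-zero count means some non-2xx responses never reached a callback.
    ignored = self.crawler.stats.get_value("httperror/response_ignored_count", 0)
    if ignored:
        self.logger.warning("%s responses were dropped by HttpErrorMiddleware", ignored)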