HTTP responses such as 404, 410, and 500 are normal crawl results when pages disappear or an upstream application fails. This guide shows how to let a Scrapy spider receive a specific non-2xx response and branch on response.status inside the callback, instead of having the response filtered out before parsing starts.
Current Scrapy releases filter non-2xx responses through HttpErrorMiddleware before the spider callback runs. To handle one of those responses in normal spider code, allow only the status codes you need with the spider attribute handle_httpstatus_list, the per-request metadata key of the same name, or the project setting HTTPERROR_ALLOWED_CODES.
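When every spider in a project should receive the same error codes, the project setting is the simplest of the three options. A minimal sketch of the relevant line in the project's settings.py:

```python
# settings.py: project-wide alternative to a per-spider handle_httpstatus_list.
# Responses with these status codes bypass HttpErrorMiddleware filtering
# and reach the spider callbacks in every spider of the project.
HTTPERROR_ALLOWED_CODES = [404]
```

The per-spider attribute and the per-request meta key both override this setting for their own scope, so the project-wide list can stay empty while individual spiders opt in.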
Keep the allowlist narrow and check the status code immediately so error pages do not flow into normal item parsing by accident. Use handle_httpstatus_all or HTTPERROR_ALLOW_ALL only for short diagnostics or dedicated error-capture spiders, and use an errback for connection failures or timeouts because those failures do not produce an HTTP response object.
Related: How to set request headers in Scrapy
Related: How to configure retries in Scrapy
Steps to handle HTTP error responses in Scrapy:
- Open the spider file that should receive the HTTP error response.
$ vi http_errors_demo.py
Use scrapy runspider with a single file during quick tests, or place the same spider code inside a full project and run it with scrapy crawl <spider_name>.
- Allow only the status codes this spider should receive in its callback.
import scrapy


class HttpErrorsDemoSpider(scrapy.Spider):
    name = "http_errors_demo"
    start_urls = [
        "http://127.0.0.1:18090/",
        "http://127.0.0.1:18090/missing.html",
    ]
    handle_httpstatus_list = [404]

    def parse(self, response):
        if response.status == 404:
            self.logger.info("Handled HTTP %s for %s", response.status, response.url)
            return
        yield {
            "status": response.status,
            "title": response.css("h1::text").get(),
            "url": response.url,
        }
handle_httpstatus_list on the spider lets responses with the listed status codes reach the callback. All other non-2xx responses still go through the normal HttpErrorMiddleware filtering.
- Add the allowlist to one Request instead when only one URL should reach the callback with a non-2xx response.
yield scrapy.Request(
    "http://127.0.0.1:18090/missing.html",
    callback=self.parse_missing,
    meta={"handle_httpstatus_list": [404]},
)
This keeps the rest of the spider on the default behavior. Related: How to use request meta in Scrapy
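The parse_missing callback referenced by that request is not defined in this guide's spider; a minimal sketch of such a method (intended to live inside the spider class, hence the self parameter) could look like this:

```python
# Hypothetical callback for requests whose meta allows 404 through.
# Records the dead URL instead of parsing it like a normal page.
def parse_missing(self, response):
    if response.status == 404:
        yield {
            "url": response.url,
            "status": response.status,
            "note": "page gone, recorded instead of parsed",
        }
```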
- Run the spider and confirm the callback logs the handled 404 response.
$ scrapy runspider http_errors_demo.py -s LOG_LEVEL=INFO -s HTTPCACHE_ENABLED=False
2026-04-22 10:28:30 [scrapy.core.engine] INFO: Spider opened
2026-04-22 10:28:31 [http_errors_demo] INFO: Handled HTTP 404 for http://127.0.0.1:18090/missing.html
##### snipped #####
2026-04-22 10:28:31 [scrapy.core.engine] INFO: Spider closed (finished)
Disable or clear HTTPCACHE while testing status handling so cached responses do not hide changes in target behavior or middleware rules.
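If the project enables the cache in settings.py rather than on the command line, the equivalent change for test runs is a one-line setting; the comment reflects this guide's testing advice:

```python
# settings.py: keep the HTTP cache off while testing status handling,
# so cached responses cannot mask changes in target or middleware behavior.
HTTPCACHE_ENABLED = False
```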
- Check the final crawl stats and confirm Scrapy counted the handled error response separately from the scraped item.
2026-04-22 10:28:31 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_count': 2,
 'downloader/response_status_count/200': 1,
 'downloader/response_status_count/404': 1,
 'item_scraped_count': 1,
 ##### snipped #####
}
The downloader/response_status_count/404 counter confirms the response reached the spider without being discarded, while item_scraped_count: 1 shows the normal page still produced an item.
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
