Downloader middleware is the right Scrapy layer for request and response logic that must run on every download, before a response ever reaches a spider callback. It is useful for shared headers, response accounting, proxy selection, retry tagging, or short-circuiting downloads without repeating the same code in each spider.
Scrapy enables custom downloader middleware through DOWNLOADER_MIDDLEWARES, runs process_request() in increasing middleware order on the way to the downloader, and runs process_response() in decreasing order on the way back. Add process_exception() only when download failures need a retry, fallback response, or replacement request from the same middleware layer.
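The ordering rule can be sketched in plain Python, with no Scrapy installed: two illustrative middleware classes at orders 100 and 200 see the request low-to-high and the response high-to-low. The class names and the string stand-in for the response are assumptions for the demonstration only.

```python
# Plain-Python sketch of the call order Scrapy applies: with orders
# 100 and 200, process_request() runs in increasing order toward the
# downloader and process_response() in decreasing order on the way
# back. Class names are illustrative, not real Scrapy middleware.
calls = []

class HeaderMiddleware:          # order 100 in DOWNLOADER_MIDDLEWARES
    def process_request(self, request):
        calls.append("header.request")
    def process_response(self, request, response):
        calls.append("header.response")
        return response

class AuditMiddleware:           # order 200 in DOWNLOADER_MIDDLEWARES
    def process_request(self, request):
        calls.append("audit.request")
    def process_response(self, request, response):
        calls.append("audit.response")
        return response

chain = [HeaderMiddleware(), AuditMiddleware()]  # sorted by order value

# Request path: lowest order first, then the download itself happens.
for mw in chain:
    mw.process_request(request=None)
response = "response"  # stand-in for the downloaded response

# Response path: highest order first, back toward the engine.
for mw in reversed(chain):
    response = mw.process_response(request=None, response=response)
```

Running this leaves `calls` as `["header.request", "audit.request", "audit.response", "header.response"]`, which is the wrapping behavior the order numbers encode.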
Current Scrapy releases deprecate the spider argument in custom downloader middleware methods, even though a fresh scrapy startproject scaffold still writes placeholder downloader methods that include it. Replace those stubs with the current method signatures, and keep from_crawler() when the middleware needs shared state such as crawler.stats or access to crawler.spider.
Related: How to create a Scrapy project
Related: How to create spider middleware in Scrapy
Steps to create downloader middleware in Scrapy:
- Open a terminal in the Scrapy project directory.
$ cd /srv/shopbot
Run project-aware commands from the directory that contains scrapy.cfg so class imports, settings, and spider names resolve correctly.
- Replace the generated downloader placeholder with a custom class that tags each request and records the returned status code.
$ vi shopbot/middlewares.py
- shopbot/middlewares.py
class AuditMiddleware:
    def __init__(self, stats):
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.stats)

    def process_request(self, request):
        request.headers.setdefault(
            "X-Crawl-Source",
            "demo",
        )
        self.stats.inc_value("audit/request")
        return None

    def process_response(self, request, response):
        self.stats.inc_value(
            f"audit/status/{response.status}"
        )
        return response
This example uses crawler.stats so the crawl can prove the middleware ran without depending on the deprecated spider method argument.
If only one request or one spider needs a custom header, set it on that request instead of making the change project-wide. Related: How to set request headers in Scrapy
Add process_exception() only when failed downloads need a retry, fallback response, or replacement request from the middleware itself.
- Enable the custom middleware in settings.py.
$ vi shopbot/settings.py
- shopbot/settings.py
DOWNLOADER_MIDDLEWARES = {
    "shopbot.middlewares.AuditMiddleware": 543,
}
Lower order values run earlier on the request path and later on the response path, so choose the number relative to the built-in downloader middleware that this class must run before or after.
If the project already defines DOWNLOADER_MIDDLEWARES, add the new class to that dictionary instead of replacing the existing middleware entries.
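A merged dictionary might look like the sketch below; the existing proxy entry and both order values are illustrative assumptions, not part of this project:

```python
# settings.py sketch: keep any entries the project already defines
# and add the new class alongside them at its own order value.
DOWNLOADER_MIDDLEWARES = {
    "shopbot.middlewares.ProxyPoolMiddleware": 350,  # hypothetical existing entry
    "shopbot.middlewares.AuditMiddleware": 543,      # new audit class
}
```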
- Add or update a small spider that makes one request and yields one item.
$ vi shopbot/spiders/headers.py
- shopbot/spiders/headers.py
import scrapy


class HeadersSpider(scrapy.Spider):
    name = "headers"
    start_urls = ["http://app.example/"]

    def parse(self, response):
        yield {
            "title": response.css("title::text").get(),
            "status": response.status,
            "url": response.url,
        }
Any reliable HTML page works here, including a local fixture, an internal test route, or a target page that returns a normal document with a <title> element.
- Run the spider and confirm that Scrapy lists the custom middleware and records the custom stats.
$ scrapy crawl headers -O items.jl -s ROBOTSTXT_OBEY=False
2026-04-22 10:42:46 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.offsite.OffsiteMiddleware',
##### snipped #####
'shopbot.middlewares.AuditMiddleware',
##### snipped #####
]
2026-04-22 10:42:46 [scrapy.extensions.feedexport] INFO: Stored jl feed (1 items) in: items.jl
2026-04-22 10:42:46 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'audit/request': 1,
'audit/status/200': 1,
##### snipped #####
}
The runtime ROBOTSTXT_OBEY=False override keeps the example focused on the target page request. In a fresh project, leaving ROBOTSTXT_OBEY at its generated True value may add an initial robots.txt request and extra status counts before the main page request.
If the custom class does not appear under Enabled downloader middlewares or the audit/* keys never appear in the final stats, the class path is wrong or the setting is still commented out.
- Open the exported file to confirm that the spider still receives the response and yields items after the middleware runs.
$ cat items.jl
{"title": "Middleware demo", "status": 200, "url": "http://app.example/"}
If the custom stats appear but the export stays empty, inspect the spider callback before changing the middleware. The usual cause is a selector miss, a missing yield, or a response that another middleware replaced before the callback ran.
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
