Setting a download timeout keeps a Scrapy crawl from stalling on slow or unresponsive endpoints, freeing downloader slots for healthy targets and keeping runtimes predictable.
Scrapy enforces the limit in the downloader: DownloadTimeoutMiddleware applies the DOWNLOAD_TIMEOUT setting to each request, and a request that exceeds it fails with a timeout error so retry and error handling can proceed.
The default DOWNLOAD_TIMEOUT is 180 seconds. Lowering it can increase timeout failures and retries on slow sites, while raising it can tie up downloader slots on bad connections. For mixed-latency targets, use download_timeout overrides per spider or per request instead of a single global value.
Related: How to configure retries in Scrapy
Related: How to set a download delay in Scrapy
Steps to set a download timeout in Scrapy:
- Open the Scrapy project settings.py file.
$ vi simplifiedguide/settings.py
- Set DOWNLOAD_TIMEOUT to the preferred value in seconds.
DOWNLOAD_TIMEOUT = 20
Default is 180 seconds, so values set too low can trigger false timeouts and excessive retries on slow targets.
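If you lower the timeout, it can help to tune the retry settings alongside it so timed-out requests are not retried excessively. A minimal settings.py sketch; the values here are illustrative assumptions, not recommendations:

```python
# settings.py -- illustrative values; adjust to your targets
DOWNLOAD_TIMEOUT = 20   # fail requests that take longer than 20 seconds
RETRY_ENABLED = True    # Scrapy default; timed-out requests are retried
RETRY_TIMES = 2         # Scrapy default; retry a failed request up to 2 times
```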
- Set the download_timeout spider attribute to override the timeout for a single spider.
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    download_timeout = 30
Use a spider-level override when one spider consistently targets slower endpoints than the rest of the project.
- Set download_timeout in Request.meta to override the timeout for a single request.
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"

    def start_requests(self):
        url = "https://example.net/"
        yield scrapy.Request(url, meta={"download_timeout": 10})
- Confirm the global timeout value loaded by Scrapy settings.
$ scrapy settings --get DOWNLOAD_TIMEOUT
20
Per-spider and per-request download_timeout overrides apply at runtime and do not change the global settings value shown here.
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
