A crawl depth limit keeps a Scrapy spider from drifting too far from its starting URLs. That matters on catalogs, archives, and paginated listings where every extra hop can multiply the crawl size and quietly broaden the dataset.
Scrapy tracks request depth through DepthMiddleware. Start URLs begin at depth 0, each followed Request increases depth by 1, and requests deeper than DEPTH_LIMIT are dropped before they are scheduled. Enabling DEPTH_STATS_VERBOSE adds per-depth counters to the final crawl stats so the cutoff is easy to verify.
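Because DepthMiddleware stores the hop count in each request's meta dictionary, the depth of any page can be observed directly. The following minimal spider is a sketch with hypothetical names (depthdemo, catalog.example) that logs the depth of every response before following its links.

import scrapy


class DepthDemoSpider(scrapy.Spider):
    # Hypothetical spider name and start URL, used only to show depth tracking.
    name = "depthdemo"
    start_urls = ["https://catalog.example/"]

    def parse(self, response):
        # DepthMiddleware records the hop count in meta; start URLs report 0.
        self.logger.info("depth=%s url=%s", response.meta.get("depth", 0), response.url)
        for href in response.css("a::attr(href)").getall():
            # Each followed request is assigned the current depth plus one.
            yield response.follow(href, callback=self.parse)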
The default DEPTH_LIMIT is 0, which means no cap. A low limit can block product pages, article details, or later pagination pages from ever being reached, while a high limit can still expand into search loops or deep archive trees, so the limit works best alongside tight link extraction rules and domain boundaries.
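To illustrate combining a depth cap with tight link extraction, here is a sketch built around a hypothetical catalog.example site: a CrawlSpider whose LinkExtractor only follows product paths inside the allowed domain. The allow and deny patterns are placeholders to adapt to the real site.

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class CatalogSpider(CrawlSpider):
    # Hypothetical names and URL patterns; adjust them to the target site.
    name = "catalog"
    allowed_domains = ["catalog.example"]
    start_urls = ["https://catalog.example/products/"]

    rules = (
        # Follow only product listing links and skip search URLs, which
        # commonly create deep or looping navigation trees.
        Rule(
            LinkExtractor(allow=r"/products/", deny=r"/search/"),
            callback="parse_item",
            follow=True,
        ),
    )

    def parse_item(self, response):
        yield {"url": response.url, "title": response.css("title::text").get()}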
Related: How to use CrawlSpider in Scrapy
Related: How to scrape paginated pages with Scrapy
Steps to set a crawl depth limit in Scrapy:
- Open the Scrapy project settings file.
$ vi catalogdemo/settings.py
In a default project layout, settings.py lives inside the project package directory that also contains the spiders module.
- Set DEPTH_LIMIT to the deepest hop the spider may follow and enable verbose depth stats for the run summary.
DEPTH_LIMIT = 2
DEPTH_STATS_VERBOSE = True
Start URLs are depth 0, links extracted from those pages are depth 1, and with DEPTH_LIMIT = 2 pages at depth 2 are the deepest the spider will fetch; anything beyond that is dropped.
The default DEPTH_LIMIT is 0, so leaving the setting unset does not protect the crawl from deep navigation trees.
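The same two settings can also be scoped to a single spider through the custom_settings class attribute, which overrides settings.py for that spider alone. The sketch below assumes the hypothetical catalog spider used in this guide.

import scrapy


class CatalogSpider(scrapy.Spider):
    name = "catalog"
    start_urls = ["https://catalog.example/products/"]

    # Per-spider override; takes precedence over the project-wide settings.py.
    custom_settings = {
        "DEPTH_LIMIT": 2,
        "DEPTH_STATS_VERBOSE": True,
    }

    def parse(self, response):
        yield {"url": response.url, "title": response.css("title::text").get()}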
- Run the spider with debug logging so DepthMiddleware prints the requests it is dropping.
$ scrapy crawl catalog -s LOG_LEVEL=DEBUG
2026-04-16 08:47:27 [scrapy.spidermiddlewares.depth] DEBUG: Ignoring link (depth > 2): https://catalog.example/products/widget/
##### snipped #####
2026-04-16 08:47:27 [scrapy.core.engine] INFO: Spider closed (finished)
Use a per-run override such as -s DEPTH_LIMIT=1 when testing a narrower cutoff without editing settings.py.
- Review the final crawl stats to confirm the spider stopped at the configured depth.
2026-04-16 08:47:27 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'request_depth_count/0': 1,
 'request_depth_count/1': 1,
 'request_depth_count/2': 1,
 'request_depth_max': 2,
 ##### snipped #####
}
If request_depth_max is higher than expected, confirm scrapy.spidermiddlewares.depth.DepthMiddleware is still enabled in SPIDER_MIDDLEWARES.
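This check can also be automated. As a sketch under the same assumptions as above, the spider's closed() hook reads request_depth_max from the crawl stats, a counter DepthMiddleware maintains, and logs a warning when the value is missing or above the configured limit.

import scrapy


class CatalogSpider(scrapy.Spider):
    name = "catalog"
    start_urls = ["https://catalog.example/products/"]

    def parse(self, response):
        yield {"url": response.url}

    def closed(self, reason):
        # request_depth_max is recorded by DepthMiddleware; the stat is
        # absent entirely if the middleware has been disabled.
        max_depth = self.crawler.stats.get_value("request_depth_max")
        limit = self.crawler.settings.getint("DEPTH_LIMIT")
        if max_depth is None or (limit and max_depth > limit):
            self.logger.warning("Unexpected depth stats: max=%s limit=%s", max_depth, limit)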
