A crawl depth limit keeps a Scrapy spider from wandering too far from its start URLs. Capping the number of link hops keeps scope predictable, prevents accidental coverage explosions on large sites, and reduces unnecessary load on the target.
In Scrapy, request depth is tracked by DepthMiddleware and starts at 0 for the start URLs. Each followed link increments depth by 1, and new requests deeper than the configured DEPTH_LIMIT are filtered out before scheduling.
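DepthMiddleware records each response's depth in response.meta['depth'], so a spider can inspect it directly. A minimal sketch that logs the depth of every page it visits (the spider name and start URL are placeholders):

import scrapy

class DepthLoggingSpider(scrapy.Spider):
    name = "depthdemo"  # placeholder name
    start_urls = ["https://www.example.com/"]  # placeholder start URL

    def parse(self, response):
        # Start pages report depth 0; each followed link adds 1.
        self.logger.info("depth=%s url=%s", response.meta.get("depth", 0), response.url)
        # Follow every link; DepthMiddleware drops any request that would
        # exceed DEPTH_LIMIT before it reaches the scheduler.
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)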
A small DEPTH_LIMIT can exclude pages buried behind categories, pagination layers, or multi-step navigation. A large limit lets the crawl explode into near-infinite URL spaces (faceted search, calendars, session-driven URLs), so depth limiting works best when paired with tight link extraction rules and sensible allowed domains.
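The pairing looks like the sketch below: DEPTH_LIMIT caps hop count while a restrictive LinkExtractor and allowed_domains keep the crawl out of runaway URL spaces. The domain, paths, and deny patterns are illustrative assumptions, not values from this guide's project.

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class CatalogSpider(CrawlSpider):
    name = "catalog"  # placeholder name
    allowed_domains = ["www.example.com"]  # assumption: the one site in scope
    start_urls = ["https://www.example.com/catalog/"]

    # Depth cap and tight extraction rules work together.
    custom_settings = {"DEPTH_LIMIT": 2}

    rules = (
        Rule(
            # Only follow catalog links; skip faceted-search and calendar URLs.
            LinkExtractor(allow=r"/catalog/", deny=(r"\?sort=", r"/calendar/")),
            callback="parse_item",
            follow=True,
        ),
    )

    def parse_item(self, response):
        yield {"url": response.url}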
Related: How to use CrawlSpider in Scrapy
Related: How to scrape paginated pages with Scrapy
Steps to set a crawl depth limit in Scrapy:
- Open the Scrapy project settings file.
$ vi simplifiedguide/settings.py
settings.py is typically under the Scrapy project package directory (the same directory that contains the spiders module).
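For reference, scrapy startproject simplifiedguide generates the standard layout below, with settings.py inside the project package next to the spiders directory:

simplifiedguide/            # project root
├── scrapy.cfg
└── simplifiedguide/        # project package
    ├── __init__.py
    ├── items.py
    ├── middlewares.py
    ├── pipelines.py
    ├── settings.py         # project-wide settings live here
    └── spiders/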
- Set DEPTH_LIMIT to the maximum number of link hops allowed from the start URLs.
DEPTH_LIMIT = 1
Start URLs are depth 0, their direct links are depth 1, and so on (for example, DEPTH_LIMIT = 1 follows only links found on the start pages).
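When only one spider needs the cap, a per-spider custom_settings override keeps the project-wide settings untouched. A minimal sketch; the spider name and start URL are assumed from the example run later in this guide:

import scrapy

class ProductsSpider(scrapy.Spider):
    name = "products"
    start_urls = ["http://app.internal.example:8000/products"]  # assumed start URL

    # Overrides settings.py for this spider only.
    custom_settings = {"DEPTH_LIMIT": 1}

    def parse(self, response):
        yield {"url": response.url, "depth": response.meta.get("depth", 0)}
        # Links found here are depth 1; anything deeper gets filtered.
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)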
- Enable verbose depth statistics for easier validation during test crawls.
DEPTH_STATS_VERBOSE = True
Verbose depth stats add per-depth counters such as request_depth_count/1 and request_depth_count/2 alongside request_depth_max.
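The same counters are available programmatically through the stats collector, so a test crawl can check itself. A sketch using the standard crawler.stats API in the spider's closed hook (spider details as assumed above):

import scrapy

class ProductsSpider(scrapy.Spider):
    name = "products"
    start_urls = ["http://app.internal.example:8000/products"]  # assumed start URL
    custom_settings = {"DEPTH_LIMIT": 1, "DEPTH_STATS_VERBOSE": True}

    def parse(self, response):
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)

    def closed(self, reason):
        # DepthMiddleware populates these counters by the time the crawl ends.
        stats = self.crawler.stats
        self.logger.info("deepest request: %s", stats.get_value("request_depth_max"))
        self.logger.info("requests at depth 1: %s", stats.get_value("request_depth_count/1"))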
- Run the spider with LOG_LEVEL set to DEBUG to display depth filtering messages.
$ scrapy crawl products -s LOG_LEVEL=DEBUG
2026-01-01 08:21:24 [scrapy.spidermiddlewares.depth] DEBUG: Ignoring link (depth > 1): http://app.internal.example:8000/products?page=3
##### snipped #####
Override the limit for a single run without editing settings.py by adding -s DEPTH_LIMIT=1 to the command line.
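The same one-off override works when launching the crawl from a script instead of the scrapy command. A sketch using CrawlerProcess with the project's settings:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Load the project's settings.py, then override DEPTH_LIMIT for this run only.
settings = get_project_settings()
settings.set("DEPTH_LIMIT", 1)

process = CrawlerProcess(settings)
process.crawl("products")  # spider name, resolved by the project's spider loader
process.start()            # blocks until the crawl finishes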
- Confirm the crawl did not exceed the configured DEPTH_LIMIT in the spider statistics.
2026-01-01 08:21:24 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'request_depth_max': 1,
 'request_depth_count/0': 1,
 'request_depth_count/1': 1,
##### snipped #####
}
If depth filtering messages never appear and request_depth_max exceeds expectations, confirm scrapy.spidermiddlewares.depth.DepthMiddleware has not been removed or disabled in SPIDER_MIDDLEWARES.
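Setting the middleware to None in SPIDER_MIDDLEWARES is what disables it, so this is the pattern to look for in settings.py:

# settings.py — an entry like this disables DepthMiddleware, and DEPTH_LIMIT with it.
# Remove the entry to restore depth filtering.
SPIDER_MIDDLEWARES = {
    "scrapy.spidermiddlewares.depth.DepthMiddleware": None,
}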
