Concurrent requests control how many downloads Scrapy runs in parallel, which directly affects crawl speed, resource usage, and the load placed on target sites. Tuning concurrency is a direct way to trade throughput against politeness and stability during a crawl.

Scrapy uses an asynchronous downloader (via Twisted) and keeps multiple requests in flight at the same time. The CONCURRENT_REQUESTS setting caps total in-progress requests across the crawler, while CONCURRENT_REQUESTS_PER_DOMAIN limits parallelism per hostname to prevent a single site from consuming all downloader slots. The CONCURRENT_REQUESTS_PER_IP setting can cap parallelism by IP address instead, and overrides the per-domain limit when non-zero.
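
The same limits can also be set per spider through the custom_settings class attribute, which takes precedence over the project settings module. The sketch below is a minimal, hypothetical spider (the name, start URL, and callback are placeholders):

    import scrapy

    class ProductsSpider(scrapy.Spider):
        name = "products"                        # placeholder spider name
        start_urls = ["https://example.com/"]    # placeholder start URL

        # Per-spider overrides take precedence over values in settings.py.
        custom_settings = {
            "CONCURRENT_REQUESTS": 8,             # cap across the whole crawler
            "CONCURRENT_REQUESTS_PER_DOMAIN": 4,  # cap per hostname
            "CONCURRENT_REQUESTS_PER_IP": 0,      # 0 disables per-IP limiting
        }

        def parse(self, response):
            # Placeholder callback: yield the fetched URL.
            yield {"url": response.url}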

High concurrency can trigger throttling, temporary blocks, and more retries or timeouts when the remote side cannot keep up. Concurrency settings are therefore typically tuned alongside DOWNLOAD_DELAY or the AutoThrottle extension to keep the crawl responsive and reduce 429 (Too Many Requests) responses.
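
As an illustration, a settings.py fragment that pairs a base delay with AutoThrottle might look like the following (the numbers are illustrative starting points, not recommendations):

    DOWNLOAD_DELAY = 0.5                   # minimum wait between requests to the same slot
    AUTOTHROTTLE_ENABLED = True            # adapt delays to observed server latency
    AUTOTHROTTLE_START_DELAY = 1.0         # initial delay in seconds
    AUTOTHROTTLE_MAX_DELAY = 10.0          # ceiling for the adaptive delay
    AUTOTHROTTLE_TARGET_CONCURRENCY = 2.0  # average parallel requests per remote site

With AutoThrottle enabled, the concurrency caps above still act as hard upper bounds; the extension only adjusts delays within them.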

Steps to set concurrent requests in Scrapy:

  1. Open the Scrapy project settings file.
    $ vi simplifiedguide/settings.py
  2. Set global and per-domain concurrency limits in settings.py.
    CONCURRENT_REQUESTS = 8
    CONCURRENT_REQUESTS_PER_DOMAIN = 4

    Excessive concurrency can cause throttling, increased retry rates, or IP blocks from target sites.

  3. Print the effective concurrency values from the project.
    $ scrapy settings --get CONCURRENT_REQUESTS
    8
    $ scrapy settings --get CONCURRENT_REQUESTS_PER_DOMAIN
    4
  4. Run a spider with the updated settings (a one-off command-line override is shown after these steps).
    $ scrapy crawl products
    2026-01-01 08:22:14 [scrapy.crawler] INFO: Overridden settings:
    {'BOT_NAME': 'simplifiedguide',
     'CONCURRENT_REQUESTS': 8,
     'CONCURRENT_REQUESTS_PER_DOMAIN': 4,
     'NEWSPIDER_MODULE': 'simplifiedguide.spiders',
     'SPIDER_MODULES': ['simplifiedguide.spiders']}
    ##### snipped #####
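
For a one-off run, the same settings can also be overridden from the command line with -s, which takes precedence over settings.py (using the products spider from the example above):

    $ scrapy crawl products -s CONCURRENT_REQUESTS=16 -s CONCURRENT_REQUESTS_PER_DOMAIN=8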