Request headers control how a target site interprets a Scrapy request, which affects content negotiation, language selection, referer checks, and whether the response is regular HTML or an API payload. Setting the expected headers keeps spider behavior closer to the client profile that the endpoint is designed to accept.

Scrapy builds each Request, applies the project-wide DEFAULT_REQUEST_HEADERS through DefaultHeadersMiddleware, and then lets the request-level headers override only the matching fields for that request. That makes it practical to keep shared headers in settings.py while changing only the values that need to differ for a specific spider or request.

Some headers have dedicated Scrapy behavior or are better handled outside raw header overrides. Cookies should be passed through Request.cookies instead of a raw Cookie header, and a request can remove one inherited default header by setting that header value to None.

Steps to set request headers in Scrapy:

  1. Define the shared headers that most requests should inherit in <project>/<project>/settings.py.
    DEFAULT_REQUEST_HEADERS = {
        "Accept": "application/json",
        "Accept-Language": "en",
    }

    These values are added by DefaultHeadersMiddleware and stay in place until a request sets the same header name itself.
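
    The merge rule can be sketched with plain dicts. This is an illustration of the behavior, not the real implementation: Scrapy stores headers in a Headers object, and DefaultHeadersMiddleware fills in each default only when the request has not already set that header name (setdefault semantics).

```python
# Illustrative sketch of how shared defaults combine with per-request headers.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "application/json",
    "Accept-Language": "en",
}

def apply_defaults(request_headers, defaults=DEFAULT_REQUEST_HEADERS):
    merged = dict(request_headers)      # per-request values are kept as-is
    for name, value in defaults.items():
        merged.setdefault(name, value)  # a default fills in only a missing name
    return merged
```

    Calling apply_defaults({"Accept": "application/vnd.api+json"}) keeps the per-request Accept and still picks up the Accept-Language default.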

  2. Fetch a header echo endpoint from the project directory to confirm the default headers are leaving the crawler.
    $ scrapy fetch --nolog http://app.internal.example:8000/headers
    {
      "headers": {
        "Accept": "application/json",
        "Accept-Language": "en",
        "User-Agent": "Scrapy/2.15.0 (+https://scrapy.org)",
        "Accept-Encoding": "gzip, deflate, br",
        "Host": "app.internal.example:8000"
      }
    }

    Any header echo endpoint that returns the received request headers works here, including an internal test route or a temporary local endpoint.
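
    If no echo route is handy, a minimal stand-in can be run locally with the standard library. The /headers path and port 8000 below are arbitrary choices made to match the examples in this section, not anything Scrapy requires:

```python
# Minimal header echo endpoint: returns the received request headers as JSON.
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class HeaderEcho(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"headers": dict(self.headers)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet while fetching

if __name__ == "__main__":
    ThreadingHTTPServer(("127.0.0.1", 8000), HeaderEcho).serve_forever()
```

    While it runs, point scrapy fetch at http://127.0.0.1:8000/headers instead of the internal host.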

  3. Create the request with a headers dictionary when one spider needs an extra header or a different value for the same field.
    import scrapy
     
    class ProductsSpider(scrapy.Spider):
        name = "products"
        start_urls = ["http://app.internal.example:8000/products"]
     
        def start_requests(self):
            for url in self.start_urls:
                yield scrapy.Request(
                    url,
                    headers={
                        "Accept": "application/vnd.api+json",
                        "X-Requested-With": "XMLHttpRequest",
                    },
                    callback=self.parse,
                )
     
        def parse(self, response):
            yield {
                "url": response.url,
                "status": response.status,
            }

    Request-level headers replace only the matching keys, so the Accept override above still keeps the project default Accept-Language.

  4. Open the Scrapy shell and fetch a new Request to verify the per-request header override.
    $ scrapy shell --nolog http://app.internal.example:8000/headers
    >>> import json, scrapy
    >>> fetch(scrapy.Request("http://app.internal.example:8000/headers", headers={"Accept": "application/vnd.api+json", "X-Requested-With": "XMLHttpRequest"}))
    >>> headers = json.loads(response.text)["headers"]
    >>> headers["Accept"]
    'application/vnd.api+json'
    >>> headers["X-Requested-With"]
    'XMLHttpRequest'
    >>> headers["Accept-Language"]
    'en'

    The shell's fetch shortcut accepts either a URL or a Request object, so passing scrapy.Request(…) preserves the request-specific header override for that fetch.

  5. Remove one inherited default on a specific request by setting that header value to None.
    >>> fetch(scrapy.Request("http://app.internal.example:8000/headers", headers={"Accept-Language": None}))
    >>> "Accept-Language" in json.loads(response.text)["headers"]
    False
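
    The two per-request rules, replace on a matching name and delete on None, can be sketched together with plain dicts. Again, this illustrates the observable behavior rather than Scrapy's Headers class:

```python
def effective_headers(defaults, overrides):
    """Sketch: per-request headers replace matching defaults; None removes."""
    merged = dict(defaults)
    merged.update(overrides)  # matching names are replaced, others inherited
    return {name: value for name, value in merged.items() if value is not None}
```

    With the project defaults from step 1, effective_headers(defaults, {"Accept-Language": None}) drops Accept-Language while leaving Accept untouched.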

    Use Request.cookies for cookie state instead of writing a raw Cookie header: cookie handling has dedicated support in CookiesMiddleware, and Request.cookies is the interface it is designed around, so a hand-written Cookie header may be overridden by the middleware.