Request headers influence how servers interpret Scrapy traffic, affecting content negotiation, localization, and whether an endpoint responds with HTML or an API-style payload. Matching expected headers also helps avoid unexpected markup changes, redirects, or denied responses.
Scrapy creates a Request, merges project-wide DEFAULT_REQUEST_HEADERS with any per-request headers values, and then sends the request through downloader middlewares before it is dispatched. Per-request header keys override defaults for that specific request, while other default headers continue to apply.
Some headers are computed or managed by the HTTP client and middlewares and may be ignored or overwritten when set manually (for example Host and Content-Length). Keep header changes minimal and consistent with the endpoint, and treat cookies or Authorization tokens as secrets loaded at runtime rather than committed into versioned files.
$ vi simplifiedguide/settings.py
DEFAULT_REQUEST_HEADERS = { "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Language": "en", }
Header names are case-insensitive, and values passed as strings are converted into bytes internally.
import scrapy class ProductsSpider(scrapy.Spider): name = "products" start_urls = ["http://app.internal.example:8000/products/"] def start_requests(self): for url in self.start_urls: yield scrapy.Request( url=url, headers={"X-Requested-With": "XMLHttpRequest"}, callback=self.parse, ) def parse(self, response): yield { "url": response.url, "status": response.status, }
Per-request header keys override matching keys from DEFAULT_REQUEST_HEADERS while preserving other defaults.
$ scrapy shell -s HTTPCACHE_ENABLED=False http://app.internal.example:8000/headers
>> import json >>> headers = json.loads(response.text)["headers"] >>> headers["Accept"] 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' >>> headers["Accept-Language"] 'en'
>> fetch("http://app.internal.example:8000/headers", headers={"X-Requested-With": "XMLHttpRequest"})
>>> import json
>>> json.loads(response.text)["headers"]["X-Requested-With"]
'XMLHttpRequest'