Request headers influence how servers interpret Scrapy traffic, affecting content negotiation, localization, and whether an endpoint responds with HTML or an API-style payload. Matching expected headers also helps avoid unexpected markup changes, redirects, or denied responses.
Scrapy creates a Request and passes it through the downloader middlewares before it is dispatched. Along the way, the built-in DefaultHeadersMiddleware fills in each DEFAULT_REQUEST_HEADERS entry that the request does not already set. A header passed in the request's headers argument therefore overrides the default of the same name for that request only, while the other default headers continue to apply.
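The override rule can be modelled in a few lines of plain Python. This is a simplified sketch rather than Scrapy's own code: merge_headers is a hypothetical helper, and it only illustrates that per-request names win while missing names fall back to the defaults.

```python
def merge_headers(defaults, per_request):
    """Simplified model: per-request keys win, missing keys use the defaults."""
    # Normalize names so "accept" and "Accept" count as the same header.
    merged = {name.title(): value for name, value in per_request.items()}
    for name, value in defaults.items():
        # Only fill in a default when the request did not set that header.
        merged.setdefault(name.title(), value)
    return merged


DEFAULTS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}

merged = merge_headers(
    DEFAULTS,
    {"accept": "application/json", "X-Requested-With": "XMLHttpRequest"},
)
print(merged)
```

In this model the request's own Accept value replaces the default, Accept-Language is inherited unchanged, and X-Requested-With is added alongside them.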
Some headers are computed or managed by the HTTP client and middlewares and may be ignored or overwritten when set manually (for example Host and Content-Length). Keep header changes minimal and consistent with the endpoint, and treat cookies or Authorization tokens as secrets loaded at runtime rather than committed into versioned files.
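One way to keep a token out of versioned files is to read it from the environment when the request is built. The sketch below is only an illustration: the variable name SCRAPY_API_TOKEN, the helper build_auth_headers, and the Bearer scheme are all assumptions, so adjust them to whatever the endpoint actually expects.

```python
import os


def build_auth_headers(environ=os.environ):
    """Return an Authorization header built from the environment, if present."""
    # SCRAPY_API_TOKEN is a hypothetical variable name used for illustration.
    token = environ.get("SCRAPY_API_TOKEN")
    if not token:
        # No token configured: send no Authorization header at all.
        return {}
    # The Bearer scheme is an assumption; some endpoints expect another scheme.
    return {"Authorization": f"Bearer {token}"}


# Example with an explicit mapping instead of the real environment.
headers = build_auth_headers({"SCRAPY_API_TOKEN": "example-token"})
print(headers)
```

The returned dictionary can be passed as the headers argument of scrapy.Request, so the secret exists only in the running process.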
Steps to set request headers in Scrapy:
- Open the Scrapy project settings file (<project>/<project>/settings.py).
$ vi simplifiedguide/settings.py
- Define default headers for all requests.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}
Header names are case-insensitive, and values passed as strings are converted into bytes internally.
- Add a per-request header override in the spider.
import scrapy


class ProductsSpider(scrapy.Spider):
    name = "products"
    start_urls = ["http://app.internal.example:8000/products/"]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                headers={"X-Requested-With": "XMLHttpRequest"},
                callback=self.parse,
            )

    def parse(self, response):
        yield {
            "url": response.url,
            "status": response.status,
        }
Per-request header keys override matching keys from DEFAULT_REQUEST_HEADERS while preserving other defaults.
- Start the Scrapy shell against a header echo endpoint.
$ scrapy shell -s HTTPCACHE_ENABLED=False http://app.internal.example:8000/headers
- Inspect the echoed headers for the default Accept and Accept-Language values.
>>> import json
>>> headers = json.loads(response.text)["headers"]
>>> headers["Accept"]
'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
>>> headers["Accept-Language"]
'en'
- Fetch the same endpoint with a per-request X-Requested-With header override.
>>> fetch("http://app.internal.example:8000/headers", headers={"X-Requested-With": "XMLHttpRequest"})
>>> import json
>>> json.loads(response.text)["headers"]["X-Requested-With"]
'XMLHttpRequest'
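If no header echo endpoint is available, a small local stand-in can be sketched with Python's standard library. The response shape, a JSON object with a "headers" key, is inferred from the shell session above. The names echo_payload, HeaderEchoHandler, and serve, as well as the 127.0.0.1:8000 address, are assumptions made for this example and do not describe the endpoint used in the steps.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def echo_payload(header_items):
    """Encode received (name, value) header pairs as a JSON response body."""
    return json.dumps({"headers": dict(header_items)}).encode("utf-8")


class HeaderEchoHandler(BaseHTTPRequestHandler):
    """Answer every GET request with the request headers it received."""

    def do_GET(self):
        body = echo_payload(self.headers.items())
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8000):
    """Run the echo server until interrupted (host and port are assumptions)."""
    HTTPServer((host, port), HeaderEchoHandler).serve_forever()
```

Calling serve() from a separate terminal starts the stand-in, after which the shell steps can target http://127.0.0.1:8000/headers instead.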
