Setting a custom Referer header in Scrapy makes a request look like it came from a specific page. Some sites check that field before returning normal content, API data, or a secondary request flow, so sending the expected referrer can keep the crawl on the path the endpoint already accepts.
In current Scrapy releases, the direct way to do that is to put Referer in the request headers mapping on the exact Request object being yielded. The same pattern works on both scrapy.Request() and response.follow(), and a manual value stays in place because DefaultHeadersMiddleware and RefererMiddleware only fill missing headers.
A fixed referrer can leak full paths, search terms, or signed URLs if it is copied from live traffic without review. When a request should inherit the parent page instead, leave the header unset and let RefererMiddleware apply REFERRER_POLICY, because the default policy can still send a non-empty cross-origin referrer on HTTP and HTTPS requests.
$ vi referer_demo.py
ref = "https://app.example/cat" yield scrapy.Request( url, headers={ "Referer": ref, }, callback=self.parse, )
Use the exact header name Referer with that historical spelling; on older spiders that still override start_requests(), add the same headers={…} mapping to the yielded Request objects there.
ref = "https://app.example/list" yield response.follow( next_url, headers={ "Referer": ref, }, callback=self.parse_detail, )
$ scrapy shell --nolog
>>> import json, scrapy
>>> echo = "https://httpbin.org/get"
>>> ref = "https://app.example/cat"
>>> req = scrapy.Request(
... echo,
... headers={"Referer": ref},
... )
>>> fetch(req)
>>> data = json.loads(response.text)
>>> headers = data["headers"]
>>> headers["Referer"]
'https://app.example/cat'
>>> ref = "https://app.example/list"
>>> req = response.follow(
... echo,
... headers={"Referer": ref},
... )
>>> fetch(req)
>>> data = json.loads(response.text)
>>> headers = data["headers"]
>>> headers["Referer"]
'https://app.example/list'
If response.follow() receives no manual Referer header, RefererMiddleware uses the parent response URL instead.
Do not hard-code authenticated URLs, private paths, or signed query strings into a fixed Referer header, because upstream servers and local logs may record them.