Changing the User-Agent header lets a Scrapy spider present itself as a regular browser rather than advertising its crawler signature, which helps avoid simplistic blocks and can elicit the same layout or device-specific responses that real clients receive.
A web server receives the user-agent string on every HTTP request and may vary content or apply filters based on it. Scrapy sends a default value (Scrapy/<version> (+https://scrapy.org)) via its UserAgentMiddleware unless a request provides its own User-Agent header.
User-agent spoofing does not bypass more advanced anti-bot controls, and a mismatched header set (for example, a mobile user agent with desktop-only headers) can still be flagged. Keep crawler behavior within site policy, and prefer testing the string with a single request before making it the project default.
Related: How to set request headers in Scrapy
Related: How to use an HTTP proxy in Scrapy
$ scrapy fetch --nolog http://app.internal.example:8000/headers
{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
    "User-Agent": "Scrapy/2.11.1 (+https://scrapy.org)",
    "Accept-Encoding": "gzip, deflate, br",
    "Host": "app.internal.example:8000"
  }
}
Related: List of browser user agents
$ scrapy fetch --nolog http://app.internal.example:8000/headers --set=USER_AGENT="Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148"
{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
    "User-Agent": "Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148",
    "Accept-Encoding": "gzip, deflate, br",
    "Host": "app.internal.example:8000"
  }
}
The override applies only to this command invocation.
$ vi simplifiedguide/settings.py
# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = "simplifiedguide (+http://app.internal.example)"
USER_AGENT = 'Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148'
Invalid Python syntax in settings.py prevents spiders and scheduled jobs from starting.
$ scrapy fetch --nolog http://app.internal.example:8000/headers
{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
    "User-Agent": "Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148",
    "Accept-Encoding": "gzip, deflate, br",
    "Host": "app.internal.example:8000"
  }
}