Changing the User-Agent header lets a Scrapy spider identify itself as a regular browser instead of sending Scrapy's default crawler signature, which helps avoid simplistic blocks and can trigger the same layout or device-specific responses that real clients receive.
A web server receives the user-agent string on every HTTP request and may vary content or apply filters based on it. Scrapy sends a default value (Scrapy/<version> (+https://scrapy.org)) via its UserAgentMiddleware unless a request provides its own User-Agent header.
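The precedence between the project-wide value and a per-request header can be modelled with plain dictionaries. This is only a sketch of the behavior described above, not Scrapy's own code: the function name `effective_user_agent` and the dictionary-based request headers are hypothetical, and the model assumes the middleware fills in User-Agent only when the request has not already set one.

```python
from typing import Dict, Optional

# Placeholder for the default described in the text; the real value
# contains the installed Scrapy version.
DEFAULT_USER_AGENT = "Scrapy/<version> (+https://scrapy.org)"


def effective_user_agent(
    request_headers: Dict[str, str],
    setting: Optional[str] = DEFAULT_USER_AGENT,
) -> Optional[str]:
    """Return the User-Agent a request would carry (hypothetical model).

    Assumption: real Scrapy headers are case-insensitive; this model
    only looks at the exact key "User-Agent".
    """
    headers = dict(request_headers)  # copy so the caller's dict is untouched
    if setting:
        # Fill in the project-wide value only if the request has none.
        headers.setdefault("User-Agent", setting)
    return headers.get("User-Agent")


# A request without its own header falls back to the setting.
print(effective_user_agent({}, "Mozilla/5.0 (example)"))
# A request that sets the header keeps its own value.
print(effective_user_agent({"User-Agent": "per-request"}, "Mozilla/5.0 (example)"))
```

In this model a header set on an individual request always wins, which is why a single request can be used to try a string before it becomes the project default.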
User-agent spoofing does not bypass more advanced anti-bot controls, and a mismatched header set (for example, a mobile user agent with desktop-only headers) can still be flagged. Keep crawler behavior within site policy, and prefer testing the string with a single request before making it the project default.
Related: How to set request headers in Scrapy
Related: How to use an HTTP proxy in Scrapy
Steps to change the user agent for Scrapy spiders:
- Fetch an endpoint that echoes the received user agent to confirm Scrapy's default header.
$ scrapy fetch --nolog http://app.internal.example:8000/headers
{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
    "User-Agent": "Scrapy/2.11.1 (+https://scrapy.org)",
    "Accept-Encoding": "gzip, deflate, br",
    "Host": "app.internal.example:8000"
  }
}
- Choose the browser user agent string to send in requests.
Related: List of browser user agents
- Override the USER_AGENT setting for a single command run using the --set option.
$ scrapy fetch --nolog http://app.internal.example:8000/headers --set=USER_AGENT="Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148"
{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
    "User-Agent": "Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148",
    "Accept-Encoding": "gzip, deflate, br",
    "Host": "app.internal.example:8000"
  }
}
The override applies only to this command invocation.
- Open the Scrapy project settings file in a text editor.
$ vi simplifiedguide/settings.py
- Locate the USER_AGENT setting line in settings.py.
# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = "simplifiedguide (+http://app.internal.example)"
- Set the USER_AGENT value in settings.py to permanently change the user agent for the project.
USER_AGENT = 'Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148'
Invalid Python syntax in settings.py prevents spiders and scheduled jobs from starting.
- Fetch the echo endpoint again to verify the project default user agent is in effect.
$ scrapy fetch --nolog http://app.internal.example:8000/headers
{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
    "User-Agent": "Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148",
    "Accept-Encoding": "gzip, deflate, br",
    "Host": "app.internal.example:8000"
  }
}
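The final verification can also be scripted instead of read by eye. The following is a minimal sketch, assuming the echo endpoint returns JSON shaped like the output in the steps above; `echoed_user_agent`, `EXPECTED_USER_AGENT`, and `sample_body` are hypothetical names introduced for illustration, and the sample body is built locally rather than fetched.

```python
import json

# The string configured in settings.py in the steps above.
EXPECTED_USER_AGENT = (
    "Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148"
)


def echoed_user_agent(body: str) -> str:
    """Extract the User-Agent value from an echo endpoint's JSON body."""
    headers = json.loads(body)["headers"]
    # Header names are case-insensitive, so match on the lower-cased name.
    for name, value in headers.items():
        if name.lower() == "user-agent":
            return value
    raise KeyError("User-Agent not present in echoed headers")


# Stand-in for the text printed by `scrapy fetch --nolog <echo URL>`.
sample_body = json.dumps(
    {
        "headers": {
            "Accept-Language": "en",
            "User-Agent": EXPECTED_USER_AGENT,
            "Host": "app.internal.example:8000",
        }
    }
)

if echoed_user_agent(sample_body) == EXPECTED_USER_AGENT:
    print("user agent matches the configured value")
else:
    print("user agent differs from the configured value")
```

To check a live response, the body printed by the fetch command would be passed to `echoed_user_agent` in place of `sample_body`.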
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
