A user agent is a string that a browser or other HTTP client uses to identify itself to the web server. It is sent with every HTTP request in the User-Agent request header. By default, Scrapy sends a user agent string that names Scrapy and its version.
The web server can return the same content for every request regardless of the user agent string, or it can be configured to vary its response, for example by returning the mobile version of a website instead of the desktop one when the user agent indicates a mobile browser.
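As a rough illustration of that server-side decision, here is a minimal Python sketch. The function name choose_variant, the marker list, and the variant labels are all hypothetical; real servers typically use far more thorough detection.

```python
# Hypothetical sketch: how a server might pick a site variant from the
# User-Agent header. The markers below are illustrative assumptions only.
MOBILE_MARKERS = ("Mobile", "Android", "iPhone")


def choose_variant(user_agent: str) -> str:
    """Return 'mobile' if the user agent looks like a mobile browser, else 'desktop'."""
    if any(marker in user_agent for marker in MOBILE_MARKERS):
        return "mobile"
    return "desktop"
```

Calling choose_variant with a string containing "iPhone" would select the mobile variant, while a desktop Chrome string on macOS would fall through to the desktop variant.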
In some cases, however, the web server denies the request outright. This is especially common for requests from web scraping spiders such as Scrapy. One way to avoid the issue is for Scrapy to change its user agent string and identify itself as a regular browser. Scrapy supports this through its USER_AGENT setting.
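What changing the user agent amounts to at the HTTP level can be sketched with Python's standard urllib, used here instead of Scrapy only so the example has no dependencies. The URL and the user agent value are placeholders, and the request is built but never sent.

```python
from urllib.request import Request

# Placeholder value; any browser-like string could be used here.
CUSTOM_USER_AGENT = "custom user agent string"

# Build (but do not send) a request that carries a custom User-Agent header.
request = Request(
    "https://www.example.com",
    headers={"User-Agent": CUSTOM_USER_AGENT},
)

# urllib stores header names capitalized, so the key is "User-agent".
print(request.get_header("User-agent"))
```

Scrapy does the equivalent for every request it makes, using whatever value its USER_AGENT setting holds.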
Steps to change user agent for Scrapy:
$ scrapy fetch https://www.example.com
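To override the user agent for a single run, pass the USER_AGENT setting with the --set option. This also works with scrapy shell and other Scrapy commands.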
$ scrapy fetch https://www.example.com --set=USER_AGENT="custom user agent string"
To change the user agent string for a whole project, edit settings.py in the Scrapy project folder.
$ vi scrapyproject/settings.py
# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'scraper (+http://www.yourdomain.com)'
Uncomment the USER_AGENT line by removing the leading # and set the value to the user agent of your choice.
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'