A user agent is a string that a browser uses to identify itself to the web server. It is sent in the request header of every
HTTP request, and by default
Scrapy identifies itself as follows, where VERSION is the installed Scrapy version:
Scrapy/VERSION (+https://scrapy.org)
The web server can then be configured to respond according to the user agent string. A request from a mobile device, for example, could be served with mobile-specific content. Some web servers, however, are configured to block web scraping traffic altogether, which is a problem when using Scrapy.
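To make the idea concrete, here is a minimal sketch of the kind of rule a web server might apply to the User-Agent header. The function name choose_response, its matching rules, and the shortened sample strings are all hypothetical. Real servers and bot-protection services use far more elaborate checks.

```python
# Hypothetical server-side rule: choose a response from the
# User-Agent request header. Rules and sample strings are illustrative.


def choose_response(user_agent: str) -> str:
    """Return the kind of response a server might send for this User-Agent."""
    ua = user_agent.lower()
    if "scrapy" in ua:
        # Reject clients that announce themselves as Scrapy.
        return "403 Forbidden"
    if "mobile" in ua:
        # Serve mobile-specific content to clients whose UA mentions "mobile".
        return "mobile page"
    return "desktop page"


if __name__ == "__main__":
    samples = [
        "Scrapy/VERSION (+https://scrapy.org)",
        "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) Mobile/15E148",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) Chrome/61.0.3163.100",
    ]
    for sample in samples:
        print(f"{sample!r} -> {choose_response(sample)}")
```

Because the decision rests entirely on a string the client chooses to send, changing that string is enough to get past a rule like this one.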
One way to avoid the issue is to make
Scrapy change its user agent string and identify itself as a regular web browser. By default, a Scrapy command such as fetch sends Scrapy's own user agent:
$ scrapy fetch https://www.example.com
The same applies to scrapy shell and any other way of running Scrapy.
Use the --set option to change the USER_AGENT value for a single command run:
$ scrapy fetch https://www.example.com --set=USER_AGENT="custom user agent string"
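To confirm which user agent Scrapy actually sends, one option is to point it at a tiny local server that echoes the User-Agent header back. The sketch below uses only Python's standard library. The handler name, the helpers start_echo_server and fetch_echo, and port 8000 are arbitrary choices for illustration; fetch_echo uses urllib rather than Scrapy, only as a quick self-check of the server.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoUserAgentHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Reply with the received User-Agent header as plain text.
        body = (self.headers.get("User-Agent") or "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def start_echo_server(port: int = 8000) -> ThreadingHTTPServer:
    """Start the echo server on 127.0.0.1 in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", port), EchoUserAgentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_echo(url: str, user_agent: str) -> str:
    """Request url with the given User-Agent and return the response body."""
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with opener.open(request, timeout=5) as response:
        return response.read().decode("utf-8")
```

For example, start the server from an interactive Python session with server = start_echo_server(8000). Then, in another terminal, the following command should print the custom string as the response body:
$ scrapy fetch http://127.0.0.1:8000/ --set=USER_AGENT="custom user agent string"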
To change the user agent for the whole project instead, open Scrapy's configuration file using your favorite text editor:
$ vi scrapyproject/settings.py
# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'scraper (+http://www.yourdomain.com)'
Uncomment the USER_AGENT line and set its value to the user agent of your choice:
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
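When the user agent is set in more than one place, the value from the higher-priority source wins. Scrapy's settings documentation lists command-line options above the project's settings.py, which in turn overrides the built-in default. The sketch below models only that lookup order with collections.ChainMap. The function name effective_setting and the dictionaries are hypothetical stand-ins, not Scrapy's Settings API, and Scrapy's other precedence levels (such as per-spider settings) are left out.

```python
# Simplified model of setting precedence: the first mapping that
# defines a key wins. Hypothetical stand-in, not Scrapy's Settings API.
from collections import ChainMap


def effective_setting(name: str, command_line: dict, project: dict, defaults: dict) -> str:
    """Return name from the highest-priority source that defines it."""
    # ChainMap searches command_line first, then project, then defaults.
    return ChainMap(command_line, project, defaults)[name]


if __name__ == "__main__":
    defaults = {"USER_AGENT": "Scrapy/VERSION (+https://scrapy.org)"}
    project = {  # value from scrapyproject/settings.py
        "USER_AGENT": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/61.0.3163.100 Safari/537.36"
        )
    }
    # Without --set, the project setting overrides the default.
    print(effective_setting("USER_AGENT", {}, project, defaults))
    # With --set=USER_AGENT=..., the command-line value wins.
    cli = {"USER_AGENT": "custom user agent string"}
    print(effective_setting("USER_AGENT", cli, project, defaults))
```

In practice, this means a one-off --set on the command line can temporarily override whatever is written in settings.py without editing the file.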