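Scrapy identifies itself in the user-agent header when scraping websites. To be exact, the default value takes the form of Scrapy/VERSION (+https://scrapy.org), where VERSION is the installed Scrapy version.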
While this is the correct and polite way to scrape websites, some website owners outright block Scrapy or any other scraper, and one way to overcome this is to change the user-agent. Another reason could be that you want Scrapy to identify itself as coming from your company, or that the user-agent itself serves as a unique identifier and a means of authenticating to the website you're scraping.
Regardless of the reason, you can change the user-agent for Scrapy by following these steps:
$ vi scrapyproject/settings.py
# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'scraper (+http://www.yourdomain.com)'
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
You can also apply the user-agent change only when you run a particular Scrapy spider or shell by passing the -s USER_AGENT option. It overrides the value in the configuration file.
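The override behaviour can be pictured with a small model. The sketch below is not Scrapy's implementation: the LayeredSettings class, the PRIORITIES table and its numbers are hypothetical, chosen only to illustrate that a value coming from the command line outranks one coming from the project's settings file.

```python
# Simplified model of layered settings; NOT Scrapy's actual implementation.
# The class name and the numeric priorities are illustrative assumptions:
# a higher number wins.
PRIORITIES = {"project": 20, "cmdline": 40}


class LayeredSettings:
    """Keep, for each setting name, the value from the highest-priority source."""

    def __init__(self):
        self._values = {}  # name -> (value, priority)

    def set(self, name, value, source):
        priority = PRIORITIES[source]
        current = self._values.get(name)
        # Replace the stored value only if the new source ranks at least as high.
        if current is None or priority >= current[1]:
            self._values[name] = (value, priority)

    def get(self, name, default=None):
        entry = self._values.get(name)
        return default if entry is None else entry[0]


settings = LayeredSettings()
settings.set("USER_AGENT", "Mozilla/5.0 (from settings.py)", "project")
settings.set("USER_AGENT", "myuseragentname", "cmdline")  # like -s USER_AGENT=...
print(settings.get("USER_AGENT"))  # prints: myuseragentname
```

In this model the order of the two set calls does not matter: the command-line value is kept either way because its source has the higher priority.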
To manually change the user-agent of your spider, use the above option followed by your user-agent of choice, as in the following example:
$ scrapy shell --help | grep -A1 set\=
  --set=NAME=VALUE, -s NAME=VALUE
                        set/override setting (may be repeated)
$ scrapy shell -s USER_AGENT='myuseragentname' www.example.com
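If you want to confirm what a server actually receives, one option is a throwaway local server that echoes the User-Agent header back. The following is a minimal sketch using only Python's standard library. It stands in for Scrapy rather than using it, and the names EchoUserAgent and seen_user_agent are made up for this illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class EchoUserAgent(BaseHTTPRequestHandler):
    """Reply to any GET with the User-Agent header the client sent."""

    def do_GET(self):
        body = self.headers.get("User-Agent", "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def seen_user_agent(user_agent):
    """Start a local echo server, send one request, return what it saw."""
    server = HTTPServer(("127.0.0.1", 0), EchoUserAgent)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        request = Request(url, headers={"User-Agent": user_agent})
        opener = build_opener(ProxyHandler({}))  # bypass any configured proxy
        with opener.open(request, timeout=5) as response:
            return response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


print(seen_user_agent("myuseragentname"))  # prints: myuseragentname
```

The same idea should carry over to Scrapy: keep such a server running on a fixed port and point scrapy shell at it with the -s USER_AGENT option to see which value arrives.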