A user agent is a string that a browser uses to identify itself to a web server. It is sent in the User-Agent header of every HTTP request, and by default Scrapy identifies itself as follows:

Scrapy/<version> (+https://scrapy.org)
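Because the user agent is just an ordinary request header, any HTTP client can set it. The minimal sketch below uses Python's standard urllib (not Scrapy) to build a request that carries a Scrapy-style user agent. The version number 2.11 is only a placeholder, `build_request` is a hypothetical helper, and nothing is sent over the network.

```python
from urllib.request import Request

# Placeholder version number; Scrapy itself fills in its installed version.
SCRAPY_STYLE_UA = "Scrapy/2.11 (+https://scrapy.org)"


def build_request(url: str, user_agent: str) -> Request:
    """Build (but do not send) a GET request with an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("https://www.example.com", SCRAPY_STYLE_UA)
# urllib normalizes header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))  # Scrapy/2.11 (+https://scrapy.org)
```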

A web server may return the same content for every request regardless of the user agent string, or, if configured to, it may for example return a mobile version of a website instead of the normal one when the user agent indicates that the request comes from a mobile browser.
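To make the idea concrete, here is a hypothetical server-side rule that picks a page variant from the user agent. Real sites use far more elaborate detection; this sketch, with the made-up function name `choose_variant`, only looks for a few common mobile markers.

```python
def choose_variant(user_agent: str) -> str:
    """Return which page variant a hypothetical server would serve.

    Only a few common mobile markers are checked, case-sensitively,
    purely for illustration.
    """
    mobile_markers = ("Mobile", "Android", "iPhone")
    if any(marker in user_agent for marker in mobile_markers):
        return "mobile"
    return "desktop"


print(choose_variant("Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Mobile/15E148"))  # mobile
print(choose_variant("Scrapy/2.11 (+https://scrapy.org)"))  # desktop
```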

In some cases, however, the web server may deny a request outright, and this is especially common for requests from web scraping tools such as Scrapy.
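A server that rejects scrapers might do something as simple as matching known tool names in the header. The blocklist and the `status_for` function below are entirely hypothetical (real sites may combine user agent checks with rate limits and other signals), but they show why Scrapy's default string is easy to refuse while a browser-like one passes.

```python
from http import HTTPStatus

# Hypothetical blocklist; real sites may use very different rules.
BLOCKED_TOKENS = ("scrapy", "curl", "python-requests")


def status_for(user_agent: str) -> HTTPStatus:
    """Return the status a hypothetical server might send for this user agent."""
    ua = user_agent.lower()
    # Refuse empty user agents and any that mention a blocked tool name.
    if not ua or any(token in ua for token in BLOCKED_TOKENS):
        return HTTPStatus.FORBIDDEN
    return HTTPStatus.OK


print(status_for("Scrapy/2.11 (+https://scrapy.org)").value)  # 403
print(status_for(
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
).value)  # 200
```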

One way around this is to change Scrapy's user agent string so that it identifies itself as a regular browser, and thankfully Scrapy makes this easy to do.

Steps to change the user agent for Scrapy:

  1. Fetch a website normally using the scrapy fetch command.
    $ scrapy fetch https://www.example.com

    The same approach also works with scrapy shell and other Scrapy commands.

  2. Change the user agent for a single fetch request using the --set option.
    $ scrapy fetch https://www.example.com --set=USER_AGENT="custom user agent string"
  3. To change the user agent for an entire Scrapy project, open the project's settings.py file in your favorite editor from the project folder.
    $ vi scrapyproject/settings.py
  4. Search for the USER_AGENT option.
    # Crawl responsibly by identifying yourself (and your website) on the user-agent
    #USER_AGENT = 'scraper (+http://www.yourdomain.com)'
  5. Uncomment the line by removing the leading # and set the value to the user agent string of your choice.
    USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
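
To confirm which user agent Scrapy actually sends after any of the steps above, one option is to point it at a local server that simply echoes the header back. The sketch below is a hypothetical helper built only on Python's standard library (the names `start_server` and `fetch_echo` are made up for this example); it is not part of Scrapy.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class EchoUserAgentHandler(BaseHTTPRequestHandler):
    """Reply to every GET with the User-Agent header the client sent."""

    def do_GET(self) -> None:
        body = (self.headers.get("User-Agent") or "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args) -> None:
        # Silence the default per-request logging to stderr.
        pass


def start_server(port: int = 8000) -> HTTPServer:
    """Start the echo server on 127.0.0.1 in a background daemon thread.

    Pass port=0 to let the OS pick a free port (see server.server_address).
    """
    server = HTTPServer(("127.0.0.1", port), EchoUserAgentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_echo(url: str, user_agent: str) -> str:
    """Send a GET with the given User-Agent and return what the server echoed."""
    # An empty ProxyHandler ignores proxy environment variables,
    # so requests to 127.0.0.1 go straight to the local server.
    opener = build_opener(ProxyHandler({}))
    req = Request(url, headers={"User-Agent": user_agent})
    with opener.open(req, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

For example, call start_server(8000) in an interactive Python session, then from another terminal run scrapy fetch http://127.0.0.1:8000/ --set=USER_AGENT="custom user agent string"; the response body that Scrapy prints should be the user agent string it sent.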