Changing the User-Agent header lets a Scrapy spider present itself as a regular browser rather than as a crawler, which helps avoid simple User-Agent-based blocks and can return the same layout or device-specific responses that real clients receive.

A web server receives the user-agent string on every HTTP request and may vary content or apply filters based on it. Scrapy's UserAgentMiddleware sends the value of the USER_AGENT setting, which defaults to Scrapy/<version> (+https://scrapy.org), unless a request already carries its own User-Agent header.

User-agent spoofing does not bypass more advanced anti-bot controls, and a mismatched header set (for example, a mobile user agent with desktop-only headers) can still be flagged. Keep crawler behavior within site policy, and prefer testing the string with a single request before making it the project default.
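One way to keep the header set consistent with the chosen user agent is to define the companion headers next to it. The fragment below is a minimal sketch for settings.py, assuming the iPad string used in the steps that follow. DEFAULT_REQUEST_HEADERS is the Scrapy setting for headers added to every request, and the Accept-Language value shown is an illustrative assumption, not a recommended value.

```python
# settings.py (fragment): keep the user agent and its companion headers together.
USER_AGENT = (
    "Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148"
)

DEFAULT_REQUEST_HEADERS = {
    # Same Accept value that appears in the echoed output in this guide.
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    # Assumed value for illustration; match it to what the imitated client sends.
    "Accept-Language": "en-US,en;q=0.9",
}
```

Comparing the echoed headers against a capture from the real device is a reasonable way to decide which values belong here.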

Steps to change the user agent for Scrapy spiders:

  1. Fetch an endpoint that echoes the received user agent to confirm Scrapy's default header.
    $ scrapy fetch --nolog http://app.internal.example:8000/headers
    {
      "headers": {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en",
        "User-Agent": "Scrapy/2.11.1 (+https://scrapy.org)",
        "Accept-Encoding": "gzip, deflate, br",
        "Host": "app.internal.example:8000"
      }
    }
  2. Choose the browser user agent string to send in requests.
  3. Override the USER_AGENT setting for a single command run using the --set option.
    $ scrapy fetch --nolog http://app.internal.example:8000/headers --set=USER_AGENT="Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148"
    {
      "headers": {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en",
        "User-Agent": "Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148",
        "Accept-Encoding": "gzip, deflate, br",
        "Host": "app.internal.example:8000"
      }
    }

    The override applies only to this command invocation.

  4. Open the Scrapy project settings file in a text editor.
    $ vi simplifiedguide/settings.py
  5. Locate the USER_AGENT setting line in settings.py.
    # Crawl responsibly by identifying yourself (and your website) on the user-agent
    #USER_AGENT = "simplifiedguide (+http://app.internal.example)"
  6. Uncomment the USER_AGENT line in settings.py and set its value to make the new user agent the project default.
    USER_AGENT = 'Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148'

    Invalid Python syntax in settings.py prevents spiders and scheduled jobs from starting.

  7. Fetch the echo endpoint again to verify the project default user agent is in effect.
    $ scrapy fetch --nolog http://app.internal.example:8000/headers
    {
      "headers": {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en",
        "User-Agent": "Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148",
        "Accept-Encoding": "gzip, deflate, br",
        "Host": "app.internal.example:8000"
      }
    }
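The steps above assume an endpoint that echoes request headers back as JSON. Where no such endpoint is available, a local stand-in could look like the sketch below. It uses only the Python standard library; the class and function names are hypothetical, and the response shape mimics the output shown in this guide without claiming to match the original service.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHeadersHandler(BaseHTTPRequestHandler):
    """Reply to any GET request with the received headers as JSON."""

    def do_GET(self):
        payload = {"headers": dict(self.headers.items())}
        body = json.dumps(payload, indent=2).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Suppress the default per-request log line.
        pass


def start_echo_server(host="127.0.0.1", port=0):
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer((host, port), EchoHeadersHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def echoed_user_agent(url, user_agent):
    """Send one request with the given User-Agent and return what the server saw."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # An empty ProxyHandler keeps the local request away from any configured proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        headers = json.load(response)["headers"]
    # Header names are case-insensitive, so compare on lowercase keys.
    return {name.lower(): value for name, value in headers.items()}["user-agent"]


# Quick self-check: start the server, send one request, then shut down.
server = start_echo_server()
url = f"http://127.0.0.1:{server.server_port}/headers"
print(echoed_user_agent(url, "example-agent/1.0"))
server.shutdown()
server.server_close()
```

Started on a fixed port instead of port 0, the same server can be used as the target of the scrapy fetch commands in the steps above.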