Routing Scrapy requests through an HTTP proxy supports controlled egress, geo-specific access, and separation between scraping workloads and the originating network.

Scrapy sends each request through a downloader stack built on Twisted. A proxy is applied per request via the proxy key in Request.meta, and Scrapy’s HttpProxyMiddleware translates that metadata into the correct connection behavior (including CONNECT tunneling for HTTPS targets).
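
A minimal illustration of that per-request contract (the proxy endpoint is hypothetical; step 5 below shows the same idea inside a full spider):

    import scrapy

    request = scrapy.Request(
        "https://example.com/",
        meta={"proxy": "http://proxy.example.net:8888"},  # hypothetical proxy
    )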

Proxies can observe and potentially modify traffic, and shared proxies can add latency or return inconsistent results. Only use proxies that are trusted and permitted for the target site, keep proxy credentials out of source control, and keep crawling behavior polite with reasonable concurrency and delays.

Steps to use an HTTP proxy in Scrapy:

  1. Prepare a proxy URL in scheme://host:port format.
    http://proxy.example.net:8888
    http://username:password@proxy.example.net:8888

    URL-encode reserved characters in usernames or passwords (such as @, :, or /) to avoid parsing errors.

    Most forward proxies use an http:// proxy URL even when requesting HTTPS pages.
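
    To percent-encode credentials before assembling the proxy URL, one option is urllib.parse.quote (a minimal sketch; the username and password are hypothetical):

    from urllib.parse import quote

    username = quote("user@example.com", safe="")  # '@' becomes %40
    password = quote("p:ss/word", safe="")  # ':' and '/' are percent-encoded
    proxy_url = f"http://{username}:{password}@proxy.example.net:8888"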

  2. Set the proxy configuration values in settings.py.
    settings.py
    PROXY_URL = "http://proxy.example.net:8888"
    PROXY_LIST = []

    Populate PROXY_LIST with multiple proxy URLs to rotate proxies per request.
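
    For instance, a small rotation pool (the endpoints are hypothetical):

    settings.py
    PROXY_LIST = [
        "http://proxy-a.example.net:8888",
        "http://proxy-b.example.net:8888",
        "http://user:pass@proxy-c.example.net:3128",
    ]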

  3. Add a downloader middleware that sets request.meta['proxy'] from the project settings.
    middlewares.py
    import random
    from typing import List, Optional
     
    from scrapy.crawler import Crawler
    from scrapy.http import Request
     
     
    class ProxyMiddleware:
        """Injects a proxy URL into request.meta when proxying is enabled in settings."""
     
        def __init__(self, proxy_url: Optional[str], proxy_list: List[str]) -> None:
            # Normalize the configured values: ignore non-strings and
            # strip surrounding whitespace so blank entries are dropped.
            self._proxy_url = proxy_url.strip() if isinstance(proxy_url, str) else None
            self._proxy_list = [p.strip() for p in proxy_list if isinstance(p, str) and p.strip()]

        @classmethod
        def from_crawler(cls, crawler: Crawler) -> "ProxyMiddleware":
            # Read the proxy configuration added to settings.py in step 2.
            proxy_url = crawler.settings.get("PROXY_URL")
            proxy_list = crawler.settings.getlist("PROXY_LIST", [])
            return cls(proxy_url=proxy_url, proxy_list=proxy_list)
     
        def process_request(self, request: Request, spider) -> None:
            # Respect an explicit per-request opt-out (see step 6).
            if request.meta.get("dont_proxy"):
                return

            # Never overwrite a proxy the spider set explicitly (see step 5).
            if request.meta.get("proxy"):
                return

            proxy = self._pick_proxy()
            if not proxy:
                return

            request.meta["proxy"] = proxy

        def _pick_proxy(self) -> Optional[str]:
            # Rotate randomly across PROXY_LIST when it is populated;
            # otherwise fall back to the single PROXY_URL (or None).
            if self._proxy_list:
                return random.choice(self._proxy_list)

            return self._proxy_url
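
    A quick way to sanity-check the selection logic outside a crawl (a sketch; the proxy URL is hypothetical):

    from myproject.middlewares import ProxyMiddleware
    from scrapy.http import Request

    mw = ProxyMiddleware(proxy_url=None, proxy_list=["http://proxy-a.example.net:8888"])
    req = Request("https://example.com/")
    mw.process_request(req, spider=None)
    assert req.meta["proxy"] == "http://proxy-a.example.net:8888"
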
  4. Enable the proxy downloader middleware stack in settings.py.
    settings.py
    HTTPPROXY_ENABLED = True
     
    DOWNLOADER_MIDDLEWARES = {
        "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 110,
        "myproject.middlewares.ProxyMiddleware": 100,
    }

    Merge the entries into an existing DOWNLOADER_MIDDLEWARES dictionary instead of overwriting unrelated middleware keys.

    Replace myproject with the Scrapy project module name.
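
    For example, with a pre-existing entry the merged dictionary might look like this (the RetryLoggingMiddleware key is hypothetical):

    settings.py
    DOWNLOADER_MIDDLEWARES = {
        "myproject.middlewares.ProxyMiddleware": 100,
        "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 110,
        "myproject.middlewares.RetryLoggingMiddleware": 550,  # hypothetical existing entry
    }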

  5. Override the proxy for a single request by setting meta['proxy'] in the spider.
    spiders/example_spider.py
    import scrapy
     
     
    class ExampleSpider(scrapy.Spider):
        name = "example"
        start_urls = ["http://app.internal.example:8000/"]
     
        def start_requests(self):
            for url in self.start_urls:
                yield scrapy.Request(
                    url,
                    meta={"proxy": "http://proxy.example.net:8888"},
                )
  6. Skip proxy injection for a request by setting meta['dont_proxy'] to True.
    spiders/example_spider.py
    import scrapy
     
     
    class ExampleSpider(scrapy.Spider):
        name = "example"
        start_urls = ["http://app.internal.example:8000/"]
     
        def start_requests(self):
            for url in self.start_urls:
                yield scrapy.Request(
                    url,
                    meta={"dont_proxy": True},
                )
  7. Override request headers when a target expects them.
    spiders/example_spider.py
    import scrapy
     
     
    class ExampleSpider(scrapy.Spider):
        name = "example"
        start_urls = ["http://app.internal.example:8000/"]
     
        def start_requests(self):
            for url in self.start_urls:
                yield scrapy.Request(
                    url,
                    headers={
                        "User-Agent": "Mozilla/5.0",
                        "Referer": "http://app.internal.example:8000/",
                    },
                )
  8. Start a Scrapy shell session from the project directory so the project settings and middlewares apply.
    $ scrapy shell -s HTTPCACHE_ENABLED=False "http://app.internal.example:8000/headers"

    Proxies that do not support tunneling may fail on HTTPS targets, often as timeouts or Tunnel connection failed errors.
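
    One way to surface such failures during a crawl is an errback on the request (a sketch; the spider, URLs, and proxy are hypothetical):

    spiders/proxy_check_spider.py
    import scrapy


    class ProxyCheckSpider(scrapy.Spider):
        name = "proxy_check"

        def start_requests(self):
            yield scrapy.Request(
                "https://app.internal.example:8443/",
                meta={"proxy": "http://proxy.example.net:8888"},
                errback=self.on_error,
            )

        def parse(self, response):
            self.logger.info("Fetched %s via proxy", response.url)

        def on_error(self, failure):
            # Tunnel and timeout errors arrive here instead of parse().
            self.logger.error("Proxy request failed: %r", failure.value)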

  9. Fetch the endpoint through the proxy and confirm the proxy adds a Via header.
    >>> fetch("http://app.internal.example:8000/headers", meta={"proxy": "http://proxy.example.net:8888"})
    >>> import json
    >>> json.loads(response.text)["headers"]["Via"]
    '1.1 tinyproxy (tinyproxy/1.11.1)'
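
    As a counter-check, a fetch with meta={"dont_proxy": True} bypasses the injection middleware and goes direct, so the same echo endpoint should report no Via header (a sketch, assuming the step 3 middleware is active):

    >>> fetch("http://app.internal.example:8000/headers", meta={"dont_proxy": True})
    >>> "Via" in json.loads(response.text)["headers"]
    False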