Using an HTTP proxy in Scrapy routes outbound requests through a different network path, which is useful for controlled egress, IP-based access rules, and provider-managed proxy pools.

Scrapy applies proxy routing through its built-in HttpProxyMiddleware. The common pattern is to set meta["proxy"] on each Request that should use the proxy, and since Scrapy 2.13 new spider examples should use the async start() entrypoint instead of the older start_requests().

A proxy can slow requests, fail independently of the target site, or expose traffic to another operator. Use only approved proxies, keep credentials out of source control, and verify the routed request against a simple HTTP echo endpoint before using the same setting in a larger crawl.

Steps to use an HTTP proxy in Scrapy:

  1. Open the Scrapy project directory that contains scrapy.cfg.
    $ cd /srv/proxydemo
  2. Confirm the project has not disabled proxy support.
    $ scrapy settings --get HTTPPROXY_ENABLED
    True

    Current Scrapy projects leave this enabled by default, so you only need to restore HTTPPROXY_ENABLED = True when an older project or custom settings turned it off. If the project replaces DOWNLOADER_MIDDLEWARES, make sure scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware is still enabled there as well.
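
    If both pieces do need restoring, a minimal settings.py sketch could look like this; 750 matches the middleware's default priority in DOWNLOADER_MIDDLEWARES_BASE.

    # settings.py: only needed when an older project or custom
    # settings disabled proxy support.
    HTTPPROXY_ENABLED = True

    DOWNLOADER_MIDDLEWARES = {
        # Ensure the built-in proxy middleware stays enabled; 750 is
        # its default priority, and a None value here would disable it.
        "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 750,
    }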

  3. Add the proxy URL to the request metadata in the spider.
    import scrapy


    class HeadersSpider(scrapy.Spider):
        name = "headers"

        async def start(self):
            # Route this request through the proxy; the credentials
            # travel in the userinfo part of the proxy URL.
            yield scrapy.Request(
                "http://origin.example.net/headers",
                meta={"proxy": "http://proxy-user:proxy-pass@proxy.example.net:8888"},
            )

        def parse(self, response):
            # Header values arrive as bytes, so decode before logging.
            via_header = [value.decode() for value in response.headers.getlist("Via")]
            self.logger.info("Via header: %s", via_header)
            yield {
                "status": response.status,
                "url": response.url,
            }

    Scrapy reads proxy credentials from the proxy URL itself, so authenticated proxies use the same meta["proxy"] field instead of a separate request option.

    URL-encode reserved characters in the proxy username or password before placing them in the proxy URL; otherwise the middleware may parse the credentials incorrectly.
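
    A short sketch of the encoding step, assuming hypothetical credentials that contain reserved characters:

    from urllib.parse import quote

    # Hypothetical credentials; quote() percent-encodes the reserved
    # characters so the middleware parses them correctly.
    user = quote("proxy-user", safe="")
    password = quote("p@ss:word/2024", safe="")
    proxy_url = f"http://{user}:{password}@proxy.example.net:8888"
    # proxy_url == "http://proxy-user:p%40ss%3Aword%2F2024@proxy.example.net:8888"

    The resulting proxy_url drops into meta["proxy"] exactly as in step 3.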

  4. Run the spider and confirm the response came back through the proxy.
    $ scrapy crawl headers -L INFO
    ##### snipped #####
    [headers] INFO: Via header: ['1.1 proxy.example.net']
    [scrapy.core.engine] INFO: Spider closed (finished)

    The Via header in the sample output shows that the request passed through the proxy before Scrapy received the response.

  5. Open scrapy shell and re-test the same URL through the proxy before applying the pattern to more requests.
    $ scrapy shell 'http://origin.example.net/headers' --nolog
    >>> fetch(response.url, meta={"proxy": "http://proxy.example.net:8888"})
    >>> response.status
    200
    >>> response.headers.getlist("Via")
    [b'1.1 proxy.example.net']

    If your proxy does not add a Via header, confirm routing in the proxy access log or by checking the outbound IP or headers that the target test endpoint reports.
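
    One way to check the outbound IP from the shell, assuming a hypothetical /ip echo path that reports the caller's address as JSON under an "origin" key:

    $ scrapy shell --nolog
    >>> fetch("http://origin.example.net/ip")
    >>> direct_ip = response.json()["origin"]
    >>> fetch("http://origin.example.net/ip", meta={"proxy": "http://proxy.example.net:8888"})
    >>> response.json()["origin"] != direct_ip
    True

    A differing origin value confirms the second request left through the proxy.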

Notes

  • Set meta["proxy"] on every Request that should use the proxy, including follow-up requests created in callbacks; the sketch after these notes shows one way to reapply it.
  • Set meta["proxy"] = None on one request when the process inherited http_proxy or https_proxy from the environment but that specific request must go direct.
  • Current Scrapy uses latin-1 for proxy credentials unless HTTPPROXY_AUTH_ENCODING is changed, so switch that setting to utf-8 when the proxy username or password contains characters outside latin-1.
  • Older projects that must stay compatible with pre-2.13 Scrapy may still use start_requests(), but current Scrapy examples should use start() for new spider entrypoints.
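
The notes above condense into one illustrative spider; the spider name, URLs, and next-page selector are placeholders, not part of the example project.

    import scrapy


    class NotesSpider(scrapy.Spider):
        name = "notes"

        custom_settings = {
            # Only needed when the proxy username or password contains
            # characters outside latin-1.
            "HTTPPROXY_AUTH_ENCODING": "utf-8",
        }

        async def start(self):
            yield scrapy.Request(
                "http://origin.example.net/headers",
                meta={"proxy": "http://proxy.example.net:8888"},
            )
            # Opt this one request out of any http_proxy/https_proxy
            # inherited from the environment.
            yield scrapy.Request(
                "http://origin.example.net/direct",
                meta={"proxy": None},
            )

        def parse(self, response):
            # meta["proxy"] is not inherited by new requests, so reapply
            # it on every follow-up created in a callback.
            for href in response.css("a.next::attr(href)").getall():
                yield response.follow(href, meta={"proxy": response.meta.get("proxy")})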