Cookies preserve session state across requests, enabling access to pages that require authentication, region selection, or other server-side preferences. Supplying the right cookies can turn a blocked crawl into a predictable, repeatable scrape.
Scrapy sends cookies through its built-in CookiesMiddleware, which builds a standard Cookie header from each Request and stores any Set-Cookie values returned by the server. Cookies can be provided per request using the cookies= argument, while the cookiejar mechanism keeps multiple independent sessions separated when scraping more than one account in a single run.
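The Cookie header that CookiesMiddleware builds is, at its core, a semicolon-joined list of name=value pairs. A minimal stdlib sketch of that serialization (an illustration, not Scrapy's actual implementation):

```python
def cookie_header(cookies):
    # Serialize a cookies dict the way a Cookie request header is formed:
    # semicolon-separated name=value pairs.
    return "; ".join(f"{name}={value}" for name, value in cookies.items())

print(cookie_header({"sessionid": "abc123", "region": "us"}))
# sessionid=abc123; region=us
```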
Session cookies frequently expire, and logging or committing them can expose access to private data. Cookie scope also matters: some sites require a specific Domain or Path attribute, which may need the extended cookie format instead of a simple dictionary.
Steps to use cookies in Scrapy:
- Open the spider file used for session-protected pages.
$ vi simplifiedguide/spiders/account.py
- Export the session cookie value as an environment variable.
$ export SCRAPY_SESSIONID='abc123'
Real session cookies grant account access; avoid committing them to version control or exposing them via shell history, logs, or crash reports.
- Create the authenticated request in the spider using the cookie value from the environment.
import os

import scrapy


class AccountSpider(scrapy.Spider):
    name = "account"
    start_urls = ["http://app.internal.example:8000/account"]

    def start_requests(self):
        session_id = os.environ.get("SCRAPY_SESSIONID")
        if not session_id:
            raise RuntimeError("SCRAPY_SESSIONID is not set")
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                cookies={"sessionid": session_id},
                meta={"cookiejar": 1},
                callback=self.parse_account,
            )

    def parse_account(self, response):
        yield {
            "account_name": response.css("h1::text").get(),
            "url": response.url,
        }
Use the list-of-dictionaries cookie format when the Domain or Path attribute must be set explicitly:
cookies=[{"name": "sessionid", "value": "abc123", "domain": ".example.com", "path": "/"}]
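To see why Domain and Path matter, here is a simplified sketch of the matching rules a cookie jar applies when deciding whether a cookie is sent with a request (real matching, per RFC 6265, has more cases):

```python
def cookie_applies(cookie, host, path):
    # A Domain attribute with a leading dot matches the host and any subdomain;
    # the Path attribute must be a prefix of the request path.
    domain = cookie.get("domain", host).lstrip(".")
    domain_ok = host == domain or host.endswith("." + domain)
    path_ok = path.startswith(cookie.get("path", "/"))
    return domain_ok and path_ok

c = {"name": "sessionid", "value": "abc123", "domain": ".example.com", "path": "/"}
print(cookie_applies(c, "shop.example.com", "/account"))  # True
print(cookie_applies(c, "another-site.org", "/account"))  # False
```

A cookie scoped to `.example.com` is sent to `shop.example.com`, but never to an unrelated host, which is why a plain dictionary (no Domain or Path) sometimes fails where the extended format succeeds.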
- Run the spider with COOKIES_DEBUG enabled to confirm the Cookie header is sent.
$ scrapy crawl account -O account.json -s COOKIES_DEBUG=True -s LOG_LEVEL=DEBUG -s HTTPCACHE_ENABLED=False
2026-01-01 08:48:49 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <GET http://app.internal.example:8000/account>
Cookie: sessionid=abc123
2026-01-01 08:48:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://app.internal.example:8000/account> (referer: None)
2026-01-01 08:48:49 [scrapy.extensions.feedexport] INFO: Stored json feed (1 items) in: account.json
COOKIES_DEBUG can print sensitive cookie values into logs; disable it after validation.
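If debug logs must be kept or shared, one way to limit the exposure is to mask cookie values before the lines are stored. A small sketch (the `sessionid` name is the example cookie from this guide; the regex is an assumption, not part of Scrapy):

```python
import re

def redact_cookies(line):
    # Replace the value of every name=value cookie pair with *** so
    # session identifiers never reach persisted logs.
    return re.sub(r"\b(\w+)=([^;\s]+)", r"\1=***", line)

print(redact_cookies("Cookie: sessionid=abc123; region=us"))
# Cookie: sessionid=***; region=***
```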
- Inspect the exported data to confirm the protected content is present.
$ python3 -m json.tool account.json
[
    {
        "account_name": "Example Account",
        "url": "http://app.internal.example:8000/account"
    }
]
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
