HTTP caching in Scrapy avoids re-downloading the same pages during repeated crawl runs, saving bandwidth and shortening development cycles. Cached responses also reduce load on target sites while selectors and parsers are being refined.
Scrapy implements caching through HttpCacheMiddleware, which stores request/response pairs using a cache storage backend (filesystem by default) and uses a cache policy to decide when a cached response is fresh enough to reuse. Requests are matched by fingerprint, so identical requests reuse the stored response instead of reaching the network.
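The fingerprint-keyed lookup can be sketched as follows. This is a minimal illustration using hashlib, not Scrapy's actual fingerprint function, which also canonicalizes the URL and can take selected headers into account.

```python
import hashlib

def fingerprint(method: str, url: str, body: bytes = b"") -> str:
    # Simplified stand-in for Scrapy's request fingerprint: hash the
    # parts of the request that determine its identity.
    h = hashlib.sha1()
    for part in (method.encode(), url.encode(), body):
        h.update(part)
    return h.hexdigest()

cache = {}   # fingerprint -> stored response
calls = []   # record of actual network downloads

def download(url):
    calls.append(url)
    return f"<html>{url}</html>"

def fetch(method, url):
    key = fingerprint(method, url)
    if key in cache:            # cache hit: skip the network entirely
        return cache[key]
    response = download(url)    # cache miss: download and store
    cache[key] = response
    return response

# Identical requests share a fingerprint, so the second fetch
# is served from the cache and only one download happens.
fetch("GET", "https://example.com/p/1")
fetch("GET", "https://example.com/p/1")
assert len(calls) == 1
```

The same principle lets a re-run of a spider reuse every previously stored response without touching the network.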
A relative HTTPCACHE_DIR path is resolved under the project data directory (.scrapy), which can grow quickly on large crawls and should be excluded from version control. Cached content can mask site changes and session-dependent pages, especially when HTTPCACHE_EXPIRATION_SECS is set to 0 (never expire), so clear the cache or set a non-zero expiration when validating fresh behavior.
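For example, when fresh behavior needs to be validated regularly, a non-zero expiration lets cached entries lapse on their own; the one-hour value below is chosen for illustration only.

```python
# settings.py -- example values, adjust to the crawl's needs
HTTPCACHE_ENABLED = True
HTTPCACHE_DIR = "httpcache"
HTTPCACHE_EXPIRATION_SECS = 3600  # entries older than one hour are re-downloaded
```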
Related: How to use Scrapy shell
Related: How to use CSS selectors in Scrapy
Steps to enable HTTP cache in Scrapy:
- Open the Scrapy project settings file.
$ vi simplifiedguide/settings.py
- Add the HTTP cache settings to the project configuration.
HTTPCACHE_ENABLED = True
HTTPCACHE_DIR = "httpcache"
HTTPCACHE_EXPIRATION_SECS = 0
HTTPCACHE_POLICY defaults to scrapy.extensions.httpcache.DummyPolicy, which caches every response unconditionally; set it to scrapy.extensions.httpcache.RFC2616Policy for Cache-Control-aware caching, or set the dont_cache request meta key to True to bypass the cache for individual requests.
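The effect of HTTPCACHE_EXPIRATION_SECS can be sketched as a simple staleness check. This is a simplified model of what the filesystem storage does when deciding whether a stored entry is still usable; the function name is illustrative.

```python
import time

def is_fresh(stored_at: float, expiration_secs: int, now: float = None) -> bool:
    # 0 means "never expire": every stored response stays usable.
    if expiration_secs == 0:
        return True
    now = time.time() if now is None else now
    # Otherwise an entry is fresh only while it is younger than the limit.
    return (now - stored_at) < expiration_secs

assert is_fresh(stored_at=0, expiration_secs=0, now=10_000)      # never expires
assert is_fresh(stored_at=100, expiration_secs=60, now=130)      # 30s old: fresh
assert not is_fresh(stored_at=100, expiration_secs=60, now=200)  # 100s old: stale
```

With the default of 0, as configured above, stale pages are never evicted, which is why the cache must be cleared manually to pick up site changes.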
- Run the spider once to populate the cache.
$ scrapy crawl products
##### snipped #####
2026-01-01 08:35:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'httpcache/miss': 2,
 'httpcache/store': 2,
 'item_scraped_count': 4}
##### snipped #####
A non-zero httpcache/store count indicates that responses were written to the cache.
- List the cache directory to confirm the spider cache tree was created.
$ ls -1 .scrapy/httpcache
products
- Inspect the cache subtree to confirm request fingerprints were written.
$ ls -1 .scrapy/httpcache/products
63
6f
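The two-character directory names (63 and 6f here) come from sharding cached entries by the first two hex characters of each request fingerprint, so no single directory accumulates every entry. A hedged sketch of that layout, with the path scheme inferred from the listing above:

```python
import hashlib

def cache_path(spider_name: str, request_fingerprint: str) -> str:
    # Assumed layout, matching the directory listing shown:
    # <HTTPCACHE_DIR>/<spider>/<first two hex chars>/<full fingerprint>
    return f"{spider_name}/{request_fingerprint[:2]}/{request_fingerprint}"

# Derive a hex fingerprint for demonstration purposes.
fp = hashlib.sha1(b"GET https://example.com/p/1").hexdigest()
print(cache_path("products", fp))
```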
- Run the spider again to confirm cached responses are being served.
$ scrapy crawl products
##### snipped #####
2026-01-01 08:35:40 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'httpcache/hit': 1,
 'item_scraped_count': 2}
##### snipped #####
Deleting .scrapy/httpcache (rm -rf .scrapy/httpcache) removes stored responses and forces fresh downloads on the next run.
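Comparing the two runs' stats makes the cache effect explicit. The helper below is illustrative; the dicts are copied from the log excerpts above.

```python
def cache_summary(stats: dict) -> dict:
    # Missing counters default to 0, since Scrapy omits stats
    # that were never incremented during the run.
    hits = stats.get("httpcache/hit", 0)
    misses = stats.get("httpcache/miss", 0)
    total = hits + misses
    return {"hits": hits, "misses": misses,
            "hit_rate": hits / total if total else 0.0}

first_run = {"httpcache/miss": 2, "httpcache/store": 2, "item_scraped_count": 4}
second_run = {"httpcache/hit": 1, "item_scraped_count": 2}

assert cache_summary(first_run)["hit_rate"] == 0.0   # empty cache: all misses
assert cache_summary(second_run)["hit_rate"] == 1.0  # repeat run: all hits
```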
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
