Downloading images during a crawl keeps each scraped item tied to the exact media that was published with it. That matters for product catalogs, content archives, and training datasets because the files are stored locally instead of being left behind on a remote site that may change or disappear.
Scrapy's ImagesPipeline reads image URLs from the image_urls field, schedules those requests through the normal downloader, stores the files under IMAGES_STORE, and writes per-file results to the images field. The result entries include the original URL, a path relative to IMAGES_STORE, a checksum, and a status such as downloaded. Recent files are not fetched again until they expire under IMAGES_EXPIRES, which defaults to 90 days.
The pipeline requires Pillow 8.3.2 or later and normalizes downloaded images to JPEG in RGB mode, so saved filenames default to full/<sha1>.jpg even when the source file was a PNG or WEBP. Relative src attributes should be converted with response.urljoin(), and IMAGES_STORE must point to a valid writable location or the pipeline stays disabled.
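The response.urljoin() conversion mentioned above delegates to the same URL resolution found in Python's standard library, so it can be sketched outside a spider (the page URL and relative path below are illustrative):

```python
from urllib.parse import urljoin

# Page the relative src attribute was scraped from (illustrative URL).
page_url = "http://media.example.net/gallery.html"

# A relative src attribute as it appears in the HTML.
relative_src = "images/gallery-1.png"

# response.urljoin(relative_src) resolves against response.url the same way.
absolute_url = urljoin(page_url, relative_src)
print(absolute_url)  # http://media.example.net/images/gallery-1.png
```

Passing such absolute URLs in image_urls is what lets the pipeline schedule its downloads through the normal downloader.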
Related: How to download files with Scrapy
Related: How to enable item pipelines in Scrapy
Steps to download images with Scrapy ImagesPipeline:
- Open the project's item definitions file.
$ vi images_demo/items.py
- Define item fields for the image URL list and the processed image results.
import scrapy


class GalleryItem(scrapy.Item):
    title = scrapy.Field()
    image_urls = scrapy.Field()
    images = scrapy.Field()
Item classes that declare their fields up front must define both image_urls and images; ImagesPipeline reads the first and writes its results to the second.
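Declaring an Item class is not required by the pipeline itself; a plain dict with the same image_urls key also works, and the pipeline adds the images key to it. A minimal sketch with illustrative values:

```python
# A plain dict can stand in for a declared Item class: ImagesPipeline
# only needs the image_urls key and will attach its results as "images".
item = {
    "title": "Gallery Image 1",
    "image_urls": ["http://media.example.net/images/gallery-1.png"],
}
print(item["image_urls"])
```

Declared Item classes remain useful for catching field-name typos early, since assigning to an undeclared field raises an error.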
- Open the project's settings file.
$ vi images_demo/settings.py
- Enable ImagesPipeline and set a writable IMAGES_STORE path.
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
}

IMAGES_STORE = "/srv/scrapy/images-store"
ImagesPipeline remains disabled if IMAGES_STORE is missing or points to an invalid location.
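Beyond the two required settings, ImagesPipeline exposes optional settings for expiry, size filtering, and thumbnails; a sketch of commonly tuned ones (the values shown are illustrative, not defaults):

```python
# Optional ImagesPipeline settings (illustrative values).

# Re-download a file only after 30 days instead of the default 90.
IMAGES_EXPIRES = 30

# Drop images smaller than these pixel dimensions (filtering is off by default).
IMAGES_MIN_WIDTH = 100
IMAGES_MIN_HEIGHT = 100

# Also store resized copies under thumbs/<name>/<sha1>.jpg in IMAGES_STORE.
IMAGES_THUMBS = {
    "small": (50, 50),
    "big": (270, 270),
}
```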
- Create the image storage directory.
$ mkdir -p /srv/scrapy/images-store
- Open the spider file that extracts the page images.
$ vi images_demo/spiders/gallery.py
- Yield each item with an image_urls list built from absolute image URLs.
import scrapy

from images_demo.items import GalleryItem


class GallerySpider(scrapy.Spider):
    name = "gallery"
    start_urls = ["http://media.example.net/gallery.html"]

    def parse(self, response):
        for card in response.css("figure"):
            image_src = card.css("img::attr(src)").get()
            if image_src:
                yield GalleryItem(
                    title=card.css("figcaption::text").get(),
                    image_urls=[response.urljoin(image_src)],
                )
image_urls must be a list even when the item has only one image.
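The full/<sha1>.jpg filenames that appear later come from the pipeline's default file naming, which hashes the download URL rather than reusing the remote filename. The default scheme can be sketched with hashlib (this mimics the stock behavior for illustration; customizing it would mean overriding file_path() on an ImagesPipeline subclass):

```python
import hashlib


def default_image_path(url: str) -> str:
    # The default naming hashes the request URL with SHA-1 and stores
    # the normalized JPEG under the full/ subdirectory of IMAGES_STORE.
    media_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"full/{media_guid}.jpg"


print(default_image_path("http://media.example.net/images/gallery-1.png"))
```

Hashing the URL keeps filenames stable across runs, which is what allows the expiry check to skip files that were downloaded recently.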
- Run the spider and export the items to JSON Lines so the images field is easy to inspect.
$ scrapy crawl gallery -O gallery.jl
2026-04-16 05:33:28 [scrapy.utils.log] INFO: Scrapy 2.15.0 started (bot: images_demo)
##### snipped #####
2026-04-16 05:33:30 [scrapy.middleware] INFO: Enabled item pipelines: ['scrapy.pipelines.images.ImagesPipeline']
2026-04-16 05:33:31 [scrapy.pipelines.files] DEBUG: File (downloaded): Downloaded file from <GET http://media.example.net/images/gallery-1.png> referred in <None>
2026-04-16 05:33:33 [scrapy.pipelines.files] DEBUG: File (downloaded): Downloaded file from <GET http://media.example.net/images/gallery-2.png> referred in <None>
2026-04-16 05:33:33 [scrapy.extensions.feedexport] INFO: Stored jl feed (2 items) in: gallery.jl
- Inspect the exported items to confirm Scrapy populated the images field with the stored path and status.
$ cat gallery.jl
{"title": "Gallery Image 1", "image_urls": ["http://media.example.net/images/gallery-1.png"], "images": [{"url": "http://media.example.net/images/gallery-1.png", "path": "full/190189632ca8f84ddd67245247213422e717d64b.jpg", "checksum": "c3d726f9666bb68e8ca97c8f0d61def1", "status": "downloaded"}]}
{"title": "Gallery Image 2", "image_urls": ["http://media.example.net/images/gallery-2.png"], "images": [{"url": "http://media.example.net/images/gallery-2.png", "path": "full/bc4de106168178ec5616391b97c904c442dd2796.jpg", "checksum": "e2eca853247645378a82c6ffbc5e17ee", "status": "downloaded"}]}
- List the image store to confirm the downloaded files exist as local JPEG files under full.
$ ls /srv/scrapy/images-store/full
190189632ca8f84ddd67245247213422e717d64b.jpg  bc4de106168178ec5616391b97c904c442dd2796.jpg
The path value in the exported item is relative to IMAGES_STORE.
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
