Downloading images during a crawl keeps each scraped item tied to the exact media that was published with it. That matters for product catalogs, content archives, and training datasets because the files are stored locally instead of being left behind on a remote site that may change or disappear.

Scrapy's ImagesPipeline reads image URLs from the item's image_urls field, schedules those requests through the normal downloader, stores the files under IMAGES_STORE, and writes per-file results to the images field. Each result entry includes the original URL, a path relative to IMAGES_STORE, a checksum, and a status (downloaded, uptodate, or cached). Recently downloaded files are not fetched again until they expire under IMAGES_EXPIRES, which defaults to 90 days.

The pipeline requires Pillow 8.3.2 or later and normalizes downloaded images to JPEG in RGB mode, so saved filenames default to full/<sha1>.jpg even when the source file was a PNG or WEBP. Relative src attributes should be converted to absolute URLs with response.urljoin(), and IMAGES_STORE must point to a valid writable location, otherwise the pipeline remains disabled.
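The <sha1> part of the stored filename is the SHA-1 digest of the image URL, so the on-disk path can be predicted before a crawl runs. A minimal sketch using the example URL from this guide (the computed digest is illustrative, not copied from Scrapy's output):

```python
import hashlib

# ImagesPipeline names each stored file after the SHA-1 digest of its URL.
url = "http://media.example.net/images/gallery-1.png"
path = "full/" + hashlib.sha1(url.encode("utf-8")).hexdigest() + ".jpg"
print(path)
```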

Steps to download images with Scrapy ImagesPipeline:

  1. Open the project's item definitions file.
    $ vi images_demo/items.py
  2. Define item fields for the image URL list and the processed image results.
    import scrapy
     
    class GalleryItem(scrapy.Item):
        title = scrapy.Field()
        image_urls = scrapy.Field()
        images = scrapy.Field()

    Item classes declare their fields up front, so any item used with ImagesPipeline must define both image_urls and images.

  3. Open the project's settings file.
    $ vi images_demo/settings.py
  4. Enable ImagesPipeline and set a writable IMAGES_STORE path.
    ITEM_PIPELINES = {
        "scrapy.pipelines.images.ImagesPipeline": 1,
    }
     
    IMAGES_STORE = "/srv/scrapy/images-store"

    ImagesPipeline remains disabled if IMAGES_STORE is missing or points to an invalid location.
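Beyond the two required settings, a few standard Scrapy options tune the pipeline's behavior. The values below are illustrative, not required:

```python
# Optional ImagesPipeline settings (illustrative values)
IMAGES_EXPIRES = 30          # re-download files older than 30 days (default 90)
IMAGES_THUMBS = {            # also store thumbnails under thumbs/<name>/
    "small": (50, 50),
    "big": (270, 270),
}
IMAGES_MIN_WIDTH = 110       # skip images smaller than 110x110 px
IMAGES_MIN_HEIGHT = 110
```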

  5. Create the image storage directory.
    $ mkdir -p /srv/scrapy/images-store
  6. Open the spider file that extracts the page images.
    $ vi images_demo/spiders/gallery.py
  7. Yield each item with an image_urls list built from absolute image URLs.
    import scrapy
     
    from images_demo.items import GalleryItem
     
    class GallerySpider(scrapy.Spider):
        name = "gallery"
        start_urls = ["http://media.example.net/gallery.html"]
     
        def parse(self, response):
            for card in response.css("figure"):
                image_src = card.css("img::attr(src)").get()
                if image_src:
                    yield GalleryItem(
                        title=card.css("figcaption::text").get(),
                        image_urls=[response.urljoin(image_src)],
                    )

    image_urls must be a list even when the item has only one image.
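response.urljoin(src) resolves a relative src against the response URL; its behavior matches the standard library's urljoin, so the conversion can be checked outside a crawl. A sketch using the example URLs from this guide:

```python
from urllib.parse import urljoin

# response.urljoin(src) is equivalent to urljoin(response.url, src)
base = "http://media.example.net/gallery.html"
print(urljoin(base, "images/gallery-1.png"))
# → http://media.example.net/images/gallery-1.png
print(urljoin(base, "/images/gallery-2.png"))
# → http://media.example.net/images/gallery-2.png
```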

  8. Run the spider and export the items to JSON Lines so the images field is easy to inspect.
    $ scrapy crawl gallery -O gallery.jl
    2026-04-16 05:33:28 [scrapy.utils.log] INFO: Scrapy 2.15.0 started (bot: images_demo)
    ##### snipped #####
    2026-04-16 05:33:30 [scrapy.middleware] INFO: Enabled item pipelines:
    ['scrapy.pipelines.images.ImagesPipeline']
    2026-04-16 05:33:31 [scrapy.pipelines.files] DEBUG: File (downloaded): Downloaded file from <GET http://media.example.net/images/gallery-1.png> referred in <None>
    2026-04-16 05:33:33 [scrapy.pipelines.files] DEBUG: File (downloaded): Downloaded file from <GET http://media.example.net/images/gallery-2.png> referred in <None>
    2026-04-16 05:33:33 [scrapy.extensions.feedexport] INFO: Stored jl feed (2 items) in: gallery.jl
  9. Inspect the exported items to confirm Scrapy populated the images field with the stored path and status.
    $ cat gallery.jl
    {"title": "Gallery Image 1", "image_urls": ["http://media.example.net/images/gallery-1.png"], "images": [{"url": "http://media.example.net/images/gallery-1.png", "path": "full/190189632ca8f84ddd67245247213422e717d64b.jpg", "checksum": "c3d726f9666bb68e8ca97c8f0d61def1", "status": "downloaded"}]}
    {"title": "Gallery Image 2", "image_urls": ["http://media.example.net/images/gallery-2.png"], "images": [{"url": "http://media.example.net/images/gallery-2.png", "path": "full/bc4de106168178ec5616391b97c904c442dd2796.jpg", "checksum": "e2eca853247645378a82c6ffbc5e17ee", "status": "downloaded"}]}
  10. List the image store to confirm the downloaded files exist as local JPEG files under full.
    $ ls /srv/scrapy/images-store/full
    190189632ca8f84ddd67245247213422e717d64b.jpg
    bc4de106168178ec5616391b97c904c442dd2796.jpg

    The path value in the exported item is relative to IMAGES_STORE.
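Because the export is JSON Lines, each item can be post-processed with the standard json module. A minimal sketch that parses one result entry, using a line copied from the gallery.jl output above:

```python
import json

# One exported line from gallery.jl above
line = (
    '{"title": "Gallery Image 1", '
    '"image_urls": ["http://media.example.net/images/gallery-1.png"], '
    '"images": [{"url": "http://media.example.net/images/gallery-1.png", '
    '"path": "full/190189632ca8f84ddd67245247213422e717d64b.jpg", '
    '"checksum": "c3d726f9666bb68e8ca97c8f0d61def1", '
    '"status": "downloaded"}]}'
)
item = json.loads(line)
for result in item["images"]:
    # result["path"] is relative to IMAGES_STORE
    print(result["status"], result["path"])
```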