Item pipelines make scraped data usable by applying consistent cleaning, validation, and persistence after extraction. Enabling a pipeline is a practical way to trim whitespace, normalize types, drop incomplete records, or store items externally without turning spider code into a maintenance trap.
In Scrapy, each item yielded by a spider is passed through the pipeline chain defined in ITEM_PIPELINES. A pipeline is a Python class implementing process_item, and Scrapy executes enabled pipelines in priority order (lower numbers run earlier) before items are exported to feeds or processed by downstream components.
Ordering matters when one pipeline depends on changes made by another. Failures are handled per item: an unhandled exception in process_item is logged as an error and the item does not reach later pipelines or exports, while intentional filtering requires raising DropItem. Keep pipeline logic focused and deterministic, and validate changes with short test runs before long crawls to avoid silently damaging output.
Related: How to export Scrapy items to JSON
Related: How to download files with Scrapy
$ vi catalog_demo/pipelines.py
Replace catalog_demo with the Scrapy project package name.
from itemadapter import ItemAdapter


class CleanNamePipeline:
    def process_item(self, item, spider):
        adapter = ItemAdapter(item)
        name = adapter.get("name")
        if name:
            adapter["name"] = str(name).strip()
        return item
ItemAdapter supports both Scrapy Item objects and plain dict items.
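Intentional filtering with DropItem could look roughly like the sketch below. RequirePricePipeline is a hypothetical class name, and the rule "discard items without a price" is an assumption chosen for illustration. The fallback exception exists only so the snippet runs where Scrapy is not installed; in a real project the import from scrapy.exceptions is the one that matters.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Stand-in so this sketch runs without Scrapy installed (illustration only).
    class DropItem(Exception):
        pass


class RequirePricePipeline:
    """Hypothetical pipeline: discard items that have no price."""

    def process_item(self, item, spider):
        # Mapping-style access works for plain dicts and Scrapy Item objects.
        if not item.get("price"):
            raise DropItem("missing price")
        return item
```

Returning the item passes it to the next pipeline; raising DropItem ends processing for that item only. To cover dataclass or attrs items as well, wrap the item in ItemAdapter instead of calling item.get directly.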
ITEM_PIPELINES = {
    "catalog_demo.pipelines.CleanNamePipeline": 300,
}
Lower priority numbers run earlier; add more entries to chain multiple pipelines.
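To make the ordering rule concrete, here is a small sketch with two entries. RequirePricePipeline is a hypothetical second pipeline, and the sorted() call is only an illustration of the rule, not how Scrapy builds its pipeline list internally.

```python
# settings.py fragment; RequirePricePipeline is a hypothetical second pipeline.
ITEM_PIPELINES = {
    "catalog_demo.pipelines.RequirePricePipeline": 400,
    "catalog_demo.pipelines.CleanNamePipeline": 300,
}

# Illustration only: sorting by priority value gives the expected run order,
# regardless of the order in which the entries are written in the dict.
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
for path in run_order:
    print(ITEM_PIPELINES[path], path)
```

With these values the cleaning pipeline (300) would run before the price check (400), so the later pipeline sees names that are already trimmed.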
An incorrect module path in ITEM_PIPELINES can prevent the spider from starting due to an ImportError.
$ scrapy crawl catalog -O products.json
2026-01-01 09:38:57 [scrapy.middleware] INFO: Enabled item pipelines: ['catalog_demo.pipelines.CleanNamePipeline']
Add -s CLOSESPIDER_ITEMCOUNT=10 for quick pipeline testing on a small crawl.
$ head -n 4 products.json
[
{"name": "Starter Plan", "price": "$29", "url": "http://app.internal.example:8000/products/starter-plan.html"},
{"name": "Team Plan", "price": "$79", "url": "http://app.internal.example:8000/products/team-plan.html"},
{"name": "Enterprise Plan", "price": "$199", "url": "http://app.internal.example:8000/products/enterprise-plan.html"},
##### snipped #####
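Eyeballing the first few lines works for a small export; for larger files a short script can do the same check. The sketch below assumes the export is a JSON array of objects, as produced by the .json feed format, and the helper names find_untrimmed and check_file are hypothetical.

```python
import json


def find_untrimmed(items, field="name"):
    """Return values of `field` that still carry leading or trailing whitespace."""
    bad = []
    for item in items:
        value = item.get(field)
        if isinstance(value, str) and value != value.strip():
            bad.append(value)
    return bad


def check_file(path="products.json"):
    """Load an exported JSON array and report untrimmed names."""
    with open(path, encoding="utf-8") as fh:
        items = json.load(fh)
    return find_untrimmed(items)
```

Calling check_file() from the directory that holds products.json should return an empty list when the cleaning pipeline ran on every item.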