Item pipelines let a Scrapy project clean, validate, enrich, or drop items after a spider yields them and before feed exports or storage backends write the result. Moving that logic out of spider callbacks keeps the extraction code focused on collecting values while one reusable pipeline class handles the post-processing rules.
Scrapy enables pipelines through the ITEM_PIPELINES setting in settings.py. Each enabled class runs in sequence from the lowest integer value to the highest, and a basic pipeline implements process_item(self, item, spider) to return the (possibly modified) item or raise DropItem when the record should stop there.
Current scrapy startproject output leaves the ITEM_PIPELINES block commented out in settings.py, and the generated pipelines.py stub already uses the process_item(self, item, spider) signature. Keep that spider argument when you fill the stub in, then confirm the crawl log lists the class under Enabled item pipelines.
Related: How to export Scrapy items to JSON
Related: How to download files with Scrapy
$ vi catalogdemo/pipelines.py
Replace catalogdemo with the Scrapy project package name that sits next to scrapy.cfg.
from itemadapter import ItemAdapter
from scrapy.exceptions import DropItem


class CleanNamePipeline:
    def process_item(self, item, spider):
        adapter = ItemAdapter(item)
        cleaned_name = str(adapter.get("name", "")).strip()
        if not cleaned_name:
            raise DropItem("Missing name")
        adapter["name"] = cleaned_name
        return item
ItemAdapter keeps the same pipeline working with Scrapy Item objects and plain Python dict items.
ITEM_PIPELINES = {
    "catalogdemo.pipelines.CleanNamePipeline": 300,
}
Lower numbers run earlier, so 300 is a common place for a first cleanup or validation stage.
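As a sketch of how a second stage would slot in, the settings fragment below adds a hypothetical PriceNormalizerPipeline (not defined in this article) that runs after the cleanup simply by taking a higher number:

```python
ITEM_PIPELINES = {
    "catalogdemo.pipelines.CleanNamePipeline": 300,
    # Hypothetical second stage: runs after CleanNamePipeline because 400 > 300,
    # so it only ever sees items the cleanup stage did not drop.
    "catalogdemo.pipelines.PriceNormalizerPipeline": 400,
}
```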
If the dotted path does not match the real module and class name, the crawl stops before any items reach the pipeline.
$ scrapy crawl catalog -O products.jl
##### snipped #####
2026-04-22 05:51:14 [scrapy.middleware] INFO: Enabled item pipelines:
['catalogdemo.pipelines.CleanNamePipeline']
2026-04-22 05:51:14 [scrapy.core.scraper] WARNING: Dropped: Missing name
2026-04-22 05:51:14 [scrapy.extensions.feedexport] INFO: Stored jl feed (2 items) in: products.jl
The JSON Lines export makes it easy to see which items survived the pipeline because each accepted item is written as one line.
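That line-per-record layout also makes the feed easy to inspect from Python, since each line parses on its own. In the sketch below, io.StringIO stands in for open("products.jl", encoding="utf-8") so the snippet is self-contained:

```python
import io
import json

# Stand-in for the real file handle; same two records as the export above.
feed = io.StringIO('{"name": "Starter Plan"}\n{"name": "Team Plan"}\n')

# One json.loads call per line: a malformed record fails on its own line
# instead of invalidating the whole file, unlike a single JSON array.
products = [json.loads(line) for line in feed if line.strip()]
print(products)
```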
$ cat products.jl
{"name": "Starter Plan"}
{"name": "Team Plan"}
If an expected item is missing, check the crawl log for the DropItem reason raised by the pipeline.