Item loaders keep field cleanup close to the item schema, so a spider callback can stay focused on selectors and link traversal. This pays off as soon as a field needs the same trimming, prefix removal, or URL normalization applied across more than one response.
In current Scrapy, ItemLoader collects values added through add_css(), add_xpath(), or add_value() into per-field lists, then applies the output processor when load_item() returns the finished item. Scrapy's loader class extends the underlying itemloaders library, so the supported import path is from scrapy.loader import ItemLoader while processors such as MapCompose and TakeFirst come from itemloaders.processors.
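The collect-then-process flow can be sketched with a toy loader in plain Python. ToyLoader and take_first below are illustrative stand-ins, not the actual itemloaders implementation:

```python
# Toy model of ItemLoader's collect-then-process flow (illustration only;
# real code should use scrapy.loader.ItemLoader).
class ToyLoader:
    def __init__(self):
        self._values = {}  # per-field lists of collected raw values

    def add_value(self, field, value):
        # Every add_* call appends to the field's list; nothing is
        # collapsed yet.
        self._values.setdefault(field, []).append(value)

    def load_item(self, output_processors):
        # Output processors run once per field, over the whole list,
        # when the finished item is built.
        return {
            field: output_processors.get(field, list)(values)
            for field, values in self._values.items()
        }


def take_first(values):
    return values[0] if values else None


loader = ToyLoader()
loader.add_value("label", "My image 1")
loader.add_value("label", "a second value that TakeFirst ignores")
item = loader.load_item({"label": take_first})
# item is {"label": "My image 1"}
```

The point of the sketch is the two-phase shape: values accumulate per field during parsing, and the output processor decides only at load_item() time how the list becomes a single field value.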
Loaders do not fix weak selectors or missing fields on their own. Keep processors small, use field-specific cleanup only where the field genuinely needs it, and confirm the resulting item on one real response before moving the loader into a wider crawl.
Related: How to use Scrapy shell
Related: How to use CSS selectors in Scrapy
import scrapy
from itemloaders.processors import MapCompose, TakeFirst
from scrapy.loader import ItemLoader


def normalize_label(value: str) -> str:
    return value.removeprefix("Name:").strip()


class ImageLinkItem(scrapy.Item):
    label = scrapy.Field()
    href = scrapy.Field()


class ImageLinkLoader(ItemLoader):
    default_input_processor = MapCompose(str.strip)
    default_output_processor = TakeFirst()
    label_in = MapCompose(str.strip, normalize_label)


class LoaderSpider(scrapy.Spider):
    name = "loader"
    custom_settings = {
        "ROBOTSTXT_OBEY": False,
    }
    start_urls = [
        (
            "https://docs.scrapy.org/en/latest/_static/"
            "selectors-sample1.html"
        ),
    ]

    def parse(self, response):
        for link in response.css("#images a"):
            loader = ImageLinkLoader(
                item=ImageLinkItem(),
                selector=link,
            )
            loader.add_css("label", "::text")
            loader.add_css(
                "href",
                "::attr(href)",
                MapCompose(response.urljoin),
            )
            yield loader.load_item()
label_in overrides the default input processor only for the label field, while the per-call MapCompose(response.urljoin) converts each extracted href into an absolute URL.
Related: How to define item fields in Scrapy
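What an input processor like label_in does to each collected value can be sketched with simplified stand-ins. The real MapCompose also flattens iterables returned by the functions; these toy versions and the sample string are assumptions for illustration:

```python
# Simplified stand-ins for itemloaders.processors.MapCompose and
# TakeFirst (illustration only; use the real processors in spiders).
def map_compose(*funcs):
    def process(values):
        # Apply each function to every value in turn, dropping Nones.
        for func in funcs:
            values = [func(v) for v in values if v is not None]
        return values
    return process


def take_first(values):
    for value in values:
        if value is not None and value != "":
            return value


def normalize_label(value: str) -> str:
    # Same cleanup as in the spider above (requires Python 3.9+).
    return value.removeprefix("Name:").strip()


label_in = map_compose(str.strip, normalize_label)
cleaned = label_in(["  Name: My image 1  "])
result = take_first(cleaned)  # -> "My image 1"
```

Because MapCompose runs per collected value while TakeFirst runs once over the whole list, cleanup belongs on the input side and collapsing on the output side.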
$ scrapy runspider loader_spider.py -O image-links.json
##### snipped #####
2026-04-22 06:44:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://docs.scrapy.org/en/latest/_static/selectors-sample1.html>
{'href': 'http://example.com/image1.html', 'label': 'My image 1'}
2026-04-22 06:44:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://docs.scrapy.org/en/latest/_static/selectors-sample1.html>
{'href': 'http://example.com/image2.html', 'label': 'My image 2'}
##### snipped #####
2026-04-22 06:44:52 [scrapy.extensions.feedexport] INFO: Stored json feed (5 items) in: image-links.json
scrapy runspider is useful for a quick loader test because the Item, loader, and spider can stay in one file until the selectors and processors are stable.
$ cat image-links.json
[
{"label": "My image 1", "href": "http://example.com/image1.html"},
{"label": "My image 2", "href": "http://example.com/image2.html"},
{"label": "My image 3", "href": "http://example.com/image3.html"},
{"label": "My image 4", "href": "http://example.com/image4.html"},
{"label": "My image 5", "href": "http://example.com/image5.html"}
]
TakeFirst() suits single-value fields such as label or href, but repeated fields such as tags or multiple links should keep a list-oriented output processor instead of collapsing to one value.
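The difference between collapsing and list-preserving output processors can be shown with toy stand-ins. The "tags" values and these helpers are illustrative; real code would set tags_out = Identity() from itemloaders.processors:

```python
# Sketch: why multi-valued fields need a list-preserving output
# processor. identity and take_first stand in for the real
# itemloaders.processors.Identity and TakeFirst.
def identity(values):
    return list(values)  # keep every collected value


def take_first(values):
    return values[0] if values else None  # collapse to one value


collected = ["python", "scrapy", "loaders"]

as_list = identity(collected)      # right choice for a tags field
as_single = take_first(collected)  # would silently drop two tags
```

Choosing the output processor per field, rather than relying on default_output_processor everywhere, keeps single-value and multi-value fields from interfering with each other.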