Request meta keeps request-scoped state attached to a Scrapy request so a later callback can still tell which page, crawl branch, or request setting produced that response. It fits detail-page crawls where a followed request needs context from the listing page without moving that state into spider-level globals.
When a spider yields scrapy.Request() or response.follow() with a meta dictionary, Scrapy exposes the same state in the later callback through response.meta. Current Scrapy documentation also notes that response.meta is propagated across redirects and retries, which makes it useful for values such as a source URL, a trace label, or request-specific component keys like cookiejar and download_timeout.
Keep meta narrow and owned by the new request. Current Scrapy guidance prefers cb_kwargs for values that only need to become callback arguments, and it warns against copying an entire response.meta dictionary into unrelated follow-up requests because Scrapy stores internal keys there, including retry bookkeeping.
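That division of labor can be sketched as a small plain-Python helper that sorts listing-page context into the two channels. The helper name and the key lists below are illustrative for this article, not part of Scrapy's API.

```python
# Illustrative sketch: decide which context keys travel as cb_kwargs
# (callback-only arguments) and which belong in meta (request-scoped
# state that Scrapy components may also read). Key lists are assumptions.

# Keys that only the callback needs as arguments.
CALLBACK_ONLY = {"source_title", "list_price"}

# Keys that should stay attached to the request itself.
REQUEST_SCOPED = {"source_url", "cookiejar", "download_timeout"}


def split_context(context):
    """Split a flat context dict into (cb_kwargs, meta) dictionaries."""
    cb_kwargs = {k: v for k, v in context.items() if k in CALLBACK_ONLY}
    meta = {k: v for k, v in context.items() if k in REQUEST_SCOPED}
    return cb_kwargs, meta


cb_kwargs, meta = split_context({
    "source_title": "Tipping the Velvet",
    "list_price": "£53.74",
    "source_url": "https://books.toscrape.com/catalogue/page-1.html",
})
print(cb_kwargs)  # callback-only values
print(meta)       # request-scoped values
```

Anything not in either set is dropped, which keeps both dictionaries narrow by construction.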
Related: How to use request callbacks in Scrapy
Related: How to use cookies in Scrapy
$ cd catalogdemo
$ vi catalogdemo/spiders/catalog.py
import scrapy


class CatalogSpider(scrapy.Spider):
    name = "catalog"
    allowed_domains = ["books.toscrape.com"]
    start_urls = ["https://books.toscrape.com/catalogue/page-1.html"]

    def parse(self, response):
        for book in response.css("article.product_pod")[:2]:
            detail_href = book.css("h3 a::attr(href)").get()
            source_title = book.css("h3 a::attr(title)").get(default="").strip()
            list_price = book.css("p.price_color::text").get(default="").strip()
            if detail_href:
                yield response.follow(
                    detail_href,
                    callback=self.parse_detail,
                    cb_kwargs={
                        "source_title": source_title,
                        "list_price": list_price,
                    },
                    meta={"source_url": response.url},
                )

    def parse_detail(self, response, source_title, list_price):
        yield {
            "source_title": source_title,
            "list_price": list_price,
            "source_url": response.meta["source_url"],
            "detail_title": response.css("div.product_main h1::text").get(default="").strip(),
            "upc": response.css("table tr:nth-child(1) td::text").get(default="").strip(),
            "detail_url": response.url,
        }
source_url stays on the request through meta, while source_title and list_price stay in cb_kwargs because only the callback needs them.
yield response.follow(
    next_href,
    callback=self.parse_more,
    meta={"source_url": response.meta["source_url"]},
    cb_kwargs={"source_title": source_title},
)
Do not pass meta=response.meta in ordinary spider callbacks, because Scrapy components store internal keys there that should not leak into unrelated follow-up requests.
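When some inherited meta state genuinely must continue into a follow-up request, one defensive pattern is to copy an explicit whitelist of keys instead of the whole dictionary. The helper below is a plain-Python sketch of that idea; the forwarded key names are assumptions for this example, and the internal keys shown (retry and depth counters) stand in for the bookkeeping the copy deliberately leaves behind.

```python
# Sketch: forward only spider-owned meta keys into a new request's meta,
# leaving Scrapy's internal bookkeeping behind. FORWARDED_KEYS is an
# assumption for this example, not a Scrapy constant.

FORWARDED_KEYS = ("source_url", "trace_label")


def forward_meta(old_meta, extra=None):
    """Build a fresh meta dict from a whitelist of keys plus new values."""
    new_meta = {k: old_meta[k] for k in FORWARDED_KEYS if k in old_meta}
    if extra:
        new_meta.update(extra)
    return new_meta


old = {"source_url": "https://example.com/page-1", "retry_times": 2, "depth": 3}
print(forward_meta(old))  # only the whitelisted key survives
```

The spider would then pass meta=forward_meta(response.meta) instead of meta=response.meta, so retry and depth counters never leak into the new request.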
$ scrapy crawl catalog -O meta-items.json
2026-04-22 06:39:10 [scrapy.utils.log] INFO: Scrapy 2.15.0 started (bot: catalogdemo)
2026-04-22 06:39:13 [scrapy.core.engine] INFO: Spider opened
##### snipped #####
2026-04-22 06:39:17 [scrapy.extensions.feedexport] INFO: Stored json feed (2 items) in: meta-items.json
2026-04-22 06:39:17 [scrapy.core.engine] INFO: Spider closed (finished)
-O replaces the previous export file so the verification step only shows the current crawl.
$ python3 -m json.tool meta-items.json
[
    {
        "source_title": "Tipping the Velvet",
        "list_price": "£53.74",
        "source_url": "https://books.toscrape.com/catalogue/page-1.html",
        "detail_title": "Tipping the Velvet",
        "upc": "90fa61229261140a",
        "detail_url": "https://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html"
    },
##### snipped #####
]
If source_url is missing while source_title and list_price still appear, the callback kept its cb_kwargs but the request did not keep the expected meta key.
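That failure mode can be checked mechanically by scanning the exported feed for items that carry the cb_kwargs-derived fields but lack the meta-derived one. This is a standalone sketch; the field names mirror the spider above, and the helper itself is not part of Scrapy.

```python
import json

# Sketch: flag exported items whose cb_kwargs-derived fields arrived
# but whose meta-derived source_url did not.


def missing_meta_items(items):
    """Return items that have title/price context but no source_url."""
    return [
        item for item in items
        if "source_title" in item and "list_price" in item
        and not item.get("source_url")
    ]


with_gap = json.loads('[{"source_title": "A", "list_price": "£1.00"}]')
print(missing_meta_items(with_gap))  # the lone item is flagged
```

An empty result means every item kept its meta key; any flagged item points at a request that was built without the expected meta entry.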