Many pages return only a shell of HTML in the first response, then add the real cards, quotes, or product data after the browser runs JavaScript. A normal Scrapy request only sees that first response, so selectors can stay empty even when the page looks complete in a browser.
scrapy-playwright keeps the crawl inside Scrapy by letting selected requests open in a real Playwright browser before the response reaches the spider callback. That means the spider can keep using normal Scrapy selectors, feed exports, and request scheduling instead of moving the whole scrape into a separate browser script.
Current Scrapy guidance still recommends replaying the underlying XHR or JSON request first when the page is actually loading its data from an API. Reserve browser rendering for cases where the data only appears in the live DOM or after browser-side events. On current Scrapy releases, define custom start requests in an async def start() method instead of relying on the older start_requests() pattern.
$ python3 -m pip install scrapy scrapy-playwright
Collecting scrapy
##### snipped #####
Successfully installed scrapy-2.15.0 scrapy-playwright-0.0.46
Playwright for Python is installed as a dependency, but the browser binaries must be installed separately:
$ python3 -m playwright install chromium
If Playwright later reports a missing browser executable after a package upgrade, run python3 -m playwright install again so the browser cache matches the installed Python package.
$ scrapy startproject render_demo
New Scrapy project 'render_demo', using template directory '/usr/local/lib/python3.13/site-packages/scrapy/templates/project', created in:
/home/user/render_demo
You can start your first spider with:
cd render_demo
scrapy genspider example example.com
$ cd render_demo
BOT_NAME = "render_demo"

SPIDER_MODULES = ["render_demo.spiders"]
NEWSPIDER_MODULE = "render_demo.spiders"

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

PLAYWRIGHT_BROWSER_TYPE = "chromium"

ROBOTSTXT_OBEY = True
FEED_EXPORT_ENCODING = "utf-8"
If this is an existing project instead of a new demo project, merge these settings into the current file instead of overwriting unrelated project settings.
import scrapy
from scrapy_playwright.page import PageMethod


class RenderedSpider(scrapy.Spider):
    name = "rendered"
    allowed_domains = ["quotes.toscrape.com"]
    start_urls = ["https://quotes.toscrape.com/js/"]

    async def start(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url,
                callback=self.parse,
                meta={
                    "playwright": True,
                    "playwright_page_methods": [
                        PageMethod("wait_for_selector", ".quote"),
                    ],
                },
            )

    def parse(self, response):
        for quote in response.css(".quote .text::text").getall()[:3]:
            yield {"quote": quote}
The playwright meta flag sends only this request through the browser, and PageMethod("wait_for_selector", ".quote") delays parsing until the rendered quote elements exist in the DOM.
The [:3] slice keeps the example export short. Remove it when you want every rendered match from the page.
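Pages that need more than a single wait can chain several PageMethod entries, which run in order before the response reaches the callback. This request-meta fragment is a sketch: the button and item selectors are assumptions about a hypothetical page, not selectors from quotes.toscrape.com.

```python
from scrapy_playwright.page import PageMethod

# Hedged sketch of chaining browser actions; each PageMethod wraps a
# Playwright Page method of the same name and runs sequentially.
meta = {
    "playwright": True,
    "playwright_page_methods": [
        PageMethod("wait_for_selector", ".item"),        # initial render done
        PageMethod("click", "button.load-more"),         # hypothetical button
        PageMethod("wait_for_selector", ".item:nth-child(20)"),  # more items loaded
    ],
}
```

Pass a dict like this as the meta argument of scrapy.Request, exactly as in the spider above.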
$ scrapy crawl rendered -O items.json
2026-04-22 06:50:30 [scrapy.utils.log] INFO: Scrapy 2.15.0 started (bot: render_demo)
##### snipped #####
2026-04-22 06:51:08 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/js/>
{'quote': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”'}
{'quote': '“It is our choices, Harry, that show what we truly are, far more than our abilities.”'}
{'quote': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”'}
2026-04-22 06:51:08 [scrapy.extensions.feedexport] INFO: Stored json feed (3 items) in: items.json
Browser-rendered requests are slower and heavier than plain HTTP requests, so keep Playwright limited to the pages that actually need a live browser and lower concurrency if the target site or crawler host starts failing under load.
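One way to bound that browser footprint is through scrapy-playwright's limit settings in settings.py. The numbers below are illustrative starting points to tune per target, not recommendations.

```python
# settings.py additions that cap the Playwright side of the crawl.
# All values here are placeholder starting points, not recommendations.

CONCURRENT_REQUESTS = 8                          # overall Scrapy concurrency
PLAYWRIGHT_MAX_CONTEXTS = 4                      # simultaneous browser contexts
PLAYWRIGHT_MAX_PAGES_PER_CONTEXT = 2             # open pages per context
PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT = 30_000   # ms; fail slow pages instead of hanging
```

Lowering these is usually the first lever when the crawler host runs out of memory or the target site starts returning errors.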
$ cat items.json
[
{"quote": "“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”"},
{"quote": "“It is our choices, Harry, that show what we truly are, far more than our abilities.”"},
{"quote": "“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”"}
]
If the export stays empty, the wait selector may be wrong, the page may need more browser actions before the target elements appear, or the better fix may be to replay the page's underlying network request instead of rendering the full page in a browser.
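A quick way to tell which of those it is: screenshot what the browser actually rendered. This request-meta sketch puts the screenshot before the wait, so the image is saved even if the wait later times out; "debug.png" is an arbitrary local path and ".quote" is the selector from the spider above.

```python
from scrapy_playwright.page import PageMethod

# Debugging sketch: capture the rendered page, then wait as usual.
# Playwright's Page.screenshot accepts path and full_page keyword args.
meta = {
    "playwright": True,
    "playwright_page_methods": [
        PageMethod("screenshot", path="debug.png", full_page=True),
        PageMethod("wait_for_selector", ".quote", timeout=10_000),
    ],
}
```

If debug.png shows a cookie banner, a login wall, or an empty shell, that points to the missing browser action; if it shows the data, the wait selector itself is the problem.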