Many pages return only a shell of HTML in the first response, then add the real cards, quotes, or product data after the browser runs JavaScript. A normal Scrapy request only sees that first response, so selectors can stay empty even when the page looks complete in a browser.
scrapy-playwright keeps the crawl inside Scrapy by letting selected requests open in a real Playwright browser before the response reaches the spider callback. That means the spider can keep using normal Scrapy selectors, feed exports, and request scheduling instead of moving the whole scrape into a separate browser script.
Current Scrapy guidance still recommends replaying the underlying XHR or JSON request first when the page is actually loading its data from an API. Reserve browser rendering for cases where the data only appears in the live DOM or after browser-side events. On current Scrapy releases, define custom start requests in an async def start() method instead of relying on the older start_requests() pattern.
$ python3 -m pip install scrapy scrapy-playwright
Collecting scrapy
##### snipped #####
Successfully installed scrapy-2.15.0 scrapy-playwright-0.0.46
Playwright for Python is installed as a dependency, but the browser binaries must be installed separately:
$ python3 -m playwright install chromium
If Playwright later reports a missing browser executable after a package upgrade, run python3 -m playwright install again so the browser cache matches the installed Python package.
$ scrapy startproject render_demo
New Scrapy project 'render_demo', using template directory '/usr/local/lib/python3.13/site-packages/scrapy/templates/project', created in:
/home/user/render_demo
You can start your first spider with:
cd render_demo
scrapy genspider example example.com
$ cd render_demo
BOT_NAME = "render_demo"

SPIDER_MODULES = ["render_demo.spiders"]
NEWSPIDER_MODULE = "render_demo.spiders"

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

PLAYWRIGHT_BROWSER_TYPE = "chromium"

ROBOTSTXT_OBEY = True
FEED_EXPORT_ENCODING = "utf-8"
If this is an existing project instead of a new demo project, merge these settings into the current file instead of overwriting unrelated project settings.
import scrapy
from scrapy_playwright.page import PageMethod


class RenderedSpider(scrapy.Spider):
    name = "rendered"
    allowed_domains = ["quotes.toscrape.com"]
    start_urls = ["https://quotes.toscrape.com/js/"]

    async def start(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url,
                callback=self.parse,
                meta={
                    "playwright": True,
                    "playwright_page_methods": [
                        PageMethod("wait_for_selector", ".quote"),
                    ],
                },
            )

    def parse(self, response):
        for quote in response.css(".quote .text::text").getall()[:3]:
            yield {"quote": quote}
The playwright meta flag sends only this request through the browser, and PageMethod("wait_for_selector", ".quote") delays parsing until the rendered quote elements exist in the DOM.
The [:3] slice keeps the example export short. Remove it when you want every rendered match from the page.
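Pages that need more than a single wait can chain several PageMethod entries, which run in order before the response reaches the callback. This request-meta fragment is a sketch: the button and item selectors are assumptions about a hypothetical page, not selectors from quotes.toscrape.com.

```python
from scrapy_playwright.page import PageMethod

# Hedged sketch of chaining browser actions; each PageMethod wraps a
# Playwright Page method of the same name and runs sequentially.
meta = {
    "playwright": True,
    "playwright_page_methods": [
        PageMethod("wait_for_selector", ".item"),        # initial render done
        PageMethod("click", "button.load-more"),         # hypothetical button
        PageMethod("wait_for_selector", ".item:nth-child(20)"),  # more items loaded
    ],
}
```

Pass a dict like this as the meta argument of scrapy.Request, exactly as in the spider above.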
$ scrapy crawl rendered -O items.json
2026-04-22 06:50:30 [scrapy.utils.log] INFO: Scrapy 2.15.0 started (bot: render_demo)
##### snipped #####
2026-04-22 06:51:08 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/js/>
{'quote': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”'}
{'quote': '“It is our choices, Harry, that show what we truly are, far more than our abilities.”'}
{'quote': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”'}
2026-04-22 06:51:08 [scrapy.extensions.feedexport] INFO: Stored json feed (3 items) in: items.json
Browser-rendered requests are slower and heavier than plain HTTP requests, so keep Playwright limited to the pages that actually need a live browser and lower concurrency if the target site or crawler host starts failing under load.
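One way to bound that browser footprint is through scrapy-playwright's limit settings in settings.py. The numbers below are illustrative starting points to tune per target, not recommendations.

```python
# settings.py additions that cap the Playwright side of the crawl.
# All values here are placeholder starting points, not recommendations.

CONCURRENT_REQUESTS = 8                          # overall Scrapy concurrency
PLAYWRIGHT_MAX_CONTEXTS = 4                      # simultaneous browser contexts
PLAYWRIGHT_MAX_PAGES_PER_CONTEXT = 2             # open pages per context
PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT = 30_000   # ms; fail slow pages instead of hanging
```

Lowering these is usually the first lever when the crawler host runs out of memory or the target site starts returning errors.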
$ cat items.json
[
{"quote": "“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”"},
{"quote": "“It is our choices, Harry, that show what we truly are, far more than our abilities.”"},
{"quote": "“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”"}
]
If the export stays empty, the wait selector may be wrong, the page may need more browser actions before the target elements appear, or the better fix may be to replay the page's underlying network request instead of rendering the full page in a browser.
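A quick way to tell which of those it is: screenshot what the browser actually rendered. This request-meta sketch puts the screenshot before the wait, so the image is saved even if the wait later times out; "debug.png" is an arbitrary local path and ".quote" is the selector from the spider above.

```python
from scrapy_playwright.page import PageMethod

# Debugging sketch: capture the rendered page, then wait as usual.
# Playwright's Page.screenshot accepts path and full_page keyword args.
meta = {
    "playwright": True,
    "playwright_page_methods": [
        PageMethod("screenshot", path="debug.png", full_page=True),
        PageMethod("wait_for_selector", ".quote", timeout=10_000),
    ],
}
```

If debug.png shows a cookie banner, a login wall, or an empty shell, that points to the missing browser action; if it shows the data, the wait selector itself is the problem.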