Many websites ship only a shell of the page in the first HTML response and add the real text, links, or product data after the browser runs JavaScript. A normal Scrapy request sees only that first response, so selectors can return empty lists even though the page looks complete in a browser.

scrapy-playwright plugs Playwright into Scrapy's download handler so selected requests are rendered in a browser before the response reaches your spider callback. That keeps the crawl inside Scrapy's normal request scheduling, selector, and feed export workflow instead of forcing a separate browser script.

Current Scrapy guidance still recommends reproducing the underlying network request first when the page loads data from an API, because that is faster and simpler than rendering a full browser page. When the data only appears after client-side rendering or interaction, scrapy-playwright is the cleanest way to keep that browser step inside the Scrapy project, but it should be used only for the requests that actually need it.

Steps to scrape a JavaScript-rendered page with Scrapy using Playwright:

  1. Open a terminal in the Scrapy project directory.
    $ cd /srv/render_demo

    Run the command from the directory that contains scrapy.cfg so Scrapy loads the correct project settings and spider modules.

  2. Install scrapy-playwright in the project's Python environment.
    $ python3 -m pip install scrapy-playwright
    Collecting scrapy-playwright
    ##### snipped #####
    Successfully installed playwright-1.58.0 scrapy-playwright-0.0.46

    Playwright for Python is installed as a dependency, but the browser binary is installed separately.

  3. Install the Chromium browser that Playwright will launch.
    $ python3 -m playwright install chromium

    Set PLAYWRIGHT_BROWSER_TYPE to firefox or webkit later if a different browser better matches the target site.

  4. Add the Playwright download handler to render_demo/settings.py.
    render_demo/settings.py
    DOWNLOAD_HANDLERS = {
        "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    }
     
    TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
    PLAYWRIGHT_BROWSER_TYPE = "chromium"
    ROBOTSTXT_OBEY = True

    TWISTED_REACTOR is already the default in new Scrapy projects since 2.7, but setting it explicitly keeps older project settings aligned with scrapy-playwright.

  5. Replace render_demo/spiders/rendered.py with a spider that waits for the rendered DOM before parsing it.
    render_demo/spiders/rendered.py
    import scrapy
    from scrapy_playwright.page import PageMethod
     
     
    class RenderedSpider(scrapy.Spider):
        name = "rendered"
        start_urls = ["https://app.example.com/rendered-feed/"]
     
        async def start(self):
            for url in self.start_urls:
                yield scrapy.Request(
                    url,
                    meta={
                        "playwright": True,
                        "playwright_page_methods": [
                            PageMethod("wait_for_selector", "#items li"),
                        ],
                    },
                )
     
        def parse(self, response):
            for item in response.css("#items li::text").getall():
                yield {"title": item}

    The playwright meta flag sends only this request through the browser, and wait_for_selector delays the response until the rendered items exist in the DOM.

    async def start() replaced start_requests() as the entry point for custom start requests in Scrapy 2.13, so use it instead of the older pattern on current releases.

  6. Run the spider and export the rendered items to JSON.
    $ scrapy crawl rendered -O items.json
    2026-04-16 05:47:15 [scrapy.utils.log] INFO: Scrapy 2.15.0 started (bot: render_demo)
    ##### snipped #####
    2026-04-16 05:47:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://app.example.com/rendered-feed/>
    {'title': 'Rendered Item 1'}
    {'title': 'Rendered Item 2'}
    {'title': 'Rendered Item 3'}
    2026-04-16 05:47:20 [scrapy.extensions.feedexport] INFO: Stored json feed (3 items) in: items.json

    Browser-rendered requests are slower and heavier than raw HTTP requests, so keep concurrency conservative and only enable Playwright for the pages that actually need rendering.
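    One way to keep the browser load bounded is to cap concurrency in render_demo/settings.py. The values below are illustrative starting points, not recommendations from the scrapy-playwright documentation; tune them against the target site:

```python
# render_demo/settings.py -- illustrative throttling values; tune per site.
CONCURRENT_REQUESTS = 8               # fewer in-flight requests overall
PLAYWRIGHT_MAX_CONTEXTS = 4           # cap simultaneous browser contexts
PLAYWRIGHT_MAX_PAGES_PER_CONTEXT = 2  # cap open pages per context
DOWNLOAD_DELAY = 1                    # seconds between requests to one site
```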

  7. Open the export to confirm the rendered items were written.
    $ cat items.json
    [
    {"title": "Rendered Item 1"},
    {"title": "Rendered Item 2"},
    {"title": "Rendered Item 3"}
    ]

    The item count and field values should match what appears in the page after JavaScript finishes rendering.

Notes

  • Scrapy's current documentation recommends reproducing the page's network request first when the data is loaded from JSON or another API response, because that avoids browser overhead entirely.
  • Use a stable selector in PageMethod("wait_for_selector", …) so Scrapy waits for the element that proves the page is ready instead of sleeping for an arbitrary number of seconds.
  • If the page needs clicks, scrolling, or form input before the target content appears, add more PageMethod actions in playwright_page_methods instead of moving the whole scrape outside Scrapy.
  • If the site reacts badly to the default Scrapy user agent, review the scrapy-playwright header behavior and align the browser and request headers before troubleshooting selectors.