Browser-rendered pages often send only a shell of HTML in the first response, then inject product cards, search results, or prices after JavaScript runs. A normal Scrapy request only sees that first response, so selectors can stay empty even though the page looks complete in a real browser.

Selenium fits into Scrapy cleanly as a downloader middleware for the requests that need a browser. The middleware opens the page in Chrome, waits for a stable selector that proves the DOM is ready, and returns the rendered source as an HtmlResponse so the spider can keep using standard Scrapy CSS or XPath selectors and feed exports.

Current Selenium releases ship with Selenium Manager, which resolves a compatible driver automatically when needed. Browser rendering is still much slower and heavier than plain HTTP fetching, so on current Scrapy releases use async def start() for custom start requests, keep concurrency low while one browser session is shared, and prefer a specific wait selector over fixed sleeps.

Steps to use Selenium with Scrapy:

  1. Install Selenium in the same Python environment as the Scrapy project.
    $ python3 -m pip install selenium

    Selenium 4.6 and later can use Selenium Manager to locate or download a compatible driver automatically, so a separate chromedriver install step is often unnecessary.
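
    The snippet below is an optional smoke test for that: starting and stopping one headless session forces Selenium Manager to resolve a driver up front rather than mid-crawl.

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    # Selenium Manager locates or downloads a matching chromedriver here.
    driver = webdriver.Chrome(options=options)
    print(driver.capabilities["browserVersion"])
    driver.quit()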

  2. Create a new Scrapy project for the Selenium-enabled spider.
    $ scrapy startproject seleniumdemo
    New Scrapy project 'seleniumdemo', using template directory '/usr/lib/python3/dist-packages/scrapy/templates/project', created in:
        /home/user/seleniumdemo
  3. Change to the project directory.
    $ cd seleniumdemo
  4. Generate a spider skeleton for the browser-rendered site.
    $ scrapy genspider rendered app.example.com
    Created spider 'rendered' using template 'basic' in module:
      seleniumdemo.spiders.rendered
  5. Replace seleniumdemo/middlewares.py with a downloader middleware that opens only Selenium-flagged requests in Chrome and returns the rendered DOM back to Scrapy.
    from __future__ import annotations
     
    from typing import Optional
     
    from scrapy import signals
    from scrapy.http import HtmlResponse, Request
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
     
     
    class SeleniumDownloaderMiddleware:
        def __init__(self, wait_seconds: int, driver_arguments: list[str]):
            self.wait_seconds = wait_seconds
            self.driver_arguments = driver_arguments
            self.driver: Optional[webdriver.Chrome] = None
     
        @classmethod
        def from_crawler(cls, crawler):
            middleware = cls(
                wait_seconds=crawler.settings.getint("SELENIUM_WAIT_SECONDS", 10),
                driver_arguments=crawler.settings.getlist(
                    "SELENIUM_DRIVER_ARGUMENTS",
                    ["--headless", "--window-size=1280,900"],
                ),
            )
            crawler.signals.connect(middleware.spider_opened, signal=signals.spider_opened)
            crawler.signals.connect(middleware.spider_closed, signal=signals.spider_closed)
            return middleware
     
        # One Chrome session is started here and shared by every
        # Selenium-flagged request until the spider closes.
        def spider_opened(self, spider):
            options = Options()
            for argument in self.driver_arguments:
                options.add_argument(argument)
            self.driver = webdriver.Chrome(options=options)
     
        def spider_closed(self, spider, reason):
            if self.driver is not None:
                self.driver.quit()
                self.driver = None
     
        def process_request(self, request: Request, spider):
            # Requests without the flag stay on Scrapy's regular downloader.
            if not request.meta.get("selenium"):
                return None
     
            if self.driver is None:
                raise RuntimeError("Selenium WebDriver is not initialized.")
     
            self.driver.get(request.url)
     
            wait_css = request.meta.get("selenium_wait_css")
            if wait_css:
                try:
                    WebDriverWait(self.driver, self.wait_seconds).until(
                        EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
                    )
                except TimeoutException:
                    spider.logger.warning(
                        "Timed out waiting for selector: %s",
                        wait_css,
                    )
     
            return HtmlResponse(
                url=self.driver.current_url,
                body=self.driver.page_source.encode("utf-8"),
                encoding="utf-8",
                request=request,
            )

    The custom request.meta["selenium"] flag keeps normal requests on Scrapy's regular downloader path, while selenium_wait_css delays parsing until the target selector exists.
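
    The same flags work on follow-up requests. In this sketch, assumed detail-page links and a hypothetical .price selector opt into Selenium while the listing itself stays on the plain downloader; both methods would sit on the spider from step 7.

    def parse(self, response):
        # The link list and .price selector are illustrative placeholders.
        for href in response.css("#products li a::attr(href)").getall():
            yield response.follow(
                href,
                callback=self.parse_detail,
                meta={"selenium": True, "selenium_wait_css": ".price"},
            )

    def parse_detail(self, response):
        yield {"price": response.css(".price::text").get()}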

  6. Enable the Selenium middleware and conservative browser settings in seleniumdemo/settings.py.
    BOT_NAME = "seleniumdemo"
     
    SPIDER_MODULES = ["seleniumdemo.spiders"]
    NEWSPIDER_MODULE = "seleniumdemo.spiders"
     
    ROBOTSTXT_OBEY = True
     
    CONCURRENT_REQUESTS = 1
    CONCURRENT_REQUESTS_PER_DOMAIN = 1
    DOWNLOAD_DELAY = 1
     
    DOWNLOADER_MIDDLEWARES = {
        "seleniumdemo.middlewares.SeleniumDownloaderMiddleware": 800,
    }
     
    SELENIUM_WAIT_SECONDS = 10
    SELENIUM_DRIVER_ARGUMENTS = [
        "--headless",
        "--window-size=1280,900",
    ]
     
    FEED_EXPORT_ENCODING = "utf-8"

    This example shares one browser session, so raising concurrency gains little: flagged requests still serialize on the single driver and can mix page state unless a driver pool is added. A per-request variant is sketched below.
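
    If clean page state matters more than throughput, one hedged variation is a process_request that opens a short-lived driver per flagged request instead of sharing one; the wait logic from step 5 is omitted here for brevity.

    def process_request(self, request, spider):
        if not request.meta.get("selenium"):
            return None
        options = Options()
        for argument in self.driver_arguments:
            options.add_argument(argument)
        # A fresh session per request isolates cookies and DOM state,
        # at the cost of one browser launch per page.
        driver = webdriver.Chrome(options=options)
        try:
            driver.get(request.url)
            body = driver.page_source.encode("utf-8")
            url = driver.current_url
        finally:
            driver.quit()
        return HtmlResponse(url=url, body=body, encoding="utf-8", request=request)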

  7. Replace seleniumdemo/spiders/rendered.py with a spider that sends only its start request through Selenium.
    import scrapy
     
     
    class RenderedSpider(scrapy.Spider):
        name = "rendered"
        allowed_domains = ["app.example.com"]
        start_urls = ["https://app.example.com/catalog/"]
     
        async def start(self):
            for url in self.start_urls:
                yield scrapy.Request(
                    url,
                    dont_filter=True,
                    meta={
                        "selenium": True,
                        "selenium_wait_css": "#products li",
                    },
                )
     
        def parse(self, response):
            for name in response.css("#products li::text").getall():
                yield {"name": name}

    Current Scrapy releases use async def start() for custom start requests. If the project must still support releases older than 2.13, add the same request flow in start_requests() as a compatibility path, as in the sketch below.
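
    A minimal compatibility method mirrors the start() flow exactly:

    def start_requests(self):
        # Fallback for Scrapy releases before 2.13, which call
        # start_requests() instead of the async start() method.
        for url in self.start_urls:
            yield scrapy.Request(
                url,
                dont_filter=True,
                meta={
                    "selenium": True,
                    "selenium_wait_css": "#products li",
                },
            )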

  8. Run the spider and export the rendered items to JSON.
    $ scrapy crawl rendered -O items.json
    2026-04-16 06:38:04 [scrapy.utils.log] INFO: Scrapy 2.15.0 started (bot: seleniumdemo)
    ##### snipped #####
    2026-04-16 06:38:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://app.example.com/catalog/>
    {'name': 'Rendered Widget'}
    {'name': 'Async Cable'}
    {'name': 'Headless Adapter'}
    2026-04-16 06:38:14 [scrapy.extensions.feedexport] INFO: Stored json feed (3 items) in: items.json
    2026-04-16 06:38:14 [scrapy.core.engine] INFO: Spider closed (finished)

    If the exported feed stays empty, either the wait selector does not match the post-render DOM, or the page loads its data through an API that is better scraped directly than rendered in a browser.
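
    A quick way to tell the two cases apart is to dump the rendered source once and search it for the wait selector's markup; this standalone check assumes nothing beyond the catalog URL from step 7.

    import time

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    driver.get("https://app.example.com/catalog/")
    time.sleep(5)  # a blunt fixed sleep is fine for a one-off diagnostic
    # Search rendered.html for "#products li" markup; if it never appears,
    # the data likely arrives through an API call rather than the DOM.
    with open("rendered.html", "w", encoding="utf-8") as fh:
        fh.write(driver.page_source)
    driver.quit()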

  9. Print the export file to confirm the JavaScript-rendered items were captured.
    $ python3 -m json.tool items.json
    [
        {
            "name": "Rendered Widget"
        },
        {
            "name": "Async Cable"
        },
        {
            "name": "Headless Adapter"
        }
    ]

Notes

  • Use Selenium only on the requests that truly need a browser. If scrapy shell or the normal response already contains the target data, keep the crawl on plain Scrapy requests instead (a quick version of this check appears after these notes).
  • Pass only the Selenium-specific meta keys that the next request still needs. Do not copy an entire response.meta dictionary into unrelated follow-up requests.
  • Replace #products li with a selector that appears only after the target page has finished rendering, such as a product card, results table row, or loaded-status element.
  • When the target site can be scraped by replaying its JSON or XHR requests instead of rendering the DOM, that approach remains faster, lighter, and easier to scale than Selenium.
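
  A quick version of the first note's check: fetch the page without a browser in scrapy shell and see whether the target selector already matches. On a JavaScript-rendered page the result is typically empty, as sketched here.

    $ scrapy shell https://app.example.com/catalog/
    >>> response.css("#products li::text").getall()
    []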