Many modern websites render key content with JavaScript after the initial HTML loads, which can leave a Scrapy spider with empty selectors and missing fields. Scraping the post-rendered DOM enables extraction of the same text and attributes that appear in a normal browser.
Scrapy processes responses through downloader middleware before the spider parses the HTML. The scrapy-splash middleware routes selected requests through Splash, a lightweight browser rendering service, and returns the rendered HTML to Scrapy, so standard CSS or XPath selectors work without rewriting parsing logic.
JavaScript rendering is slower and more resource-intensive than fetching raw HTML, so rate limiting and concurrency settings should be conservative, and the target site's robots.txt and terms of service should be respected. Splash is best for pages that render correctly with its engine; sites that rely on modern Chromium features or aggressive bot protection may require a different renderer.
Steps to scrape a JavaScript-rendered page with Scrapy using Splash:
- Start the Splash renderer in a local Docker container.
$ docker run --platform linux/amd64 --detach --name splash --publish 8050:8050 scrapinghub/splash
f5e2c7b4507492024e71c134d472201ed64ddf3920e5266d7ed83d15180834fa
- Verify the Splash API responds on http://localhost:8050.
$ curl -s http://localhost:8050/_ping
{"maxrss": 249331712, "status": "ok"}
- Preview the rendered HTML returned by Splash for the target page.
$ curl -s 'http://localhost:8050/render.html?url=http://app.internal.example:8000/scroll/&wait=2&cache=0'
<!DOCTYPE html><html lang="en"><head>
  <meta charset="utf-8">
  <title>Scroll Feed</title>
</head>
<body>
  <h1>Scroll Feed</h1>
  <ul id="items"><li>Scroll Item 1</li><li>Scroll Item 2</li><li>Scroll Item 3</li></ul>
##### snipped #####
Use the wait parameter to give client-side scripts time to populate the DOM.
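When a fixed wait is not enough, for example when items only load after the page scrolls, Splash's execute endpoint accepts a Lua script that controls the browser. A minimal sketch, assuming the same Splash instance and the demo URL above (requests is already installed as a scrapy-splash dependency):

import requests

# Lua script for Splash's execute endpoint: load the page, scroll to the
# bottom to trigger lazy loading, then return the rendered HTML.
LUA_SCROLL = """
function main(splash, args)
    splash:go(args.url)
    splash:wait(args.wait)
    splash:runjs("window.scrollTo(0, document.body.scrollHeight)")
    splash:wait(args.wait)
    return splash:html()
end
"""

response = requests.post(
    "http://localhost:8050/execute",
    json={
        "lua_source": LUA_SCROLL,
        "url": "http://app.internal.example:8000/scroll/",
        "wait": 2,
    },
)
print(response.text[:300])  # preview of the rendered HTML

The same script can be passed from a spider with SplashRequest(endpoint="execute", args={"lua_source": LUA_SCROLL, "wait": 2}).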
- Install scrapy-splash in the Scrapy project's Python environment.
$ python -m pip install scrapy-splash
Collecting scrapy-splash
  Downloading scrapy_splash-0.11.1-py2.py3-none-any.whl.metadata (35 kB)
##### snipped #####
Successfully installed attrs-25.4.0 automat-25.4.16 certifi-2025.11.12 cffi-2.0.0 charset_normalizer-3.4.4 constantly-23.10.4 cryptography-46.0.3 cssselect-1.3.0 defusedxml-0.7.1 filelock-3.20.1 hyperlink-21.0.0 idna-3.11 incremental-24.11.0 itemadapter-0.13.0 itemloaders-1.3.2 jmespath-1.0.1 lxml-6.0.2 packaging-25.0 parsel-1.10.0 protego-0.5.0 pyasn1-0.6.1 pyasn1-modules-0.4.2 pycparser-2.23 pydispatcher-2.0.7 pyopenssl-25.3.0 queuelib-1.8.0 requests-2.32.5 requests-file-3.0.1 scrapy-2.13.4 scrapy-splash-0.11.1 service-identity-24.2.0 six-1.17.0 tldextract-5.3.1 twisted-25.5.0 typing-extensions-4.15.0 urllib3-2.6.2 w3lib-2.3.1 zope-interface-8.1.1
- Enable the Splash middlewares, cache storage, and duplicate filter in settings.py.
SPLASH_URL = "http://localhost:8050" DOWNLOADER_MIDDLEWARES = { "scrapy_splash.SplashCookiesMiddleware": 723, "scrapy_splash.SplashMiddleware": 725, "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810, } SPIDER_MIDDLEWARES = { "scrapy_splash.SplashDeduplicateArgsMiddleware": 100, } DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter" HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage" ROBOTSTXT_OBEY = True
Set SPLASH_URL to the reachable Splash address, which is not localhost when Scrapy runs in a separate container.
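To keep the setting portable across environments, one option is reading it from an environment variable; a short sketch, assuming a SPLASH_URL variable is set wherever Scrapy runs in a container:

import os

# Fall back to localhost for local runs; override via the environment
# when Scrapy and Splash run in separate containers.
SPLASH_URL = os.environ.get("SPLASH_URL", "http://localhost:8050")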
- Create a spider that uses SplashRequest for pages that require rendering.
import scrapy
from scrapy_splash import SplashRequest


class ScrollJsSpider(scrapy.Spider):
    name = "scroll_js"
    start_urls = ["http://app.internal.example:8000/scroll/"]

    def start_requests(self):
        for url in self.start_urls:
            # Route the request through Splash so the page is rendered
            # before the response reaches the parse callback.
            yield SplashRequest(
                url=url,
                callback=self.parse,
                endpoint="render.html",
                args={"wait": 2},
            )

    def parse(self, response):
        # The response body is the post-rendered DOM, so the
        # JavaScript-populated list items are selectable.
        for entry in response.css("#items li"):
            yield {"title": entry.css("::text").get()}
Keep regular scrapy.Request for non-JavaScript pages to reduce rendering load.
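A sketch of that split, assuming a hypothetical layout where the rendered listing links to plain-HTML detail pages (the anchor selector and detail markup are illustrative, not from the demo page):

import scrapy
from scrapy_splash import SplashRequest


class MixedSpider(scrapy.Spider):
    name = "mixed"  # hypothetical spider for illustration

    def start_requests(self):
        # Only the JavaScript-rendered listing goes through Splash.
        yield SplashRequest(
            "http://app.internal.example:8000/scroll/",
            callback=self.parse_listing,
            args={"wait": 2},
        )

    def parse_listing(self, response):
        for href in response.css("#items a::attr(href)").getall():
            # Detail pages are assumed to be plain HTML, so a regular
            # Request skips the renderer entirely.
            yield scrapy.Request(response.urljoin(href), callback=self.parse_detail)

    def parse_detail(self, response):
        yield {"title": response.css("h1::text").get()}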
- Run the spider with JSON feed export enabled.
$ scrapy crawl scroll_js -O items.json
2026-01-01 12:31:46 [scrapy.utils.log] INFO: Scrapy 2.13.4 started (bot: splash_demo)
2026-01-01 12:31:46 [scrapy.core.engine] INFO: Spider opened
2026-01-01 12:31:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://app.internal.example:8000/scroll/ via http://localhost:8050/render.html> (referer: None)
2026-01-01 12:31:49 [scrapy.core.engine] INFO: Closing spider (finished)
2026-01-01 12:31:49 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/response_status_count/200': 2,
 'item_scraped_count': 3,
 'finish_reason': 'finished'}
##### snipped #####
Rendering every request can overwhelm both the renderer and the target site, so reduce concurrency and add delays when crawling beyond a single page.
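A conservative starting point in settings.py, as a sketch (exact values depend on the target site and the renderer's capacity):

# Few parallel requests, a delay between them, and AutoThrottle to back
# off automatically when responses slow down.
CONCURRENT_REQUESTS = 2
DOWNLOAD_DELAY = 1.0
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0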
- Inspect the exported file to confirm items are present.
$ python -m json.tool items.json
[
    {
        "title": "Scroll Item 1"
    },
    {
        "title": "Scroll Item 2"
    },
    {
        "title": "Scroll Item 3"
    }
]
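For larger crawls a scripted check is more reliable than eyeballing the file; a small sketch that fails if any exported item is missing its title:

import json

with open("items.json") as fh:
    items = json.load(fh)

# Catch pages where the renderer returned an empty DOM.
assert items and all(item.get("title") for item in items), "missing titles"
print(f"{len(items)} items exported")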
- Remove the Splash container when rendering is no longer required.
$ docker rm -f splash
splash
