Infinite scrolling pages usually expose only the first batch of records in the initial HTML response. Replaying the follow-up requests the browser makes while scrolling is the reliable way to collect complete product lists, timelines, or search results; scraping only the initial HTML silently misses everything loaded after the first scroll.

Most sites implement infinite scroll by calling a JSON or HTML fragment endpoint through background Fetch or XHR requests. Scrapy works best when it targets that endpoint directly, reproduces the same request shape, parses the returned payload with response.json() or selectors, and keeps requesting the next batch until the API stops returning a cursor, page number, or offset.
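
In practice the exchange is a short chain of calls in which each response carries the value needed for the next request. A hypothetical cursor-based contract (the URL, keys, and values here are illustrative only, not taken from any real site):

    GET /feed?cursor=0     ->  {"items": [ ... 20 records ... ], "next_cursor": 20}
    GET /feed?cursor=20    ->  {"items": [ ... 20 records ... ], "next_cursor": 40}
    GET /feed?cursor=40    ->  {"items": [ ...  4 records ... ], "next_cursor": null}

When the cursor comes back null or missing (or the items list is empty), the crawl is complete.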

Current Scrapy releases use start() for initial requests, and Request.from_curl() is a practical way to turn a copied browser request into a working spider. If the endpoint depends on short-lived browser tokens, heavy fingerprinting, or rendered DOM state that cannot be reproduced as HTTP requests, move to a browser-rendering workflow instead of forcing a pure Scrapy crawl.
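
Before building a spider around a copied command, it helps to confirm that Scrapy parses it the way you expect. A minimal check in a Python shell, using a simplified version of the hypothetical cURL command from the spider below:

    $ python3
    >>> from scrapy import Request
    >>> cmd = "curl 'https://api.example.net/feed?cursor=0' -H 'Accept: application/json'"
    >>> request = Request.from_curl(cmd)
    >>> request.url
    'https://api.example.net/feed?cursor=0'
    >>> request.headers.get("Accept")
    b'application/json'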

Steps to scrape an infinite scrolling page with Scrapy:

  1. Open the target page in a web browser.
  2. Open the browser developer tools and select the Network tab.
  3. Select the Fetch/XHR filter.
  4. Scroll until the page loads another batch of records.
  5. Select the request that returns the next batch of items.
  6. Inspect the request URL, query string, request body, and headers.

    Look for a page number, offset, cursor, "after" token, or a JSON body field that changes on each batch.

  7. Inspect the response payload and note the keys that hold the items and the next pagination value.

    Common keys include items, results, entries, next, and next_cursor.
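
    If the endpoint answers without browser-only tokens, the Scrapy shell can confirm those key names outside the browser. A hypothetical session against the example endpoint used later in this guide:

    $ scrapy shell
    >>> req = scrapy.Request.from_curl("curl 'https://api.example.net/feed?cursor=0' -H 'Accept: application/json'")
    >>> fetch(req)
    >>> list(response.json())
    ['items', 'next_cursor']
    >>> response.json()["next_cursor"]
    20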

  8. Copy the request via Copy > Copy as cURL (bash).
  9. Create a new Scrapy project.
    $ scrapy startproject scrollfeed
    New Scrapy project 'scrollfeed', using template directory '/usr/lib/python3/dist-packages/scrapy/templates/project', created in:
        /home/user/scrollfeed
    
    You can start your first spider with:
        cd /home/user/scrollfeed
        scrapy genspider example example.com
  10. Change into the new project directory.
    $ cd scrollfeed
  11. Generate a spider for the API host used by the scrolling request.
    $ scrapy genspider feed api.example.net
    Created spider 'feed' using template 'basic' in module:
      scrollfeed.spiders.feed
  12. Edit scrollfeed/spiders/feed.py to replay the copied request and keep following the returned cursor.
    scrollfeed/spiders/feed.py
    import scrapy
     
     
    class FeedSpider(scrapy.Spider):
        name = "feed"
        allowed_domains = ["api.example.net"]
        curl_command = "curl 'https://api.example.net/feed?cursor=0' -H 'Accept: application/json' -H 'Referer: https://www.example.net/feed'"
     
        custom_settings = {
            "AUTOTHROTTLE_ENABLED": True,
            "AUTOTHROTTLE_START_DELAY": 0.25,
            "AUTOTHROTTLE_MAX_DELAY": 10.0,
            "DOWNLOAD_DELAY": 0.25,
        }
     
        async def start(self):
            # Replay the request copied from the browser, headers included.
            yield scrapy.Request.from_curl(self.curl_command, callback=self.parse_feed)

        def parse_feed(self, response):
            payload = response.json()

            for entry in payload.get("items", []):
                yield {
                    "id": entry.get("id"),
                    "title": entry.get("title"),
                }

            # Stop once the API no longer returns a cursor for the next batch.
            next_cursor = payload.get("next_cursor")
            if next_cursor is None:
                return

            # Reuse the original request, changing only the cursor in the URL.
            yield response.request.replace(
                url=f"https://api.example.net/feed?cursor={next_cursor}",
                callback=self.parse_feed,
            )

    Replace the copied cURL command, the allowed_domains entry, and the items/next_cursor keys to match the real endpoint. If the next token lives in a POST body instead of the URL, keep using response.request.replace() and pass an updated body= value. For spiders that must also run on Scrapy releases older than 2.13, add a matching start_requests() method for compatibility.
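
    Both variations are small edits to the same spider. A sketch that combines them, assuming a hypothetical POST endpoint whose JSON body carries the cursor; the command, keys, and field names are placeholders to adapt:

    import json

    import scrapy


    class FeedSpider(scrapy.Spider):
        name = "feed"
        allowed_domains = ["api.example.net"]
        curl_command = "curl 'https://api.example.net/feed' -X POST -H 'Content-Type: application/json' -d '{\"cursor\": 0}'"

        def start_requests(self):
            # Entry point that also works on Scrapy releases older than 2.13.
            yield scrapy.Request.from_curl(self.curl_command, callback=self.parse_feed)

        def parse_feed(self, response):
            payload = response.json()
            for entry in payload.get("items", []):
                yield {"id": entry.get("id"), "title": entry.get("title")}

            next_cursor = payload.get("next_cursor")
            if next_cursor is None:
                return

            # The cursor travels in the JSON body, so resend the same request
            # (headers included) with only the body updated.
            body = json.loads(response.request.body)
            body["cursor"] = next_cursor
            yield response.request.replace(body=json.dumps(body))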

  13. Run the spider and export the collected items to JSON.
    $ scrapy crawl feed -O items.json
    2026-04-16 05:32:46 [scrapy.core.engine] INFO: Spider opened
    2026-04-16 05:32:47 [scrapy.extensions.feedexport] INFO: Stored json feed (4 items) in: items.json
    2026-04-16 05:32:47 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
    {'finish_reason': 'finished',
     'item_scraped_count': 4}
    2026-04-16 05:32:47 [scrapy.core.engine] INFO: Spider closed (finished)

    Scrolling endpoints are often rate limited and can still be covered by robots.txt rules, so keep delays reasonable and confirm the target site's crawling policy before scaling the spider up.
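
    One way to keep the project-level limits conservative is to pin them in the generated settings module; the values below are illustrative starting points, not prescriptions:

    scrollfeed/settings.py (excerpt)
    ROBOTSTXT_OBEY = True
    CONCURRENT_REQUESTS_PER_DOMAIN = 2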

  14. Review the exported file to confirm later batches were written to the feed output.
    $ cat items.json
    [
    {"id": 101, "title": "First batch item"},
    {"id": 102, "title": "Second batch item"},
    {"id": 103, "title": "Third batch item"},
    {"id": 104, "title": "Fourth batch item"}
    ]