Scraping a JSON API with Scrapy pulls structured records directly from the application data layer instead of rebuilding the same objects from HTML. That keeps the crawl simpler and more resilient to markup changes, because the useful data already arrives as arrays, objects, and pagination fields.
Current Scrapy releases start API requests from an async def start() method (older project code may still use start_requests()), and either entry point hands responses to a callback that can parse the body with response.json(). A spider can yield item dictionaries from the returned payload and follow the next-page URL or cursor until the endpoint stops advertising more results.
Authenticated APIs often require bearer tokens, CSRF headers, or session cookies in addition to URL parameters. When later requests depend on Scrapy's cookie middleware, the Scrapy docs recommend passing cookies through the cookies= argument rather than only sending a raw Cookie header, since a manually set header bypasses the cookiejar. POST-based APIs are better served by scrapy.http.JsonRequest, which serializes the payload and sets the JSON Content-Type header, and conservative delays help avoid HTTP 429 responses or short-lived bans.
Related: How to scrape a GraphQL API with Scrapy
Related: How to export Scrapy items to JSON
$ scrapy startproject api_scrape
New Scrapy project 'api_scrape', using template directory '##### snipped #####', created in:
/srv/api_scrape
You can start your first spider with:
cd api_scrape
scrapy genspider example example.com
$ cd api_scrape
$ scrapy genspider products api.example.net
Created spider 'products' using template 'basic' in module:
  api_scrape.spiders.products
from os import environ
from urllib.parse import urlencode

import scrapy


class ProductsSpider(scrapy.Spider):
    name = "products"
    allowed_domains = ["api.example.net"]
    api_endpoint = "https://api.example.net/v1/products"
    per_page = 100
    custom_settings = {
        "DOWNLOAD_DELAY": 1.0,
        "CONCURRENT_REQUESTS_PER_DOMAIN": 2,
        "AUTOTHROTTLE_ENABLED": True,
        "AUTOTHROTTLE_START_DELAY": 1.0,
        "AUTOTHROTTLE_MAX_DELAY": 10.0,
        "FEED_EXPORT_ENCODING": "utf-8",
    }

    def api_headers(self):
        headers = {"Accept": "application/json"}
        token = environ.get("EXAMPLE_API_TOKEN")
        if token:
            headers["Authorization"] = f"Bearer {token}"
        return headers

    async def start(self):
        params = {"page": 1, "per_page": self.per_page}
        yield scrapy.Request(
            url=f"{self.api_endpoint}?{urlencode(params)}",
            headers=self.api_headers(),
            callback=self.parse,
        )

    def parse(self, response):
        payload = response.json()
        for row in payload.get("products", []):
            yield {
                "id": row.get("id"),
                "name": row.get("name"),
                "price": row.get("price"),
                "currency": row.get("currency"),
                "url": row.get("url"),
            }
        next_url = payload.get("next")
        if next_url:
            yield response.follow(
                next_url,
                headers=self.api_headers(),
                callback=self.parse,
            )
The example expects product records under products and the next-page URL under next. Some APIs return a cursor token instead of a full URL, in which case the next request should rebuild the URL or JSON body from that cursor.
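For the cursor case, the follow-up URL can be rebuilt from the token with a small pure function. A sketch, assuming the API returns the token under a hypothetical next_cursor key and omits it (or returns null) on the last page:

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.example.net/v1/products"  # endpoint from the example


def next_page_url(payload, per_page=100):
    # Assumes a cursor-style payload such as
    # {"products": [...], "next_cursor": "abc123"}; a missing or null
    # next_cursor signals the final page, so we stop paginating.
    cursor = payload.get("next_cursor")
    if not cursor:
        return None
    return f"{API_ENDPOINT}?{urlencode({'cursor': cursor, 'per_page': per_page})}"
```

Inside parse(), the spider would yield response.follow(next_page_url(payload), callback=self.parse) whenever the helper returns a URL.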
$ scrapy crawl products -O products.json
2026-04-22 07:18:11 [scrapy.utils.log] INFO: Scrapy 2.15.0 started (bot: api_scrape)
##### snipped #####
2026-04-22 07:18:26 [scrapy.extensions.feedexport] INFO: Stored json feed (4 items) in: products.json
2026-04-22 07:18:26 [scrapy.core.engine] INFO: Spider closed (finished)
-O overwrites any existing products.json and writes one complete JSON array when the crawl finishes cleanly. Use JSON Lines when repeated runs should append or stream items instead of replacing the whole file.
$ python3 -c "import json; print(len(json.load(open('products.json', encoding='utf-8'))))"
4
A numeric count confirms that the exported file closed as valid JSON instead of a truncated partial array.
$ cat products.json
[
{"id": "p-0001", "name": "Starter Plan", "price": 29, "currency": "USD", "url": "https://api.example.net/products/starter-plan"},
{"id": "p-0002", "name": "Team Plan", "price": 79, "currency": "USD", "url": "https://api.example.net/products/team-plan"},
{"id": "p-0003", "name": "Growth Plan", "price": 129, "currency": "USD", "url": "https://api.example.net/products/growth-plan"},
{"id": "p-0004", "name": "Scale Plan", "price": 249, "currency": "USD", "url": "https://api.example.net/products/scale-plan"}
]