Stock chart data drives candlestick charts, indicators, alerts, and backtests, so consistent time-series bars keep analytics repeatable and easier to troubleshoot.
Most market data providers expose an API endpoint that returns OHLC (often OHLCV) bars per symbol, interval, and date range. A Scrapy spider can request that endpoint, parse each bar from JSON, and emit one item per bar so the feed exporter writes clean rows into a chart-friendly format such as .csv.
Market data APIs differ in licensing, rate limits, symbol formats, and timestamp conventions (ISO dates vs epoch seconds or milliseconds). Conservative throttling reduces gaps and HTTP 429 responses, while careful field mapping avoids subtle errors like swapped timestamps or adjusted prices that change after splits and dividends.
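Epoch-based feeds are easiest to chart once every bar carries the same date format. A minimal normalization sketch; the helper name `normalize_bar_date` and the 1e11 milliseconds cutoff are illustrative assumptions, not part of any particular provider's API:

```python
from datetime import datetime, timezone

def normalize_bar_date(value):
    """Normalize a provider timestamp (ISO string, epoch seconds,
    or epoch milliseconds) to a YYYY-MM-DD string in UTC."""
    if isinstance(value, str):
        # Assume ISO-8601; keep just the date part.
        return value[:10]
    ts = float(value)
    if ts > 1e11:  # values this large are almost certainly milliseconds
        ts /= 1000.0
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")

print(normalize_bar_date("2025-12-29T00:00:00Z"))  # 2025-12-29
print(normalize_bar_date(1767225600))              # 2026-01-01
```

Converting in UTC keeps daily bars stable regardless of the machine's local timezone.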
Related: How to scrape a JSON API with Scrapy
Related: How to export Scrapy items to CSV
$ curl -s 'http://api.example.net:8000/api/stock?symbol=EXMPL&interval=1d&start=2025-12-29&end=2025-12-31' | head -n 20
{
  "symbol": "EXMPL",
  "prices": [
    {
      "date": "2025-12-29",
      "close": 128.4
    }
##### snipped #####
Confirm the bar keys (such as date/close) and timestamp format before mapping fields in the spider.
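One quick way to do that check is to paste a captured response body into a short script and inspect the first bar; a sketch assuming the payload shape shown above (in practice, paste the full body or load it from a file):

```python
import json

# Sample payload captured from the curl call above (truncated here).
payload = json.loads(
    '{"symbol": "EXMPL", "prices": [{"date": "2025-12-29", "close": 128.4}]}'
)

bars = payload.get("prices", [])
first = bars[0] if bars else {}
print("bar keys:", sorted(first.keys()))
print("date type:", type(first.get("date")).__name__)  # str -> ISO date, int -> epoch
```

If `date type` prints `int` instead of `str`, plan on an epoch-to-date conversion step before export.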
$ scrapy startproject stock_chart
New Scrapy project 'stock_chart', using template directory '##### snipped #####', created in:
/root/sg-work/stock_chart
You can start your first spider with:
cd stock_chart
scrapy genspider example example.com
$ cd stock_chart
$ scrapy genspider chart api.example.net
Created spider 'chart' using template 'basic' in module:
  stock_chart.spiders.chart
$ export MARKET_API_TOKEN="replace-with-token"
Using an environment variable avoids hard-coding credentials in the spider.
Pasting tokens into a shell can leak them through history or logs on shared systems.
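If your provider requires authentication, it can help to fail fast with a clear message instead of sending unauthenticated requests that come back as HTTP 401. A sketch; `require_token` is a hypothetical helper, not part of Scrapy:

```python
import os

def require_token(name="MARKET_API_TOKEN"):
    """Return the API token from the environment, or raise with a clear hint."""
    token = os.getenv(name)
    if not token:
        raise RuntimeError(f"set {name} in the environment before crawling")
    return token
```

Calling this once in the spider's `__init__` surfaces a missing token before any requests are scheduled.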
import json
import os
import re
from urllib.parse import urlencode

import scrapy

_DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


class ChartSpider(scrapy.Spider):
    name = "chart"
    allowed_domains = ["api.example.net"]
    base_url = "http://api.example.net:8000/api/stock"

    def __init__(
        self,
        symbol="EXMPL",
        interval="1d",
        start="2025-12-29",
        end="2025-12-31",
        *args,
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        self.symbol = self._clean_symbol(symbol)
        self.interval = self._clean_interval(interval)
        self.start_date = self._clean_date(start, "start")
        self.end_date = self._clean_date(end, "end")

    def _clean_symbol(self, symbol):
        cleaned = ("" if symbol is None else str(symbol)).strip().upper()
        if not cleaned:
            raise ValueError("symbol must not be empty")
        return cleaned

    def _clean_interval(self, interval):
        cleaned = ("" if interval is None else str(interval)).strip()
        if not cleaned:
            raise ValueError("interval must not be empty")
        return cleaned

    def _clean_date(self, value, label):
        cleaned = ("" if value is None else str(value)).strip()
        if not _DATE_RE.match(cleaned):
            raise ValueError(f"{label} must be in YYYY-MM-DD format")
        return cleaned

    def start_requests(self):
        params = {
            "symbol": self.symbol,
            "interval": self.interval,
            "start": self.start_date,
            "end": self.end_date,
        }
        url = f"{self.base_url}?{urlencode(params)}"
        headers = {
            "Accept": "application/json",
            "User-Agent": "stock_chart (+http://app.internal.example:8000/)",
        }
        api_token = os.getenv("MARKET_API_TOKEN")
        if api_token:
            headers["Authorization"] = f"Bearer {api_token}"
        yield scrapy.Request(url=url, headers=headers, callback=self.parse)

    def parse(self, response):
        if response.status >= 400:
            self.logger.error("Chart endpoint returned HTTP %s", response.status)
            return
        try:
            payload = json.loads(response.text)
        except json.JSONDecodeError:
            self.logger.error(
                "Non-JSON response from chart endpoint (status=%s)", response.status
            )
            return
        prices = payload.get("prices")
        if not isinstance(prices, list):
            self.logger.error("Missing or invalid 'prices' array in chart payload")
            return
        symbol = payload.get("symbol") or self.symbol
        for bar in prices:
            if not isinstance(bar, dict):
                continue
            yield {
                "symbol": symbol,
                "date": bar.get("date"),
                "close": bar.get("close"),
            }
Spider arguments symbol, interval, start, and end override defaults at runtime without code edits.
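Because Scrapy passes `-a` values to `__init__` as strings, the validation helpers above are the only guard against malformed arguments; a standalone sketch of the YYYY-MM-DD check, mirroring the spider's `_clean_date`:

```python
import re

_DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def clean_date(value, label):
    # Trim whitespace, then require a strict YYYY-MM-DD shape.
    cleaned = ("" if value is None else str(value)).strip()
    if not _DATE_RE.match(cleaned):
        raise ValueError(f"{label} must be in YYYY-MM-DD format")
    return cleaned

print(clean_date(" 2025-12-29 ", "start"))  # 2025-12-29
try:
    clean_date("12/29/2025", "start")
except ValueError as exc:
    print(exc)  # start must be in YYYY-MM-DD format
```

Raising in `__init__` makes a bad argument abort the crawl immediately rather than produce an empty feed.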
DOWNLOAD_DELAY = 1.0
CONCURRENT_REQUESTS_PER_DOMAIN = 2
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 10.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
Aggressive request rates commonly trigger HTTP 429 throttling or temporary blocks on market data APIs.
Related: How to set a download delay in Scrapy
Related: How to enable AutoThrottle in Scrapy
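If the API still returns occasional HTTP 429 responses despite throttling, Scrapy's built-in RetryMiddleware can re-queue them. A settings sketch; `RETRY_ENABLED`, `RETRY_TIMES`, and `RETRY_HTTP_CODES` are standard Scrapy settings, but the specific values here are illustrative and should be tuned to the provider's documented limits:

```python
# settings.py (fragment)
RETRY_ENABLED = True
RETRY_TIMES = 3  # per-request retry budget beyond the first attempt
RETRY_HTTP_CODES = [429, 500, 502, 503, 504]  # include 429 so throttled requests retry
```

Combined with AutoThrottle, retried 429s are spaced out automatically as the observed latency rises.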
$ scrapy crawl chart -a symbol=EXMPL -a interval=1d -a start=2025-12-29 -a end=2025-12-31 -O exmpl-close.csv
##### snipped #####
[scrapy.extensions.feedexport] INFO: Stored csv feed (3 items) in: exmpl-close.csv
[scrapy.core.engine] INFO: Closing spider (finished)
{'downloader/request_count': 2, 'item_scraped_count': 3}
Use -O to overwrite an existing output file, or -o to append.
$ head -n 5 exmpl-close.csv
symbol,date,close
EXMPL,2025-12-29,128.4
EXMPL,2025-12-30,129.9
EXMPL,2025-12-31,131.2
$ wc -l exmpl-close.csv
4 exmpl-close.csv
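Downstream charting code can consume the export with the standard library alone; a minimal sketch using `csv.DictReader` (`load_closes` is a hypothetical helper, and the skip-bad-rows policy is one choice among several):

```python
import csv

def load_closes(path):
    """Read the exported CSV into (date, close) pairs, skipping malformed rows."""
    series = []
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            try:
                series.append((row["date"], float(row["close"])))
            except (KeyError, ValueError):
                continue  # missing column or non-numeric close; log in real code
    return series

# Example: series = load_closes("exmpl-close.csv")
```

The resulting list of `(date, close)` tuples plugs directly into most plotting libraries' x/y inputs.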