Stock chart data drives candlestick charts, indicators, alerts, and backtests, so consistent time-series bars keep analytics repeatable and easier to troubleshoot.
Most market data providers expose an API endpoint that returns OHLC (often OHLCV) bars per symbol, interval, and date range. A Scrapy spider can request that endpoint, parse each bar from JSON, and emit one item per bar so the feed exporter writes clean rows into a chart-friendly format such as .csv.
Market data APIs differ in licensing, rate limits, symbol formats, and timestamp conventions (ISO dates vs epoch seconds or milliseconds). Conservative throttling reduces gaps and HTTP 429 responses, while careful field mapping avoids subtle errors like swapped timestamps or adjusted prices that change after splits and dividends.
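When a provider returns epoch timestamps instead of ISO dates, normalize them before mapping fields; a minimal sketch of such a conversion (the seconds-vs-milliseconds size heuristic is an assumption, not a provider rule):

from datetime import datetime, timezone

def to_iso_date(ts):
    """Normalize an epoch timestamp (seconds or milliseconds) to YYYY-MM-DD."""
    ts = int(ts)
    if ts > 10**12:  # assume milliseconds when the value is too large for seconds
        ts //= 1000
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")

print(to_iso_date(1767052800))     # seconds -> 2025-12-30
print(to_iso_date(1767052800000))  # milliseconds -> 2025-12-30

Converting in UTC keeps daily bars aligned regardless of the machine's local timezone.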
Related: How to scrape a JSON API with Scrapy
Related: How to export Scrapy items to CSV
Steps to get stock chart data with Scrapy:
- Fetch a sample OHLC response from the chart endpoint.
$ curl -s 'http://api.example.net:8000/api/stock?symbol=EXMPL&interval=1d&start=2025-12-29&end=2025-12-31' | head -n 20
{
  "symbol": "EXMPL",
  "prices": [
    {
      "date": "2025-12-29",
      "close": 128.4
    }
##### snipped #####
Confirm the bar keys (such as date/close) and the timestamp format before mapping fields in the spider.
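The same check can be done programmatically; a short sketch using only the standard library, against the same example endpoint as the curl command above:

import json
import urllib.request

# Mirrors the curl request; the URL and key names come from the sample payload.
url = ("http://api.example.net:8000/api/stock"
       "?symbol=EXMPL&interval=1d&start=2025-12-29&end=2025-12-31")
with urllib.request.urlopen(url) as resp:
    payload = json.load(resp)

print(sorted(payload))               # top-level keys, e.g. ['prices', 'symbol']
print(sorted(payload["prices"][0]))  # per-bar keys, e.g. ['close', 'date']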
- Create a new Scrapy project for the chart spider.
$ scrapy startproject stock_chart
New Scrapy project 'stock_chart', using template directory '##### snipped #####', created in:
    /root/sg-work/stock_chart

You can start your first spider with:
    cd stock_chart
    scrapy genspider example example.com
- Change into the project directory.
$ cd stock_chart
- Generate a spider for the market data host.
$ scrapy genspider chart api.example.net
Created spider 'chart' using template 'basic' in module:
  stock_chart.spiders.chart
- Export an API token to the environment when authentication is required.
$ export MARKET_API_TOKEN="replace-with-token"
Using an environment variable avoids hard-coding credentials in the spider.
Pasting tokens into a shell can leak them through history or logs on shared systems.
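If the endpoint rejects anonymous requests outright, the spider can also fail fast instead of crawling without credentials; a minimal sketch of such a guard (placing it in the spider's __init__ is a suggestion, not part of the generated template):

import os

# Hypothetical guard for APIs that always require authentication:
# raise at startup rather than collecting a run of HTTP 401 responses.
if not os.getenv("MARKET_API_TOKEN"):
    raise RuntimeError("MARKET_API_TOKEN is not set; export it before crawling")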
- Edit the spider to request chart data, emitting OHLC rows as items.
- stock_chart/spiders/chart.py
import json
import os
import re
from urllib.parse import urlencode

import scrapy

_DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


class ChartSpider(scrapy.Spider):
    name = "chart"
    allowed_domains = ["api.example.net"]
    base_url = "http://api.example.net:8000/api/stock"

    def __init__(
        self,
        symbol="EXMPL",
        interval="1d",
        start="2025-12-29",
        end="2025-12-31",
        *args,
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        # Validate runtime arguments early so a bad crawl fails fast.
        self.symbol = self._clean_symbol(symbol)
        self.interval = self._clean_interval(interval)
        self.start_date = self._clean_date(start, "start")
        self.end_date = self._clean_date(end, "end")

    def _clean_symbol(self, symbol):
        cleaned = ("" if symbol is None else str(symbol)).strip().upper()
        if not cleaned:
            raise ValueError("symbol must not be empty")
        return cleaned

    def _clean_interval(self, interval):
        cleaned = ("" if interval is None else str(interval)).strip()
        if not cleaned:
            raise ValueError("interval must not be empty")
        return cleaned

    def _clean_date(self, value, label):
        cleaned = ("" if value is None else str(value)).strip()
        if not _DATE_RE.match(cleaned):
            raise ValueError(f"{label} must be in YYYY-MM-DD format")
        return cleaned

    def start_requests(self):
        params = {
            "symbol": self.symbol,
            "interval": self.interval,
            "start": self.start_date,
            "end": self.end_date,
        }
        url = f"{self.base_url}?{urlencode(params)}"
        headers = {
            "Accept": "application/json",
            "User-Agent": "stock_chart (+http://app.internal.example:8000/)",
        }
        # Send the token only when one is configured in the environment.
        api_token = os.getenv("MARKET_API_TOKEN")
        if api_token:
            headers["Authorization"] = f"Bearer {api_token}"
        yield scrapy.Request(url=url, headers=headers, callback=self.parse)

    def parse(self, response):
        if response.status >= 400:
            self.logger.error("Chart endpoint returned HTTP %s", response.status)
            return
        try:
            payload = json.loads(response.text)
        except json.JSONDecodeError:
            self.logger.error(
                "Non-JSON response from chart endpoint (status=%s)", response.status
            )
            return
        prices = payload.get("prices", [])
        if not isinstance(prices, list):
            self.logger.error("Missing or invalid 'prices' array in chart payload")
            return
        symbol = payload.get("symbol") or self.symbol
        # One item per bar keeps the CSV exporter output at one row per trading day.
        for bar in prices:
            if not isinstance(bar, dict):
                continue
            yield {
                "symbol": symbol,
                "date": bar.get("date"),
                "close": bar.get("close"),
            }
Spider arguments symbol, interval, start, and end override defaults at runtime without code edits.
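If the endpoint returns full OHLCV bars rather than just closes, the same loop can carry the extra fields through to the exporter; a hedged sketch, assuming the provider names the keys open/high/low/close/volume:

# Inside ChartSpider.parse, replacing the yield above; the open/high/low/volume
# key names are assumptions about the provider's payload, not confirmed fields.
for bar in prices:
    if not isinstance(bar, dict):
        continue
    yield {
        "symbol": symbol,
        "date": bar.get("date"),
        "open": bar.get("open"),
        "high": bar.get("high"),
        "low": bar.get("low"),
        "close": bar.get("close"),
        "volume": bar.get("volume"),
    }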
- Set conservative throttling values in settings.py.
- stock_chart/settings.py
DOWNLOAD_DELAY = 1.0
CONCURRENT_REQUESTS_PER_DOMAIN = 2
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 10.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
Aggressive request rates commonly trigger HTTP 429 throttling or temporary blocks on market data APIs.
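When occasional 429s still slip through, Scrapy's built-in retry middleware can re-issue those requests; a minimal settings.py sketch (the retry count and code list are illustrative, not required values):

# Retry throttled and transient server errors a few times instead of dropping them.
RETRY_ENABLED = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [429, 500, 502, 503, 504]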
Related: How to set a download delay in Scrapy
Related: How to enable AutoThrottle in Scrapy
- Run the spider to export chart data to a CSV file.
$ scrapy crawl chart -a symbol=EXMPL -a interval=1d -a start=2025-12-29 -a end=2025-12-31 -O exmpl-close.csv
##### snipped #####
[scrapy.extensions.feedexport] INFO: Stored csv feed (3 items) in: exmpl-close.csv
[scrapy.core.engine] INFO: Closing spider (finished)
{'downloader/request_count': 2, 'item_scraped_count': 3}
Use -O to overwrite an existing output file, or -o to append.
- Verify the CSV output contains chart rows.
$ head -n 5 exmpl-close.csv
symbol,date,close
EXMPL,2025-12-29,128.4
EXMPL,2025-12-30,129.9
EXMPL,2025-12-31,131.2
$ wc -l exmpl-close.csv
4 exmpl-close.csv
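For a sanity check beyond row counts, loading the CSV into pandas confirms types, ordering, and duplicates; a brief sketch, assuming pandas is installed:

import pandas as pd

# Load the exported bars, parse dates, and check for duplicate rows.
bars = pd.read_csv("exmpl-close.csv", parse_dates=["date"])
bars = bars.sort_values("date")
assert not bars.duplicated(subset=["symbol", "date"]).any(), "duplicate bars"
print(bars.dtypes)
print(bars["date"].min(), "to", bars["date"].max(), "-", len(bars), "bars")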
