Stock chart data powers candlestick charts, moving averages, alerts, and backtests, so collecting clean OHLCV bars keeps downstream analysis repeatable and makes results easier to compare between runs.
Most chart data providers expose a JSON endpoint that returns a series of bars for a given symbol, interval, and time range. A Scrapy spider can request that endpoint, map each bar into an item, and let the built-in feed exporter write the rows directly to a stable output format such as .csv.
Provider payloads differ in bar field names, adjusted-price rules, auth headers, and timestamp formats, so the request URL, response mapping, and throttle settings should match the current API contract before the crawl is reused in a charting pipeline. Current Scrapy project templates already start with conservative per-domain pacing, but explicit export fields and AutoThrottle settings keep market-data crawls more predictable.
Related: How to scrape a JSON API with Scrapy
Related: How to export Scrapy items to CSV
Steps to get stock chart data with Scrapy:
- Probe the chart endpoint with a known symbol, a known interval, and a short date range so the bar keys and timestamp format are clear before the spider is written.
$ curl -s 'https://data.example.net/v1/charts?symbol=MSFT&interval=1d&start=2026-04-01&end=2026-04-03'
{
  "symbol": "MSFT",
  "interval": "1d",
  "bars": [
    { "date": "2026-04-01", "open": 381.1, "high": 384.5, "low": 380.6, "close": 383.9, "volume": 20311452 },
    { "date": "2026-04-02", "open": 384.0, "high": 386.4, "low": 382.3, "close": 385.7, "volume": 18744106 },
    { "date": "2026-04-03", "open": 385.9, "high": 388.1, "low": 384.8, "close": 387.6, "volume": 19180234 }
  ]
}
Confirm whether the provider uses bars, candles, or another array name, and whether the time key is already an ISO date or still needs conversion from epoch seconds or milliseconds.
- Create a new Scrapy project for the chart spider.
$ scrapy startproject market_chart
New Scrapy project 'market_chart', using template directory '##### snipped #####', created in:
    /srv/market_chart

You can start your first spider with:
    cd market_chart
    scrapy genspider example example.com
- Change into the project directory.
$ cd market_chart
- Generate a spider for the market data host.
$ scrapy genspider chart data.example.net
Created spider 'chart' using template 'basic' in module:
  market_chart.spiders.chart
- Replace the generated spider with a request that reads the chart JSON payload and yields one item per bar.
from datetime import date
import os
from urllib.parse import urlencode

import scrapy


class ChartSpider(scrapy.Spider):
    name = "chart"
    allowed_domains = ["data.example.net"]
    api_url = "https://data.example.net/v1/charts"

    def __init__(self, symbol="MSFT", interval="1d",
                 start="2026-04-01", end="2026-04-03", *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.symbol = symbol.strip().upper()
        self.interval = interval.strip()
        self.start_date = self._parse_day(start, "start")
        self.end_date = self._parse_day(end, "end")

    def _parse_day(self, value, label):
        # Validate crawl arguments early so a bad date fails the run
        # before any request is sent.
        try:
            parsed = date.fromisoformat(str(value))
        except ValueError as exc:
            raise ValueError(f"{label} must use YYYY-MM-DD") from exc
        return parsed.isoformat()

    def _headers(self):
        # Attach the bearer token only when CHART_API_TOKEN is set,
        # so public endpoints work without credentials.
        headers = {"Accept": "application/json"}
        api_token = os.getenv("CHART_API_TOKEN")
        if api_token:
            headers["Authorization"] = f"Bearer {api_token}"
        return headers

    async def start(self):
        # Build the chart query from the spider arguments and issue
        # a single GET request covering the whole date range.
        params = {
            "symbol": self.symbol,
            "interval": self.interval,
            "start": self.start_date,
            "end": self.end_date,
        }
        url = f"{self.api_url}?{urlencode(params)}"
        yield scrapy.Request(
            url=url,
            headers=self._headers(),
            callback=self.parse,
        )

    def parse(self, response):
        payload = response.json()
        bars = payload.get("bars", [])
        if not isinstance(bars, list):
            self.logger.error("Missing bars list")
            return
        symbol = payload.get("symbol") or self.symbol
        for bar in bars:
            if not isinstance(bar, dict):
                continue
            # One exported row per OHLCV bar.
            yield {
                "symbol": symbol,
                "date": bar.get("date"),
                "open": bar.get("open"),
                "high": bar.get("high"),
                "low": bar.get("low"),
                "close": bar.get("close"),
                "volume": bar.get("volume"),
            }
symbol, interval, start, and end remain normal spider arguments at crawl time, while api_url and the bar-field names can be adjusted in the file when a provider uses a different endpoint or schema.
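For example, a hypothetical provider that nests the series under a candles array with single-letter keys would only need the mapping inside parse() changed; the candles name and the d, o, h, l, c, v keys below are placeholder assumptions to adapt, not part of any real contract.

def parse(self, response):
    # Sketch for a hypothetical {"candles": [{"d": ..., "o": ...}]}
    # payload; rename the array and keys to match the probe output.
    payload = response.json()
    candles = payload.get("candles", [])
    if not isinstance(candles, list):
        self.logger.error("Missing candles list")
        return
    symbol = payload.get("symbol") or self.symbol
    for candle in candles:
        if not isinstance(candle, dict):
            continue
        yield {
            "symbol": symbol,
            "date": candle.get("d"),
            "open": candle.get("o"),
            "high": candle.get("h"),
            "low": candle.get("l"),
            "close": candle.get("c"),
            "volume": candle.get("v"),
        }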
- Update settings.py so the crawl keeps a stable CSV column order and slows down cleanly under latency.
CONCURRENT_REQUESTS_PER_DOMAIN = 1
DOWNLOAD_DELAY = 1

AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 10.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

FEED_EXPORT_FIELDS = [
    "symbol",
    "date",
    "open",
    "high",
    "low",
    "close",
    "volume",
]
FEED_EXPORT_ENCODING = "utf-8"
Current Scrapy project templates already generate CONCURRENT_REQUESTS_PER_DOMAIN = 1, DOWNLOAD_DELAY = 1, and FEED_EXPORT_ENCODING = "utf-8", but keeping the chart export fields explicit prevents column order from drifting when the first yielded item changes.
Chart APIs often enforce burst limits, so raising per-domain concurrency or dropping the delay usually leads to HTTP 429 responses or partial datasets before a full date range finishes.
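When a provider does return HTTP 429, Scrapy's built-in retry middleware can be tuned in settings.py alongside AutoThrottle; the values below are an illustrative starting point, not provider-specific guidance.

# Illustrative retry tuning; adjust to the provider's rate limits.
RETRY_ENABLED = True
RETRY_TIMES = 5                               # per-request retry budget
RETRY_HTTP_CODES = [429, 500, 502, 503, 504]  # treat rate limits as retryable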
Related: How to set a download delay in Scrapy
Related: How to enable AutoThrottle in Scrapy
- Export the provider token in the current shell when the chart API requires authenticated requests.
$ export CHART_API_TOKEN="replace-with-token"
The spider reads CHART_API_TOKEN only when it exists, so unauthenticated public endpoints can skip this step.
Pasting tokens into a shared shell, screen recording, or saved terminal transcript can expose credentials outside the crawl.
Related: How to set request headers in Scrapy
- Run the spider with the desired symbol, interval, and date range, writing the bars directly to a CSV file.
$ scrapy crawl chart -a symbol=MSFT -a interval=1d -a start=2026-04-01 -a end=2026-04-03 -O msft-1d.csv
2026-04-16 11:46:56 [scrapy.extensions.feedexport] INFO: Stored csv feed (3 items) in: msft-1d.csv
2026-04-16 11:46:56 [scrapy.core.engine] INFO: Spider closed (finished)
-O overwrites any existing output file, while -o appends when the chosen feed format supports appending.
- Read the saved CSV to confirm the header order and one row per returned bar.
$ cat msft-1d.csv
symbol,date,open,high,low,close,volume
MSFT,2026-04-01,381.1,384.5,380.6,383.9,20311452
MSFT,2026-04-02,384.0,386.4,382.3,385.7,18744106
MSFT,2026-04-03,385.9,388.1,384.8,387.6,19180234
- Compare the total line count with the number of exported bars when a quick post-run check is needed.
$ wc -l msft-1d.csv
4 msft-1d.csv
One header row plus three data rows means the crawl exported three daily bars.
Notes
- Current Scrapy documentation uses async def start() for initial requests; spiders that must keep supporting releases older than 2.13 can fall back to the start_requests() pattern, as in the sketch below.
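A minimal compatibility sketch, mirroring the async start() method from the spider above:

def start_requests(self):
    # Synchronous equivalent of start() for Scrapy releases older
    # than 2.13; builds the same single GET request.
    params = {
        "symbol": self.symbol,
        "interval": self.interval,
        "start": self.start_date,
        "end": self.end_date,
    }
    url = f"{self.api_url}?{urlencode(params)}"
    yield scrapy.Request(
        url=url,
        headers=self._headers(),
        callback=self.parse,
    )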
- Providers that return Unix timestamps instead of ISO dates should convert those values inside parse() before writing the CSV so charting tools do not guess the wrong timezone or unit.
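A minimal conversion sketch, assuming the provider sends an epoch value under a hypothetical t key; both the key name and the seconds-versus-milliseconds heuristic need verifying against the probe output.

from datetime import datetime, timezone

def bar_date(ts):
    # Convert an epoch timestamp to an ISO date string in UTC.
    # Values large enough to be milliseconds are scaled to seconds
    # first; the threshold is a heuristic, not a provider guarantee.
    if ts is None:
        return None
    if ts > 10**12:
        ts /= 1000
    return datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()

Inside parse(), the mapping then becomes "date": bar_date(bar.get("t")) instead of reading a ready-made ISO string.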
- Adjusted and unadjusted bars are not interchangeable, so the endpoint path or query string should make that distinction explicit before the data is reused for indicators or backtests.
- Use scrapy.http.JsonRequest instead of Request when the provider expects a JSON request body or a POST-based search payload instead of query-string parameters.
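A minimal sketch, assuming a hypothetical search endpoint that accepts the same parameters as a JSON body; the /v1/charts/search path and payload shape are placeholders.

from scrapy.http import JsonRequest

async def start(self):
    # JsonRequest serializes `data` into the JSON request body and
    # defaults the HTTP method to POST; the path is an assumption.
    yield JsonRequest(
        url="https://data.example.net/v1/charts/search",
        data={
            "symbol": self.symbol,
            "interval": self.interval,
            "start": self.start_date,
            "end": self.end_date,
        },
        headers=self._headers(),
        callback=self.parse,
    )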
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
