Stock chart data powers candlestick charts, moving averages, alerts, and backtests, so collecting clean OHLCV bars keeps downstream analysis repeatable and makes results easier to compare between runs.
Most chart data providers expose a JSON endpoint that returns a series of bars for a given symbol, interval, and time range. A Scrapy spider can request that endpoint, map each bar into an item, and let the built-in feed exporter write the rows directly to a stable output format such as .csv.
Provider payloads differ in bar field names, adjusted-price rules, auth headers, and timestamp formats, so the request URL, response mapping, and throttle settings should match the current API contract before the crawl is reused in a charting pipeline. Current Scrapy project templates already start with conservative per-domain pacing, but explicit export fields and AutoThrottle settings keep market-data crawls more predictable.
Related: How to scrape a JSON API with Scrapy
Related: How to export Scrapy items to CSV
Steps to get stock chart data with Scrapy:
- Probe the chart endpoint with a known symbol, a known interval, and a short date range so the bar keys and timestamp format are clear before the spider is written.
$ curl -s 'https://data.example.net/v1/charts?symbol=MSFT&interval=1d&start=2026-04-01&end=2026-04-03'
{
  "symbol": "MSFT",
  "interval": "1d",
  "bars": [
    { "date": "2026-04-01", "open": 381.1, "high": 384.5, "low": 380.6, "close": 383.9, "volume": 20311452 },
    { "date": "2026-04-02", "open": 384.0, "high": 386.4, "low": 382.3, "close": 385.7, "volume": 18744106 },
    { "date": "2026-04-03", "open": 385.9, "high": 388.1, "low": 384.8, "close": 387.6, "volume": 19180234 }
  ]
}
Confirm whether the provider uses bars, candles, or another array name, and whether the time key is already an ISO date or still needs conversion from epoch seconds or milliseconds.
- Create a new Scrapy project for the chart spider.
$ scrapy startproject market_chart
New Scrapy project 'market_chart', using template directory '##### snipped #####', created in:
    /srv/market_chart

You can start your first spider with:
    cd market_chart
    scrapy genspider example example.com
- Change into the project directory.
$ cd market_chart
- Generate a spider for the market data host.
$ scrapy genspider chart data.example.net
Created spider 'chart' using template 'basic' in module:
  market_chart.spiders.chart
- Replace the generated spider with a request that reads the chart JSON payload and yields one item per bar.
from datetime import date
import os
from urllib.parse import urlencode

import scrapy


class ChartSpider(scrapy.Spider):
    name = "chart"
    allowed_domains = ["data.example.net"]
    api_url = "https://data.example.net/v1/charts"

    def __init__(self, symbol="MSFT", interval="1d",
                 start="2026-04-01", end="2026-04-03", *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.symbol = symbol.strip().upper()
        self.interval = interval.strip()
        self.start_date = self._parse_day(start, "start")
        self.end_date = self._parse_day(end, "end")

    def _parse_day(self, value, label):
        # Validate crawl arguments early so a bad date fails the run
        # before any request is sent.
        try:
            parsed = date.fromisoformat(str(value))
        except ValueError as exc:
            raise ValueError(f"{label} must use YYYY-MM-DD") from exc
        return parsed.isoformat()

    def _headers(self):
        # Attach the bearer token only when CHART_API_TOKEN is set,
        # so public endpoints work without credentials.
        headers = {"Accept": "application/json"}
        api_token = os.getenv("CHART_API_TOKEN")
        if api_token:
            headers["Authorization"] = f"Bearer {api_token}"
        return headers

    async def start(self):
        # Build the chart query from the spider arguments and issue
        # a single GET request covering the whole date range.
        params = {
            "symbol": self.symbol,
            "interval": self.interval,
            "start": self.start_date,
            "end": self.end_date,
        }
        url = f"{self.api_url}?{urlencode(params)}"
        yield scrapy.Request(
            url=url,
            headers=self._headers(),
            callback=self.parse,
        )

    def parse(self, response):
        payload = response.json()
        bars = payload.get("bars", [])
        if not isinstance(bars, list):
            self.logger.error("Missing bars list")
            return
        symbol = payload.get("symbol") or self.symbol
        for bar in bars:
            if not isinstance(bar, dict):
                continue
            # One exported row per OHLCV bar.
            yield {
                "symbol": symbol,
                "date": bar.get("date"),
                "open": bar.get("open"),
                "high": bar.get("high"),
                "low": bar.get("low"),
                "close": bar.get("close"),
                "volume": bar.get("volume"),
            }
symbol, interval, start, and end remain normal spider arguments at crawl time, while api_url and the bar-field names can be adjusted in the file when a provider uses a different endpoint or schema.
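For example, a hypothetical provider that nests the series under a candles array with single-letter keys would only need the mapping inside parse() changed; the candles name and the d, o, h, l, c, v keys below are placeholder assumptions to adapt, not part of any real contract.

def parse(self, response):
    # Sketch for a hypothetical {"candles": [{"d": ..., "o": ...}]}
    # payload; rename the array and keys to match the probe output.
    payload = response.json()
    candles = payload.get("candles", [])
    if not isinstance(candles, list):
        self.logger.error("Missing candles list")
        return
    symbol = payload.get("symbol") or self.symbol
    for candle in candles:
        if not isinstance(candle, dict):
            continue
        yield {
            "symbol": symbol,
            "date": candle.get("d"),
            "open": candle.get("o"),
            "high": candle.get("h"),
            "low": candle.get("l"),
            "close": candle.get("c"),
            "volume": candle.get("v"),
        }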
- Update settings.py so the crawl keeps a stable CSV column order and slows down cleanly under latency.
CONCURRENT_REQUESTS_PER_DOMAIN = 1
DOWNLOAD_DELAY = 1

AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 10.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

FEED_EXPORT_FIELDS = [
    "symbol",
    "date",
    "open",
    "high",
    "low",
    "close",
    "volume",
]
FEED_EXPORT_ENCODING = "utf-8"
Current Scrapy project templates already generate CONCURRENT_REQUESTS_PER_DOMAIN = 1, DOWNLOAD_DELAY = 1, and FEED_EXPORT_ENCODING = "utf-8", but keeping the chart export fields explicit prevents column order from drifting when the first yielded item changes.
Chart APIs often enforce burst limits, so raising per-domain concurrency or dropping the delay usually leads to HTTP 429 responses or partial datasets before a full date range finishes.
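When a provider does return HTTP 429, Scrapy's built-in retry middleware can be tuned in settings.py alongside AutoThrottle; the values below are an illustrative starting point, not provider-specific guidance.

# Illustrative retry tuning; adjust to the provider's rate limits.
RETRY_ENABLED = True
RETRY_TIMES = 5                               # per-request retry budget
RETRY_HTTP_CODES = [429, 500, 502, 503, 504]  # treat rate limits as retryable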
Related: How to set a download delay in Scrapy
Related: How to enable AutoThrottle in Scrapy
- Export the provider token in the current shell when the chart API requires authenticated requests.
$ export CHART_API_TOKEN="replace-with-token"
The spider reads CHART_API_TOKEN only when it exists, so unauthenticated public endpoints can skip this step.
Pasting tokens into a shared shell, screen recording, or saved terminal transcript can expose credentials outside the crawl.
Related: How to set request headers in Scrapy
- Run the spider with the desired symbol, interval, and date range, writing the bars directly to a CSV file.
$ scrapy crawl chart -a symbol=MSFT -a interval=1d -a start=2026-04-01 -a end=2026-04-03 -O msft-1d.csv
2026-04-16 11:46:56 [scrapy.extensions.feedexport] INFO: Stored csv feed (3 items) in: msft-1d.csv
2026-04-16 11:46:56 [scrapy.core.engine] INFO: Spider closed (finished)
-O overwrites any existing output file, while -o appends when the chosen feed format supports appending.
- Read the saved CSV to confirm the header order and one row per returned bar.
$ cat msft-1d.csv
symbol,date,open,high,low,close,volume
MSFT,2026-04-01,381.1,384.5,380.6,383.9,20311452
MSFT,2026-04-02,384.0,386.4,382.3,385.7,18744106
MSFT,2026-04-03,385.9,388.1,384.8,387.6,19180234
- Compare the total line count with the number of exported bars when a quick post-run check is needed.
$ wc -l msft-1d.csv
4 msft-1d.csv
One header row plus three data rows means the crawl exported three daily bars.
Notes
- Current Scrapy documentation uses async def start() for initial requests; spiders that must keep supporting releases older than 2.13 can fall back to the start_requests() pattern, as in the sketch below.
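A minimal compatibility sketch, mirroring the async start() method from the spider above:

def start_requests(self):
    # Synchronous equivalent of start() for Scrapy releases older
    # than 2.13; builds the same single GET request.
    params = {
        "symbol": self.symbol,
        "interval": self.interval,
        "start": self.start_date,
        "end": self.end_date,
    }
    url = f"{self.api_url}?{urlencode(params)}"
    yield scrapy.Request(
        url=url,
        headers=self._headers(),
        callback=self.parse,
    )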
- Providers that return Unix timestamps instead of ISO dates should convert those values inside parse() before writing the CSV so charting tools do not guess the wrong timezone or unit.
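A minimal conversion sketch, assuming the provider sends an epoch value under a hypothetical t key; both the key name and the seconds-versus-milliseconds heuristic need verifying against the probe output.

from datetime import datetime, timezone

def bar_date(ts):
    # Convert an epoch timestamp to an ISO date string in UTC.
    # Values large enough to be milliseconds are scaled to seconds
    # first; the threshold is a heuristic, not a provider guarantee.
    if ts is None:
        return None
    if ts > 10**12:
        ts /= 1000
    return datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()

Inside parse(), the mapping then becomes "date": bar_date(bar.get("t")) instead of reading a ready-made ISO string.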
- Adjusted and unadjusted bars are not interchangeable, so the endpoint path or query string should make that distinction explicit before the data is reused for indicators or backtests.
- Use scrapy.http.JsonRequest instead of Request when the provider expects a JSON request body or a POST-based search payload instead of query-string parameters.
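A minimal sketch, assuming a hypothetical search endpoint that accepts the same parameters as a JSON body; the /v1/charts/search path and payload shape are placeholders.

from scrapy.http import JsonRequest

async def start(self):
    # JsonRequest serializes `data` into the JSON request body and
    # defaults the HTTP method to POST; the path is an assumption.
    yield JsonRequest(
        url="https://data.example.net/v1/charts/search",
        data={
            "symbol": self.symbol,
            "interval": self.interval,
            "start": self.start_date,
            "end": self.end_date,
        },
        headers=self._headers(),
        callback=self.parse,
    )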
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
