How to get stock chart data with Scrapy

Scrapy gives you a clean way to collect stock chart data as OHLCV (open, high, low, close, volume) bars for candlestick charts, indicators, and backtests without scraping rendered tables. Exporting one row per bar keeps the result easy to reload into Python, spreadsheets, or charting tools.

Most market-data providers expose chart history through a JSON endpoint keyed by symbol, interval, and date range. A Scrapy spider can request that endpoint, read the returned bar array, and send the rows straight to a .csv feed so the crawl produces a reusable time-series file in one run.

Providers differ in auth headers, array names, adjusted-price rules, and whether timestamps arrive as ISO dates or Unix time, so match the request URL and parse logic to the current API contract before you reuse the crawl. Scrapy 2.13 and later use async def start() for initial requests, while projects that still target older releases should also keep a start_requests() version for compatibility, as sketched below.
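
A spider that must run on both sides of that version boundary can define both entry points and route one through the other, a minimal sketch that assumes nothing beyond Scrapy itself:

    import scrapy


    class CompatSpider(scrapy.Spider):
        name = "compat"

        async def start(self):
            # Scrapy 2.13+ calls this coroutine for the initial requests.
            for request in self.start_requests():
                yield request

        def start_requests(self):
            # Releases older than 2.13 call this generator instead.
            yield scrapy.Request("https://data.example.net/v1/charts?symbol=MSFT")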

Steps to get stock chart data with Scrapy:

  1. Probe the chart endpoint with one symbol and a short date range.
    $ curl -s 'https://data.example.net/v1/charts?symbol=MSFT&interval=1d&start=2026-04-01&end=2026-04-03'
    {
      "symbol": "MSFT",
      "interval": "1d",
      "bars": [
        {
          "date": "2026-04-01",
          "open": 381.1,
          "high": 384.5,
          "low": 380.6,
          "close": 383.9,
          "volume": 20311452
        },
        {
          "date": "2026-04-02",
          "open": 384.0,
          "high": 386.4,
          "low": 382.3,
          "close": 385.7,
          "volume": 18744106
        },
        {
          "date": "2026-04-03",
          "open": 385.9,
          "high": 388.1,
          "low": 384.8,
          "close": 387.6,
          "volume": 19180234
        }
      ]
    }

    Confirm whether the response uses bars, candles, or another array name, whether the date is already text or a Unix timestamp, and whether the endpoint serves adjusted or raw prices.
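
    A quick Python check against the same probe URL confirms the array name and the date type before any spider code exists (a sketch that assumes the endpoint is public, like the hypothetical one above):

    import json
    from urllib.request import urlopen

    url = ("https://data.example.net/v1/charts"
           "?symbol=MSFT&interval=1d&start=2026-04-01&end=2026-04-03")
    payload = json.load(urlopen(url))
    print(sorted(payload))                   # top-level keys, e.g. ['bars', 'interval', 'symbol']
    print(type(payload["bars"][0]["date"]))  # str for ISO dates, int for Unix time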

  2. Create a new Scrapy project for the chart spider.
    $ scrapy startproject market_chart
    New Scrapy project 'market_chart', using template directory '##### snipped #####', created in:
        /srv/market_chart
    
    You can start your first spider with:
        cd market_chart
        scrapy genspider example example.com
  3. Change into the project directory.
    $ cd market_chart
  4. Generate a spider for the market data host.
    $ scrapy genspider chart data.example.net
    Created spider 'chart' using template 'basic' in module:
      market_chart.spiders.chart
  5. Replace the generated spider with a request that reads the chart JSON and yields one item per bar.
    market_chart/spiders/chart.py
    from datetime import date
    import os
    from urllib.parse import urlencode
     
    import scrapy
     
     
    class ChartSpider(scrapy.Spider):
        name = "chart"
        allowed_domains = ["data.example.net"]
        api_url = "https://data.example.net/v1/charts"
     
        def __init__(
            self,
            symbol="MSFT",
            interval="1d",
            start="2026-04-01",
            end="2026-04-03",
            *args,
            **kwargs,
        ):
            super().__init__(*args, **kwargs)
            self.symbol = symbol.strip().upper()
            self.interval = interval.strip()
            self.start_date = self._parse_day(start, "start")
            self.end_date = self._parse_day(end, "end")
     
        def _parse_day(self, value, label):
            try:
                return date.fromisoformat(str(value)).isoformat()
            except ValueError as exc:
                raise ValueError(f"{label} must use YYYY-MM-DD") from exc
     
        def _headers(self):
            headers = {"Accept": "application/json"}
            api_token = os.getenv("CHART_API_TOKEN")
            if api_token:
                headers["Authorization"] = f"Bearer {api_token}"
            return headers
     
        # Scrapy 2.13+ entry point: build and yield the single chart request.
        async def start(self):
            params = urlencode(
                {
                    "symbol": self.symbol,
                    "interval": self.interval,
                    "start": self.start_date,
                    "end": self.end_date,
                }
            )
            yield scrapy.Request(
                url=f"{self.api_url}?{params}",
                headers=self._headers(),
                callback=self.parse,
            )
     
        # Each yielded dict becomes one row in the exported feed.
        def parse(self, response):
            payload = response.json()
            bars = payload.get("bars")
            if not isinstance(bars, list):
                self.logger.error("Missing bars list")
                return
     
            symbol = payload.get("symbol") or self.symbol
            for bar in bars:
                if not isinstance(bar, dict):
                    continue
     
                yield {
                    "symbol": symbol,
                    "date": bar.get("date"),
                    "open": bar.get("open"),
                    "high": bar.get("high"),
                    "low": bar.get("low"),
                    "close": bar.get("close"),
                    "volume": bar.get("volume"),
                }

    symbol, interval, start, and end stay as normal spider arguments, so you can switch symbols or date windows at crawl time without editing the file. Convert Unix timestamps inside parse() before yielding the item when the provider does not already return an ISO-style date.
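
    When the provider returns Unix timestamps instead, a small helper method on the spider keeps parse() tidy (a sketch; the seconds precision and the fallback field name "t" are assumptions to verify against the real payload):

    from datetime import datetime, timezone

    def _bar_date(self, bar):
        raw = bar.get("date", bar.get("t"))
        if isinstance(raw, (int, float)):
            # Treat numeric values as Unix seconds in UTC.
            return datetime.fromtimestamp(raw, tz=timezone.utc).date().isoformat()
        return raw

    Call self._bar_date(bar) in place of bar.get("date") when building the item.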

  6. Update the project settings so the CSV header stays ordered and the crawl backs off under latency.
    market_chart/settings.py
    CONCURRENT_REQUESTS_PER_DOMAIN = 1
    DOWNLOAD_DELAY = 1.0
     
    AUTOTHROTTLE_ENABLED = True
    AUTOTHROTTLE_START_DELAY = 1.0
    AUTOTHROTTLE_MAX_DELAY = 10.0
    AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
     
    FEED_EXPORT_FIELDS = [
        "symbol",
        "date",
        "open",
        "high",
        "low",
        "close",
        "volume",
    ]
    FEED_EXPORT_ENCODING = "utf-8"

    Market-data endpoints often enforce burst limits, so raising concurrency or removing the delay usually leads to HTTP 429 responses, shortened date ranges, or both.
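
    Scrapy's retry middleware already retries common transient statuses; pinning the list and a bounded retry count in the same file makes the behavior explicit (a sketch; recent Scrapy releases include 429 in their default retry codes):

    market_chart/settings.py
    RETRY_ENABLED = True
    RETRY_TIMES = 3
    RETRY_HTTP_CODES = [429, 500, 502, 503, 504]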

  7. Export a bearer token when the provider requires authenticated requests.
    $ export CHART_API_TOKEN="replace-with-token"

    Skip this step for public endpoints, and avoid leaving real tokens in saved shell transcripts or screenshots.

  8. Run the spider with the symbol, interval, and date range you want to export.
    $ scrapy crawl chart -a symbol=MSFT -a interval=1d -a start=2026-04-01 -a end=2026-04-03 -O msft-1d.csv
    ##### snipped #####
    2026-04-22 05:49:16 [scrapy.extensions.feedexport] INFO: Stored csv feed (3 items) in: msft-1d.csv
    2026-04-22 05:49:16 [scrapy.core.engine] INFO: Spider closed (finished)

    -O overwrites the output file, while -o appends to an existing one; an appended file only stays well formed for feed formats that tolerate appending, such as JSON Lines.
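
    For repeated runs that should accumulate rows, JSON Lines is a safer append target because each item is a self-contained line (the .jl extension selects Scrapy's JSON Lines exporter):

    $ scrapy crawl chart -a symbol=MSFT -a interval=1d -a start=2026-04-01 -a end=2026-04-03 -o msft-1d.jl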

  9. Read the saved CSV to confirm the header order and one row for each returned bar.
    $ cat msft-1d.csv
    symbol,date,open,high,low,close,volume
    MSFT,2026-04-01,381.1,384.5,380.6,383.9,20311452
    MSFT,2026-04-02,384.0,386.4,382.3,385.7,18744106
    MSFT,2026-04-03,385.9,388.1,384.8,387.6,19180234
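
    From there the file loads straight into analysis tools; for example, pandas can parse the date column and index the frame for indicators or backtests (assuming pandas is installed):

    import pandas as pd

    bars = pd.read_csv("msft-1d.csv", parse_dates=["date"], index_col="date")
    print(bars["close"].pct_change().dropna())  # daily returns from the close column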