CSV download endpoints often expose the same records that power tables, catalog exports, or reporting pages, so targeting the CSV directly keeps the crawl on a stable, row-based source instead of on brittle HTML selectors.

Scrapy includes CSVFeedSpider for this exact pattern. It fetches the CSV URL like any other request, reads the header row into field names, and calls parse_row() once per data row so the spider can yield normal Scrapy items for feed export, pipelines, or follow-up requests.

The CSV URL still needs to return the file body itself, and non-default formats may need explicit delimiter, quotechar, or headers values. Very large CSV responses are still received as one response body, so confirm the direct download path first and keep memory use in mind before pointing the spider at multi-gigabyte exports.
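
If an export really is too large to receive as a single response body, one fallback is a plain streaming pass outside Scrapy. The following is a minimal sketch, not part of the CSVFeedSpider workflow, assuming the same products.csv URL used in the steps below and relying only on the standard library:

    import csv
    import io
    import urllib.request

    # Hypothetical URL, reused from the example below.
    URL = "https://files.example.net/data/products.csv"

    with urllib.request.urlopen(URL) as raw:
        # TextIOWrapper decodes the byte stream lazily, so the file is
        # never held in memory as one body.
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        for row in csv.DictReader(text):
            # Handle one row at a time here.
            print(row["sku"], row["price"])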

Steps to scrape a CSV file with Scrapy using CSVFeedSpider:

  1. Check the direct CSV URL in Scrapy shell and confirm the response type plus the first few rows.
    $ scrapy shell --nolog https://files.example.net/data/products.csv -c '(response.status, response.headers.get(b"Content-Type"), response.text.splitlines()[:3])'
    (200, b'text/csv', ['sku,name,price,url', 'starter-001,Starter Plan,$29,https://shop.example.com/products/starter-plan.html', 'team-001,Team Plan,$79,https://shop.example.com/products/team-plan.html'])

    The response should be the CSV body itself, not an HTML landing page, login form, or expiring redirect target.

  2. Create a standalone CSVFeedSpider file with the CSV URL and one parse_row() callback.
    $ vi csv_feed_spider.py
    from scrapy.spiders import CSVFeedSpider
     
     
    class ProductCsvSpider(CSVFeedSpider):
        name = "product_csv"
        start_urls = ["https://files.example.net/data/products.csv"]
        delimiter = ","
        quotechar = '"'
     
        def parse_row(self, response, row):
            yield {
                "sku": row["sku"],
                "name": row["name"],
                "price": row["price"],
                "url": row["url"],
            }

    CSVFeedSpider uses the first row as headers by default. Add headers = ["sku", "name", "price", "url"] when the file has no header row, and change delimiter or quotechar when the feed uses a different CSV dialect, as in the sketch below.
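
    As a sketch of those overrides, here is a hypothetical variant for a headerless, semicolon-delimited export; the products-noheader.csv URL is an assumption, not part of the example feed above:

    from scrapy.spiders import CSVFeedSpider


    class HeaderlessCsvSpider(CSVFeedSpider):
        name = "headerless_csv"
        # Hypothetical export with no header row and ; as the delimiter.
        start_urls = ["https://files.example.net/data/products-noheader.csv"]
        delimiter = ";"
        quotechar = '"'
        # With headers set explicitly, every row is treated as data.
        headers = ["sku", "name", "price", "url"]

        def parse_row(self, response, row):
            # row is still a dict keyed by the names in headers.
            yield dict(row)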

  3. Run the spider and overwrite the export file for the current crawl.
    $ scrapy runspider csv_feed_spider.py -O products.jsonl
    2026-04-16 05:42:54 [scrapy.utils.log] INFO: Scrapy 2.15.0 started (bot: scrapybot)
    ##### snipped #####
    2026-04-16 05:42:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://files.example.net/data/products.csv> (referer: None)
    2026-04-16 05:42:54 [scrapy.extensions.feedexport] INFO: Stored jsonl feed (3 items) in: products.jsonl
    2026-04-16 05:42:54 [scrapy.core.engine] INFO: Spider closed (finished)

    The .jsonl suffix selects the JSON Lines exporter automatically, and -O replaces any existing products.jsonl file, while a lowercase -o would append to it instead.
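
    Other suffixes select other built-in exporters the same way; for example, a .csv suffix (with an arbitrary output name here) would re-export the items as CSV:
    $ scrapy runspider csv_feed_spider.py -O products-out.csv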

    Running runspider from inside an existing Scrapy project can pull in that project's settings, middleware, and pipelines instead of a neutral standalone configuration; running the file from a directory tree without a scrapy.cfg keeps the standalone defaults.
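
    When a specific setting matters, pinning it on the command line with -s keeps the run independent of ambient project configuration; the user agent string here is only a placeholder:
    $ scrapy runspider csv_feed_spider.py -s USER_AGENT="example-bot/1.0" -O products.jsonl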

  4. Read the exported rows to confirm that each CSV row became one Scrapy item.
    $ cat products.jsonl
    {"sku": "starter-001", "name": "Starter Plan", "price": "$29", "url": "https://shop.example.com/products/starter-plan.html"}
    {"sku": "team-001", "name": "Team Plan", "price": "$79", "url": "https://shop.example.com/products/team-plan.html"}
    {"sku": "growth-001", "name": "Growth Plan", "price": "$129", "url": "https://shop.example.com/products/growth-plan.html"}

    Each line should contain one parsed item, which makes .jsonl easy to inspect, diff, or stream into later processing.
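
    As one sketch of that later processing, a short standard-library script can consume the export line by line; it assumes the products.jsonl file produced above:

    import json

    # Read the export produced above: one JSON object per line.
    with open("products.jsonl", encoding="utf-8") as feed:
        for line in feed:
            item = json.loads(line)
            print(item["sku"], "->", item["price"])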