CSV downloads often contain the same data shown on a web page, but in a format that is already structured and ready for processing. Pulling the .csv file directly avoids brittle HTML selectors and preserves clean columns for analysis, import, or automation.

Scrapy can request the CSV URL like any other endpoint and expose the response as text through response.text. The Python csv module can then parse rows into dictionaries, allowing each row to be emitted as a Scrapy item for storage, pipelines, or additional crawling.

Many CSV endpoints are protected by authentication, anti-bot rate limits, or short-lived download tokens, and some servers return compressed or non-UTF-8 content. Scrapy also reads the whole CSV into memory before parsing, so keep file size in mind and confirm the target site permits automated access.
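If the server mislabels or omits the charset, response.text can come out garbled. A minimal sketch of decoding the raw bytes yourself, assuming UTF-8 with an optional byte order mark and a latin-1 fallback (both encodings are assumptions to adjust for the actual server):

    def decode_csv_body(response, fallback="latin-1"):
        """Decode raw CSV bytes, preferring UTF-8 and stripping any BOM."""
        try:
            # utf-8-sig removes a leading byte order mark if one is present.
            return response.body.decode("utf-8-sig")
        except UnicodeDecodeError:
            # latin-1 always decodes; it is only an assumed fallback here.
            return response.body.decode(fallback)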

Steps to scrape a CSV file with Scrapy:

  1. Open Scrapy shell for the CSV download URL.
    $ scrapy shell http://files.example.net:8000/data/products.csv
    2026-01-01 09:03:28 [scrapy.utils.log] INFO: Scrapy 2.11.1 started (bot: simplifiedguide)
    ##### snipped #####

    Replace the URL with the direct .csv download endpoint.

  2. Confirm the CSV URL returns a successful response.
    >>> response
    <200 http://files.example.net:8000/data/products.csv>
    >>> response.headers.get(b"Content-Type")
    b'text/csv'

    Scrapy follows 3xx redirects automatically (common when the endpoint hands out signed, short-lived download URLs), and response.url shows the final location after any redirect.
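    The same Content-Type check can run inside a spider before parsing. A minimal sketch, assuming you only want to treat responses the server labels as CSV (the helper name and the substring test are illustrative):

    def looks_like_csv(response):
        """Return True when the Content-Type header mentions CSV."""
        content_type = response.headers.get(b"Content-Type", b"")
        return b"csv" in content_type.lower()

    Calling it at the top of parse() lets the spider bail out early when an endpoint returns an HTML login or error page with a 200 status.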

  3. Preview the header row with a few records.
    >>> response.text.splitlines()[:4]
    ['name,price,url',
     'Starter Plan,$29,http://app.internal.example:8000/products/starter-plan.html',
     'Team Plan,$79,http://app.internal.example:8000/products/team-plan.html',
     'Enterprise Plan,$199,http://app.internal.example:8000/products/enterprise-plan.html']

    Semicolon-delimited files need a custom delimiter passed to csv.DictReader, such as delimiter=';', as in the sketch below.
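    A minimal sketch of overriding the delimiter, assuming a semicolon-separated file:

    >>> import csv, io
    >>> reader = csv.DictReader(io.StringIO(response.text), delimiter=";")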

  4. Test CSV parsing in the shell using Python's csv.DictReader.
    >>> import csv, io
    >>> reader = csv.DictReader(io.StringIO(response.text.lstrip("\ufeff")))
    >>> next(reader)
    {'name': 'Starter Plan', 'price': '$29', 'url': 'http://app.internal.example:8000/products/starter-plan.html'}

    CSV files without a header row need explicit column names via the fieldnames argument, as in the sketch below.
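    A minimal sketch for a headerless file, with an assumed column order to adjust to the real layout:

    >>> reader = csv.DictReader(io.StringIO(response.text), fieldnames=["name", "price", "url"])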

  5. Create a stand-alone spider file that yields one item per CSV row.
    scrape_csv.py
    import csv
    import io
     
    import scrapy
     
     
    class ScrapeCsvSpider(scrapy.Spider):
        """Scrape a remote CSV file and emit each row as an item."""
     
        name = "scrape-csv"
        start_urls = ["http://files.example.net:8000/data/products.csv"]
     
        def parse(self, response):
            """Parse the CSV response and yield a dictionary for each row."""
            text = response.text.lstrip("\ufeff")
     
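            # Detect the delimiter from a sample of the file; fall back to the
            # default comma-separated (Excel) dialect if sniffing fails.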
            try:
                dialect = csv.Sniffer().sniff(text[:2048])
            except csv.Error:
                dialect = csv.excel
     
            reader = csv.DictReader(io.StringIO(text), dialect=dialect)
            for row in reader:
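                # Skip rows where every field is empty.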
                if not any(row.values()):
                    continue
     
                clean_row = {}
                for key, value in row.items():
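                    # Values beyond the header columns are stored under the None key.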
                    if key is None:
                        continue
                    clean_row[key.strip()] = value.strip() if isinstance(value, str) else value
     
                yield clean_row

  6. Run the spider file using Scrapy.
    $ scrapy runspider --nolog --output -:json scrape_csv.py
    [
    {"name": "Starter Plan", "price": "$29", "url": "http://app.internal.example:8000/products/starter-plan.html"},
    {"name": "Team Plan", "price": "$79", "url": "http://app.internal.example:8000/products/team-plan.html"},
    {"name": "Enterprise Plan", "price": "$199", "url": "http://app.internal.example:8000/products/enterprise-plan.html"}
    ]
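
The url column in each row can also drive additional crawling. A minimal sketch of a variant spider that follows each product link after yielding the row (the spider name, the parse_product callback, and its selector are illustrative assumptions):

    import csv
    import io

    import scrapy


    class ScrapeCsvFollowSpider(scrapy.Spider):
        """Yield each CSV row, then follow the row's url column."""

        name = "scrape-csv-follow"
        start_urls = ["http://files.example.net:8000/data/products.csv"]

        def parse(self, response):
            reader = csv.DictReader(io.StringIO(response.text.lstrip("\ufeff")))
            for row in reader:
                yield row
                if row.get("url"):
                    # Follow the linked product page for further scraping.
                    yield response.follow(row["url"], callback=self.parse_product)

        def parse_product(self, response):
            # Illustrative extraction from the linked page.
            yield {"url": response.url, "title": response.css("title::text").get()}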