Scraping a direct CSV file with Scrapy keeps the crawl on a row-based source instead of fragile HTML selectors, which works well for catalog exports, reporting downloads, and partner feeds that already expose the records as a file.
Current Scrapy still includes CSVFeedSpider for this pattern. It downloads the CSV as one response, uses the first row as headers unless headers is set explicitly, and calls parse_row() once per data row so the spider can yield normal items for feed export, pipelines, or follow-up requests.
The target URL still needs to return the raw CSV body instead of an HTML landing page, login form, or redirect chain, and large CSV downloads are still loaded into one response before row parsing starts. If the feed uses a different CSV dialect, set delimiter or quotechar explicitly, set headers when the file has no header row, and expect malformed rows with the wrong column count to be skipped after a warning.
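To illustrate the dialect point, Python's csv module (which CSVFeedSpider relies on internally) shows how a mismatched delimiter glues every row into a single column; the semicolon-delimited sample body here is hypothetical:

```python
import csv
import io

# Hypothetical semicolon-delimited feed body, shown only to
# illustrate CSV dialect handling.
body = 'sku;name;price\n"starter-001";"Starter Plan";"$29"\n'

# Parsed with the default comma dialect, each row collapses to one column.
default_rows = list(csv.reader(io.StringIO(body)))
# Matching the feed's dialect recovers the real columns, the same fix
# that setting delimiter and quotechar on the spider applies.
dialect_rows = list(csv.reader(io.StringIO(body), delimiter=";", quotechar='"'))

print(len(default_rows[1]))  # 1
print(dialect_rows[1])       # ['starter-001', 'Starter Plan', '$29']
```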
Related: How to scrape an XML file with Scrapy
Related: How to export Scrapy items to CSV
$ scrapy shell --nolog https://files.example.net/data/products.csv -c 'response.text.splitlines()[:3]'
['sku,name,price,url', 'starter-001,Starter Plan,$29,https://shop.example.net/products/starter-plan', 'team-001,Team Plan,$79,https://shop.example.net/products/team-plan']
The returned text should be CSV lines from the file itself, not an HTML download page or login response.
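A lightweight guard for that failure mode is to reject bodies that open with an HTML document before wiring the URL into a spider (a heuristic sketch, not a Scrapy API):

```python
# Heuristic guard (an assumption, not part of Scrapy): reject bodies that
# open like an HTML document instead of CSV rows.
def looks_like_csv(body: str) -> bool:
    head = body.lstrip()[:20].lower()
    return not head.startswith(("<!doctype", "<html"))

print(looks_like_csv("sku,name,price\nstarter-001,Starter Plan,$29"))      # True
print(looks_like_csv("<!DOCTYPE html><html><body>Sign in</body></html>"))  # False
```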
$ vi csv_feed_spider.py
from scrapy.spiders import CSVFeedSpider


class ProductCsvSpider(CSVFeedSpider):
    name = "product_csv"
    start_urls = ["https://files.example.net/data/products.csv"]

    def parse_row(self, response, row):
        yield {
            "sku": row["sku"],
            "name": row["name"],
            "price": row["price"],
            "url": row["url"],
        }
CSVFeedSpider uses the first row as headers by default. Add headers = ["sku", "name", "price", "url"] when the file has no header row, and set delimiter or quotechar only when the feed does not use the normal comma-plus-double-quote CSV format.
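To show what an explicit headers list buys on a headerless file, here is a rough approximation of the mapping applied before parse_row sees each row (a sketch with made-up data, not Scrapy's actual implementation):

```python
import csv
import io

# Hypothetical headerless feed body (made-up rows).
body = "starter-001,Starter Plan,$29\nteam-001,Team Plan,$79\n"

# Column names that headers = [...] would supply on the spider.
headers = ["sku", "name", "price"]

# Zip each parsed row against the names, roughly what the spider
# does before handing the dict to parse_row().
rows = [dict(zip(headers, fields)) for fields in csv.reader(io.StringIO(body))]
print(rows[0])  # {'sku': 'starter-001', 'name': 'Starter Plan', 'price': '$29'}
```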
Related: How to create a Scrapy spider
$ scrapy runspider csv_feed_spider.py -O products.jsonl
2026-04-16 05:42:54 [scrapy.utils.log] INFO: Scrapy 2.15.0 started (bot: scrapybot)
##### snipped #####
2026-04-16 05:42:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://files.example.net/data/products.csv> (referer: None)
2026-04-16 05:42:54 [scrapy.extensions.feedexport] INFO: Stored jsonl feed (3 items) in: products.jsonl
2026-04-16 05:42:54 [scrapy.core.engine] INFO: Spider closed (finished)
The .jsonl suffix selects JSON Lines export automatically, and -O replaces any existing products.jsonl file.
Running runspider from inside an existing Scrapy project can pull in that project's settings, middleware, and pipelines instead of a neutral standalone configuration.
$ cat products.jsonl
{"sku": "starter-001", "name": "Starter Plan", "price": "$29", "url": "https://shop.example.net/products/starter-plan"}
{"sku": "team-001", "name": "Team Plan", "price": "$79", "url": "https://shop.example.net/products/team-plan"}
{"sku": "growth-001", "name": "Growth Plan", "price": "$129", "url": "https://shop.example.net/products/growth-plan"}
Each line should contain one parsed item, which keeps the export easy to inspect, diff, or hand to later processing.
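For example, downstream processing can stay a plain loop of json.loads per line; the two sample records below are copied from the export above:

```python
import json

# Two records copied from the products.jsonl export above.
lines = [
    '{"sku": "starter-001", "name": "Starter Plan", "price": "$29", "url": "https://shop.example.net/products/starter-plan"}',
    '{"sku": "team-001", "name": "Team Plan", "price": "$79", "url": "https://shop.example.net/products/team-plan"}',
]

items = [json.loads(line) for line in lines]
# One item per line keeps filtering a plain comprehension.
cheap = [item["sku"] for item in items if float(item["price"].lstrip("$")) < 50]
print(cheap)  # ['starter-001']
```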