Scraping an HTML table with Scrapy turns rows that are easy to read in a browser into structured items that can be exported, filtered, or reused in later crawls. This works well for price lists, inventory pages, schedules, and other pages that publish data as a table instead of an API response.
Scrapy exposes the downloaded response through XPath and CSS selectors, which makes it practical to test one table selector in scrapy shell before the same extraction is moved into a spider. The cleanest flow is to identify one stable <table> element, confirm the header and row selectors, and only then map each cell into item fields.
The approach only works when the server response already contains the table markup. JavaScript-rendered tables, login-protected pages, anti-bot responses, and layouts that rely on rowspan, colspan, or irregular missing cells usually need a different endpoint or more explicit field handling than a simple fixed-column loop.
Related: How to use CSS selectors in Scrapy
Related: How to export Scrapy items to CSV

Prefer an explicit id such as pricing when the table provides one, because it keeps the selector shorter and less likely to drift than a long class or ancestor match.
$ scrapy shell 'https://catalog.example.com/pricing/' --nolog
[s] Available Scrapy objects:
[s]   response   <200 https://catalog.example.com/pricing/>
[s] Useful shortcuts:
[s]   fetch(url[, redirect=True]) Fetch URL and update local objects (by default, redirects are followed)
##### snipped #####
>>>
Related: How to use Scrapy shell
>>> response.xpath('//table[@id="pricing"]/thead/tr/th/text()').getall()
['Plan', 'Price']
A quick header check catches the wrong table or the wrong selector before the row loop is written.
Related: How to use XPath selectors in Scrapy
>>> rows = response.xpath('//table[@id="pricing"]/tbody/tr')
>>> len(rows)
3
Selecting tbody/tr keeps the loop focused on data rows instead of mixing the header row into the item output.
>>> for row in rows:
...     print({
...         "plan": row.xpath('normalize-space(./td[1])').get(),
...         "price": row.xpath('normalize-space(./td[2])').get(),
...     })
...
{'plan': 'Starter Plan', 'price': '$29'}
{'plan': 'Team Plan', 'price': '$79'}
{'plan': 'Enterprise Plan', 'price': '$199'}
Keep the field XPath relative to the current row. Starting a nested selector with // or / jumps back to the document root and can duplicate or misalign values.
import scrapy


class PricingTableSpider(scrapy.Spider):
    name = "pricing_table"
    start_urls = ["https://catalog.example.com/pricing/"]

    def parse(self, response):
        for row in response.xpath('//table[@id="pricing"]/tbody/tr'):
            yield {
                "plan": row.xpath("normalize-space(./td[1])").get(),
                "price": row.xpath("normalize-space(./td[2])").get(),
            }
Related: How to create a Scrapy spider
$ scrapy runspider table_spider.py -O table_rows.json
[scrapy.extensions.feedexport] INFO: Stored json feed (3 items) in: table_rows.json
[scrapy.core.engine] INFO: Spider closed (finished)
runspider works outside a full Scrapy project, while -O overwrites the output file with the current crawl result (lowercase -o appends to it instead).
$ cat table_rows.json
[
{"plan": "Starter Plan", "price": "$29"},
{"plan": "Team Plan", "price": "$79"},
{"plan": "Enterprise Plan", "price": "$199"}
]
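The exported prices are still display strings, so filtering or aggregating them needs a small cleanup step, either after export or in an item pipeline. A stdlib-only sketch over the rows shown above, assuming the simple "$29" format:

```python
# Rows as exported by the spider above.
rows = [
    {"plan": "Starter Plan", "price": "$29"},
    {"plan": "Team Plan", "price": "$79"},
    {"plan": "Enterprise Plan", "price": "$199"},
]

# Strip the currency symbol and convert to an integer for each row.
cleaned = [
    {"plan": r["plan"], "price": int(r["price"].lstrip("$"))}
    for r in rows
]
print(cleaned)
```

Real price columns often add thousands separators or decimal parts, so production cleanup usually needs a stricter parser than lstrip plus int.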