Scraping tabular data from websites turns human-readable HTML tables into structured rows that can be exported, filtered, and reused in automation workflows. Splitting the table into fields such as names, IDs, or metrics avoids manual copying and reduces the chance of transcription errors.
Scrapy downloads the page content and exposes it through a selector tree for XPath or CSS queries. Most tables follow a predictable structure using <table>, <thead>, <tbody>, <tr>, <th>, and <td>, so scraping usually becomes selecting the correct table element and iterating through its data rows.
Scraping only sees the server-rendered response, so tables populated by JavaScript may not exist in the fetched HTML. Irregular table layouts using colspan/rowspan or row headers in <th> can shift column positions, so extraction should tolerate missing cells and unexpected row shapes.
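To make that tolerance concrete, here is a minimal sketch using only the standard library's xml.etree (a stand-in for Scrapy's selectors, not Scrapy's API); the sample table and its short "Legacy Plan" row are invented for illustration. Pairing header names with whatever cells a row actually has means a missing cell drops a field instead of raising an error.

```python
# Sketch: tolerant row extraction with the standard library (xml.etree),
# standing in for Scrapy's selectors. Rows with fewer cells than the
# header simply omit those fields instead of raising.
import xml.etree.ElementTree as ET

HTML = """
<table>
  <thead><tr><th>Plan</th><th>Price</th></tr></thead>
  <tbody>
    <tr><td>Starter Plan</td><td>$29</td></tr>
    <tr><td>Legacy Plan</td></tr>
  </tbody>
</table>
"""

def extract_rows(html):
    table = ET.fromstring(html)
    headers = [th.text for th in table.iter('th')]
    rows = []
    for tr in table.find('tbody').iter('tr'):
        cells = [td.text for td in tr.findall('td')]
        # zip() stops at the shorter sequence, so a missing cell
        # simply drops its field from the row dict
        rows.append(dict(zip(headers, cells)))
    return rows

print(extract_rows(HTML))
```

The short second row yields a dict with only a plan key, which downstream code can detect and handle instead of crashing mid-crawl.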
Related: How to use CSS selectors in Scrapy
Related: How to export Scrapy items to CSV
http://app.internal.example:8000/table/
The example table can be selected by its id attribute, which is set to pricing.
<table id="pricing">
  <thead>
    <tr><th>Plan</th><th>Price</th></tr>
  </thead>
  <tbody>
    <tr>
      <td>Starter Plan</td>
      <td>$29</td>
    </tr>
    <tr>
      <td>Team Plan</td>
      <td>$79</td>
    </tr>
    <tr>
      <td>Enterprise Plan</td>
      <td>$199</td>
    </tr>
  </tbody>
</table>
$ scrapy shell http://app.internal.example:8000/table/
2026-01-01 09:13:50 [scrapy.utils.log] INFO: Scrapy 2.11.1 started (bot: simplifiedguide)
##### snipped #####
In [1]: response
Out[1]: <200 http://app.internal.example:8000/table/>
200 means the page content was fetched successfully.
In [2]: table = response.xpath('//*[@id="pricing"]//tbody')
In [3]: table
Out[3]: [<Selector query='//*[@id="pricing"]//tbody' data='<tbody>\n<tr><td>Starter Plan</td><td>...'>]
XPath matching with @id keeps selectors stable across class changes.
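That stability can be sketched with the standard library's limited XPath support (an illustration using xml.etree, not Scrapy's selector engine); the markup below mirrors the example table with a hypothetical styling class added:

```python
# Sketch: id-based selection survives class changes. Uses xml.etree's
# limited XPath subset as a stand-in for Scrapy's full XPath engine.
import xml.etree.ElementTree as ET

page = ET.fromstring(
    '<html><body>'
    '<table id="pricing" class="table table-striped">'
    '<tbody><tr><td>Starter Plan</td><td>$29</td></tr></tbody>'
    '</table>'
    '</body></html>'
)

# Attribute predicates like [@id='...'] are supported by xml.etree.
table = page.find(".//table[@id='pricing']")
print(table.get('class'))  # the class list may change; the id lookup still matches
```

A selector keyed on the styling classes would break the moment the site's theme changes, while the id lookup keeps matching.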
In [4]: for row in response.xpath('//*[@id="pricing"]//tbody/tr'):
...: item = {
...: 'plan': row.xpath('td[1]//text()').get(),
...: 'price': row.xpath('td[2]//text()').get(),
...: }
...: print(item)
...:
{'plan': 'Starter Plan', 'price': '$29'}
{'plan': 'Team Plan', 'price': '$79'}
{'plan': 'Enterprise Plan', 'price': '$199'}
Both columns use <td> cells and XPath positions are 1-based, so td[1] selects the first cell, the plan name.
Tables using colspan/rowspan can shift cell positions and misalign index-based extraction.
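One way to tolerate a row-header <th> in the first column is to collect header and data cells together in document order, so positions do not shift. The sketch below uses the standard library's xml.etree as a stand-in (in Scrapy the analogous query would be an XPath union such as th|td); the mixed-cell sample table is invented for illustration.

```python
# Sketch: gather <th> and <td> cells together in document order so a
# row-header <th> does not shift index-based extraction. Standard-library
# stand-in for a Scrapy th|td union query.
import xml.etree.ElementTree as ET

HTML = """
<table>
  <tbody>
    <tr><th>Starter Plan</th><td>$29</td></tr>
    <tr><td>Team Plan</td><td>$79</td></tr>
  </tbody>
</table>
"""

rows = []
for tr in ET.fromstring(HTML).find('tbody').iter('tr'):
    # Treat row-header and data cells uniformly, in document order
    cells = [c.text for c in tr if c.tag in ('th', 'td')]
    rows.append({'plan': cells[0], 'price': cells[1]})

print(rows)  # both rows align even though the first uses a <th> cell
```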
import scrapy


class ScrapeTableSpider(scrapy.Spider):
    name = 'scrape-table'
    start_urls = ['http://app.internal.example:8000/table/']

    def parse(self, response):
        for row in response.xpath('//*[@id="pricing"]//tbody/tr'):
            yield {
                'plan': row.xpath('normalize-space(td[1])').get(),
                'price': row.xpath('normalize-space(td[2])').get(),
            }
Related: How to create a Scrapy spider
$ scrapy runspider --nolog --output -:json scrape_table.py
[
{"plan": "Starter Plan", "price": "$29"},
{"plan": "Team Plan", "price": "$79"},
{"plan": "Enterprise Plan", "price": "$199"}
]