Submitting an HTML form is often required to reach search results, filtered listings, and content that never appears as a normal link. Many sites expose the real data only after a form submission, even when the results page looks static in a browser.
Web forms send key/value pairs to the form action URL using GET or POST, frequently including hidden inputs and session cookies. Scrapy provides FormRequest to generate a form submission request, and FormRequest.from_response() can build that request directly from the form page response so the correct target, default fields, and cookies are preserved.
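Under the hood, the body of such a submission is an ordinary `application/x-www-form-urlencoded` string of key/value pairs. A minimal stdlib sketch of what that encoding looks like (the field names and token value here are illustrative, not taken from any real form):

```python
from urllib.parse import urlencode

# Form fields as a submission would collect them: visible inputs plus
# hidden ones such as a CSRF token (names and values are illustrative).
fields = {
    "csrf_token": "8f3c1b2a0e5d4c9a",
    "q": "laptop",
    "category": "all",
}

# A POST submission sends this string as the request body with
# Content-Type: application/x-www-form-urlencoded; a GET submission
# appends it to the action URL after '?'.
body = urlencode(fields)
print(body)
# csrf_token=8f3c1b2a0e5d4c9a&q=laptop&category=all
```

FormRequest performs this encoding for you; the sketch only makes visible what travels over the wire.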
Some forms require CSRF tokens, one-time hidden fields, or specific submit button values, and some endpoints are state-changing (profile updates, checkout, unsubscribe). Confirm the submission is safe to automate, respect site limits, and validate selectors and payloads in scrapy shell before running a large crawl.
$ scrapy shell 'http://app.internal.example:8000/search'
>>> response.status
200
>>> response.css('form#search-form::attr(action)').get()
'/search'
>>> response.urljoin(response.css('form#search-form::attr(action)').get())
'http://app.internal.example:8000/search'
>>> response.css('form#search-form input[name]::attr(name)').getall()
['csrf_token', 'q', 'category']
>>> response.css('form#search-form input[type="hidden"][name]::attr(name)').getall()
['csrf_token']
>>> response.css('form#search-form input[name="csrf_token"]::attr(value)').get()
'8f3c1b2a0e5d4c9a'
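The extraction the shell session performs by hand can be sketched with the stdlib parser, to make explicit what FormRequest.from_response() collects from the page: the form's action plus every pre-filled field, hidden ones included. The HTML below is a hypothetical reduction of the form page, not its actual markup:

```python
from html.parser import HTMLParser

# Hypothetical reduction of the search page's form markup.
PAGE = """
<form id="search-form" action="/search" method="get">
  <input type="hidden" name="csrf_token" value="8f3c1b2a0e5d4c9a">
  <input type="text" name="q">
  <select name="category"><option value="all">All</option></select>
</form>
"""

class FormScanner(HTMLParser):
    """Collect the form action and any hidden input values."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self.action = a.get("action")
        elif tag == "input" and a.get("type") == "hidden":
            self.hidden[a["name"]] = a.get("value", "")

scanner = FormScanner()
scanner.feed(PAGE)
print(scanner.action)  # /search
print(scanner.hidden)  # {'csrf_token': '8f3c1b2a0e5d4c9a'}
```

In a real spider there is no need to write this: from_response() reads the same information directly from the response.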
import scrapy
from scrapy.http import FormRequest


class SearchSpider(scrapy.Spider):
    name = "search"
    start_urls = ["http://app.internal.example:8000/search"]

    def parse(self, response):
        # from_response() already copies hidden inputs such as the CSRF
        # token into the request; extracting it explicitly is a fallback
        # in case the field needs to be verified or overridden.
        csrf_token = response.css(
            'form#search-form input[name="csrf_token"]::attr(value)'
        ).get()
        formdata = {
            "q": "laptop",
            "category": "all",
        }
        if csrf_token:
            formdata["csrf_token"] = csrf_token
        yield FormRequest.from_response(
            response,
            formid="search-form",
            formdata=formdata,
            callback=self.parse_results,
        )

    def parse_results(self, response):
        for product in response.css(".product"):
            yield {
                "name": product.css(".name::text").get(),
                "price": product.css(".price::text").get(),
                "url": response.urljoin(product.css("a::attr(href)").get()),
            }
formid (or formname / formxpath) prevents submitting the wrong form when multiple forms exist on the same page.
Submitting non-idempotent forms can change server state (account updates, purchases, emails) and should be tested with a safe endpoint or test account.
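Respecting site limits is easiest to enforce in settings.py. A conservative sketch using standard Scrapy settings (the specific values are illustrative and should be tuned per site):

```python
# settings.py -- a conservative throttling sketch; tune values per site.
DOWNLOAD_DELAY = 1.0                   # seconds between requests to a domain
CONCURRENT_REQUESTS_PER_DOMAIN = 2
AUTOTHROTTLE_ENABLED = True            # adapt the delay to observed latency
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
ROBOTSTXT_OBEY = True
```

With AutoThrottle enabled, DOWNLOAD_DELAY acts as the floor; the effective delay grows when the server slows down.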
$ scrapy crawl search -O results.json
2026-01-01 09:43:20 [scrapy.extensions.feedexport] INFO: Stored json feed (2 items) in: results.json
$ head -n 6 results.json
[
{"name": "Laptop Laptop", "price": "$499", "url": "http://app.internal.example:8000/products/starter-plan.html"},
{"name": "Laptop Ultrabook", "price": "$899", "url": "http://app.internal.example:8000/products/team-plan.html"}
]