Submitting an HTML form is often required to reach search results, filtered listings, and content that never appears as a normal link. Many sites expose the real data only after a form submission, even when the results page looks static in a browser.
Web forms send key/value pairs to the form action URL using GET or POST, frequently including hidden inputs and session cookies. Scrapy provides FormRequest to generate a form submission request, and FormRequest.from_response() can build that request directly from the response that contains the form, so the action URL, HTTP method, and default field values (including hidden inputs) are picked up automatically while Scrapy's cookie handling carries the session.
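As a minimal sketch of the two approaches (the URL and field names are illustrative, not tied to a real site):
import scrapy
from scrapy.http import FormRequest


# Sketch only: the URL and field names below are illustrative.
class SketchSpider(scrapy.Spider):
    name = "sketch"
    start_urls = ["http://app.internal.example:8000/search"]

    def parse(self, response):
        # Build the request by hand when the action URL and fields are known:
        yield FormRequest(
            url="http://app.internal.example:8000/search",
            formdata={"q": "laptop"},
            callback=self.parse_results,
        )
        # Or derive it from the page containing the form, so the action URL,
        # HTTP method, and default/hidden field values come from the HTML:
        yield FormRequest.from_response(
            response,
            formdata={"q": "laptop"},
            callback=self.parse_results,
        )

    def parse_results(self, response):
        self.logger.info("Received %d bytes", len(response.body))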
Some forms require CSRF tokens, one-time hidden fields, or specific submit button values, and some endpoints are state-changing (profile updates, checkout, unsubscribe). Confirm the submission is safe to automate, respect site limits, and validate selectors and payloads in scrapy shell before running a large crawl.
Steps to submit a form in a Scrapy spider:
- Start scrapy shell on the page that contains the form.
$ scrapy shell 'http://app.internal.example:8000/search'
>>> response.status
200
- Identify the form action URL used for submission.
>>> response.css('form#search-form::attr(action)').get()
'/search'
>>> response.urljoin(response.css('form#search-form::attr(action)').get())
'http://app.internal.example:8000/search'
- List the input field names expected by the form.
>>> response.css('form#search-form input[name]::attr(name)').getall()
['csrf_token', 'q', 'category']
- Extract the values of required hidden fields such as CSRF tokens.
>>> response.css('form#search-form input[type="hidden"][name]::attr(name)').getall()
['csrf_token']
>>> response.css('form#search-form input[name="csrf_token"]::attr(value)').get()
'8f3c1b2a0e5d4c9a'
- Submit the form with FormRequest.from_response() from the spider callback.
import scrapy
from scrapy.http import FormRequest


class SearchSpider(scrapy.Spider):
    name = "search"
    start_urls = ["http://app.internal.example:8000/search"]

    def parse(self, response):
        # Read the one-time CSRF token from the form's hidden input.
        csrf_token = response.css(
            'form#search-form input[name="csrf_token"]::attr(value)'
        ).get()
        formdata = {
            "q": "laptop",
            "category": "all",
        }
        if csrf_token:
            formdata["csrf_token"] = csrf_token
        # from_response() fills in the action URL, method, and default fields
        # from the page, then overrides them with formdata.
        yield FormRequest.from_response(
            response,
            formid="search-form",
            formdata=formdata,
            callback=self.parse_results,
        )

    def parse_results(self, response):
        # Scrape items from the page returned by the form submission.
        for product in response.css(".product"):
            yield {
                "name": product.css(".name::text").get(),
                "price": product.css(".price::text").get(),
                "url": response.urljoin(product.css("a::attr(href)").get()),
            }
formid (or formname / formxpath) prevents submitting the wrong form when multiple forms exist on the same page.
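When the form cannot be matched by id, or the page offers more than one submit button, from_response() also accepts formname, formxpath, and clickdata. A sketch, reusing the formdata built in parse() above; the form name, XPath expression, and button name are assumptions for illustration:
# Sketch: alternative form selectors, used in place of the yield above.
# The form name, XPath expression, and button name are illustrative.
yield FormRequest.from_response(
    response,
    formname="search",  # match <form name="search">
    formdata=formdata,
    callback=self.parse_results,
)
yield FormRequest.from_response(
    response,
    formxpath='//form[@action="/search"]',  # match the form by XPath
    formdata=formdata,
    clickdata={"name": "do_search"},  # submit via this named button
    callback=self.parse_results,
)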
Submitting non-idempotent forms can change server state (account updates, purchases, emails) and should be tested with a safe endpoint or test account.
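A low-risk way to check exactly what would be sent is to point the same formdata at an echo endpoint first; a minimal sketch using httpbin.org/post, which replies with the form fields it received:
import json

import scrapy
from scrapy.http import FormRequest


# Sketch: dry-run the payload against an echo endpoint instead of the live form.
class DryRunSpider(scrapy.Spider):
    name = "dryrun"

    def start_requests(self):
        # httpbin.org/post echoes back the form fields it receives.
        yield FormRequest(
            url="https://httpbin.org/post",
            formdata={"q": "laptop", "category": "all"},
            callback=self.check_payload,
        )

    def check_payload(self, response):
        echoed = json.loads(response.text)["form"]
        self.logger.info("Server would receive: %s", echoed)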
- Run the spider with feed export enabled.
$ scrapy crawl search -O results.json
2026-01-01 09:43:20 [scrapy.extensions.feedexport] INFO: Stored json feed (2 items) in: results.json
- Confirm the export file contains items returned by the form response.
$ head -n 6 results.json
[
{"name": "Laptop Laptop", "price": "$499", "url": "http://app.internal.example:8000/products/starter-plan.html"},
{"name": "Laptop Ultrabook", "price": "$899", "url": "http://app.internal.example:8000/products/team-plan.html"}
]
