Submitting an HTML form is often the only way to reach search results, filtered listings, member-only pages, and other responses that do not appear as normal crawlable links. A spider that can post the same fields as the browser can move past the form page and scrape the response that actually contains the target data.
HTML forms send named fields to the form's action URL with either GET or POST, and many include hidden inputs such as CSRF tokens, pagination state, or the clicked submit button's value. Scrapy's FormRequest.from_response() builds the next request from the live form markup, so those fields stay populated while only the search or filter values are overridden.
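For contrast, here is a minimal sketch of building the same kind of request by hand with plain FormRequest, assuming the hidden csrf_token value has already been scraped from the form page; from_response() carries that field over automatically, while the manual version must supply it explicitly:

from scrapy.http import FormRequest

def build_search_request(token, callback):
    # token: hidden csrf_token value scraped from the form page by hand.
    # FormRequest defaults to POST and URL-encodes formdata into the body.
    return FormRequest(
        url="http://app.internal.example:8000/search",  # the form's action URL
        formdata={
            "csrf_token": token,  # must be carried over manually here
            "q": "laptop",
            "category": "all",
        },
        callback=callback,
    )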
The form selector, field names, and result selectors must match the live page, and some workflows still fail when the site requires JavaScript-generated values, CAPTCHA, or other browser-only interactions. Test only forms that are safe to automate, and treat account changes, checkout flows, unsubscribe forms, or other state-changing endpoints as unsafe until the target behavior is confirmed.
Steps to submit a form in a Scrapy spider:
- Inspect the live form in scrapy shell so the spider uses the correct action URL and field names.
$ scrapy shell "http://app.internal.example:8000/search" --nolog
>>> response.css('form#search-form input[name]::attr(name), form#search-form select[name]::attr(name)').getall()
['csrf_token', 'q', 'category']
>>> response.css('form#search-form::attr(action)').get()
'/search'
>>> response.css('form#search-form input[name="csrf_token"]::attr(value)').get()
'csrf-demo-123'
- Replace simplifiedguide/spiders/search.py with a spider that submits the form built from the response.
import scrapy
from scrapy.http import FormRequest

class SearchSpider(scrapy.Spider):
    name = "search"
    start_urls = ["http://app.internal.example:8000/search"]

    def parse(self, response):
        yield FormRequest.from_response(
            response,
            formcss="form#search-form",
            formdata={
                "q": "laptop",
                "category": "all",
            },
            callback=self.parse_results,
        )

    def parse_results(self, response):
        for product in response.css(".product"):
            yield {
                "name": product.css(".name::text").get(default="").strip(),
                "price": product.css(".price::text").get(default="").strip(),
                "url": response.urljoin(product.css("a::attr(href)").get()),
            }
FormRequest.from_response() keeps hidden inputs from the selected form unless they are overridden. Use formcss, formid, formname, or formxpath to target the correct form, add clickdata when a specific submit button value must be sent, and set dont_click=True if the automatic click adds the wrong payload.
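As a sketch of those options, assuming a hypothetical page variant whose form has id="search-form" and two submit buttons named action (neither button exists in the demo form above):

    def parse(self, response):
        # Sketch: choose which submit button's name/value pair is sent,
        # assuming <button name="action" value="search"> and
        # <button name="action" value="reset"> exist in the form.
        yield FormRequest.from_response(
            response,
            formid="search-form",                             # target the form by id
            formdata={"q": "laptop"},
            clickdata={"name": "action", "value": "search"},  # send action=search
            callback=self.parse_results,
        )
        # Pass dont_click=True instead to submit without any button payload.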
Submitting non-idempotent forms can change remote state, so test only against a safe search, filter, or other read-oriented endpoint until the exact request behavior is known.
- Update the formdata keys and the selectors in parse_results() so they match the target site's real name attributes and result markup exactly.
Visible labels and placeholder text do not matter to Scrapy here; the request body is built from the form control name values in the HTML.
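The constructed body can be checked in scrapy shell before running the spider; the exact bytes below are illustrative and depend on the live form's field order and token value:

>>> from scrapy.http import FormRequest
>>> req = FormRequest.from_response(response, formcss="form#search-form", formdata={"q": "laptop"})
>>> req.body
b'csrf_token=csrf-demo-123&q=laptop&category=all'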
- Run the spider and export the submitted-response items to JSON.
$ scrapy crawl search -O results.json
2026-04-16 06:04:35 [scrapy.extensions.feedexport] INFO: Stored json feed (2 items) in: results.json
- Open the exported file and confirm the items came from the form response.
$ cat results.json
[
    {"name": "Laptop Starter", "price": "$499", "url": "http://app.internal.example:8000/products/starter-plan.html"},
    {"name": "Laptop Team", "price": "$899", "url": "http://app.internal.example:8000/products/team-plan.html"}
]
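To reuse the spider for other searches without editing the file, the query can be passed as a spider argument; a minimal sketch, assuming a hypothetical -a query=... argument that falls back to "laptop":

import scrapy
from scrapy.http import FormRequest

class SearchSpider(scrapy.Spider):
    name = "search"
    start_urls = ["http://app.internal.example:8000/search"]

    def parse(self, response):
        # scrapy crawl search -a query=tablet sets self.query on the spider.
        yield FormRequest.from_response(
            response,
            formcss="form#search-form",
            formdata={"q": getattr(self, "query", "laptop")},
            callback=self.parse_results,
        )

    def parse_results(self, response):
        ...  # identical to the version above

Run it as, for example: $ scrapy crawl search -a query=tablet -O results.json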
