Submitting an HTML form is often required to reach search results, filtered listings, and content that never appears as a normal link. Many sites expose the real data only after a form submission, even when the results page looks static in a browser.

Web forms send key/value pairs to the form action URL using GET or POST, frequently including hidden inputs and session cookies. Scrapy provides FormRequest to generate a form submission request, and FormRequest.from_response() can build that request directly from the form page response, preserving the form's action URL, method, and default field values (hidden inputs included); session cookies are carried between requests by Scrapy's cookie middleware.
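
The practical difference: a plain FormRequest posts exactly the fields you assemble by hand, while from_response() starts from the fields already present in the markup and overrides only what formdata names. A minimal sketch, reusing the example endpoint and token that appear in the walkthrough below:

    from scrapy.http import FormRequest

    # Plain FormRequest: the absolute URL and every field, hidden token
    # included, must be supplied by hand.
    manual = FormRequest(
        url="http://app.internal.example:8000/search",
        formdata={"csrf_token": "8f3c1b2a0e5d4c9a", "q": "laptop"},
    )

    # from_response(), called in a callback that received the form page:
    # action URL, method, and default field values come from the HTML,
    # and formdata overrides only the fields it names.
    def parse(self, response):  # spider callback
        yield FormRequest.from_response(response, formdata={"q": "laptop"})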

Some forms require CSRF tokens, one-time hidden fields, or specific submit button values, and some endpoints are state-changing (profile updates, checkout, unsubscribe). Confirm the submission is safe to automate, respect site limits, and validate selectors and payloads in scrapy shell before running a large crawl.
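
One way to do that validation is to build the submission in scrapy shell and fetch it a single time, checking the response before any crawl runs; the formid and fields used here are the ones identified in the steps below:

    $ scrapy shell 'http://app.internal.example:8000/search'
    >>> from scrapy.http import FormRequest
    >>> req = FormRequest.from_response(response, formid='search-form',
    ...                                 formdata={'q': 'laptop', 'category': 'all'})
    >>> fetch(req)
    >>> response.status
    200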

Steps to submit a form in a Scrapy spider:

  1. Start scrapy shell on the page that contains the form.
    $ scrapy shell 'http://app.internal.example:8000/search'
    >>> response.status
    200
  2. Identify the form action URL used for submission.
    >>> response.css('form#search-form::attr(action)').get()
    '/search'
    >>> response.urljoin(response.css('form#search-form::attr(action)').get())
    'http://app.internal.example:8000/search'
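    The form's method attribute decides whether the fields travel in the URL query string (GET) or the request body (POST); from_response() reads it from the markup, and a missing attribute means GET.
    >>> response.css('form#search-form::attr(method)').get()
    'post'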
  3. List the input field names expected by the form.
    >>> response.css('form#search-form input[name]::attr(name)').getall()
    ['csrf_token', 'q', 'category']
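    The input[name] selector misses select and textarea controls, so check for those separately before treating the field list as complete.
    >>> response.css('form#search-form select[name]::attr(name)').getall()
    []
    >>> response.css('form#search-form textarea[name]::attr(name)').getall()
    []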
  4. Extract the values of required hidden fields such as CSRF tokens.
    >>> response.css('form#search-form input[type="hidden"][name]::attr(name)').getall()
    ['csrf_token']
    >>> response.css('form#search-form input[name="csrf_token"]::attr(value)').get()
    '8f3c1b2a0e5d4c9a'
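    from_response() carries hidden fields such as csrf_token into the generated request on its own, which the shell can confirm; the body shown assumes the POST form above, while a GET form would encode the same pairs into req.url instead.
    >>> from scrapy.http import FormRequest
    >>> req = FormRequest.from_response(response, formid='search-form')
    >>> req.body
    b'csrf_token=8f3c1b2a0e5d4c9a&q=&category='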
  5. Submit the form with FormRequest.from_response() in the spider callback.
    import scrapy
    from scrapy.http import FormRequest


    class SearchSpider(scrapy.Spider):
        name = "search"
        start_urls = ["http://app.internal.example:8000/search"]

        def parse(self, response):
            # from_response() copies hidden fields over automatically;
            # extracting the token explicitly as well makes the dependency
            # visible and lets it be checked or logged.
            csrf_token = response.css(
                'form#search-form input[name="csrf_token"]::attr(value)'
            ).get()

            formdata = {
                "q": "laptop",
                "category": "all",
            }
            if csrf_token:
                formdata["csrf_token"] = csrf_token

            # formdata overrides the form's default values; any field not
            # named here keeps the value found in the HTML.
            yield FormRequest.from_response(
                response,
                formid="search-form",
                formdata=formdata,
                callback=self.parse_results,
            )

        def parse_results(self, response):
            # The form submission's response is parsed like any other page.
            for product in response.css(".product"):
                yield {
                    "name": product.css(".name::text").get(),
                    "price": product.css(".price::text").get(),
                    "url": response.urljoin(product.css("a::attr(href)").get()),
                }

    formid (or formname / formxpath) prevents submitting the wrong form when multiple forms exist on the same page.
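
    When a form has several submit buttons and the server branches on which one was pressed, clickdata selects the button to simulate, and dont_click=True builds the request without simulating any click. A sketch of the call in the spider above, assuming a hypothetical submit button named "action":

    yield FormRequest.from_response(
        response,
        formid="search-form",
        formdata={"q": "laptop"},
        clickdata={"name": "action"},  # hypothetical submit button
        callback=self.parse_results,
    )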

    Submitting non-idempotent forms can change server state (account updates, purchases, emails) and should be tested with a safe endpoint or test account.
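
    While iterating on selectors, Scrapy's HTTP cache can replay previously fetched responses instead of re-hitting the server, so repeated test runs do not re-submit the form:

    $ scrapy crawl search -s HTTPCACHE_ENABLED=True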

  6. Run the spider with feed export enabled.
    $ scrapy crawl search -O results.json
    2026-01-01 09:43:20 [scrapy.extensions.feedexport] INFO: Stored json feed (2 items) in: results.json
  7. Confirm the export file contains items returned by the form response.
    $ head -n 6 results.json
    [
    {"name": "Laptop Laptop", "price": "$499", "url": "http://app.internal.example:8000/products/starter-plan.html"},
    {"name": "Laptop Ultrabook", "price": "$899", "url": "http://app.internal.example:8000/products/team-plan.html"}
    ]