How to submit a form in a Scrapy spider

Submitting an HTML form is often the only way to reach search results, filtered listings, and other pages that a site returns only after it receives specific fields. A spider that sends the same fields a browser would can move from the landing page to the response that actually contains the target records.

Scrapy provides FormRequest.from_response() for this workflow. It reads the <form> element from the current response, reuses the form's action, method, and existing field values (including hidden inputs), and lets the spider override only the fields that need new values, such as a search term or a selected category.

The form selector, field names, and result selectors must match the live page exactly, and some sites add JavaScript-generated values, CAPTCHA checks, or multi-step flows that a plain form request cannot satisfy. When a page has more than one submit button, the clicked control can change the payload, so use clickdata to choose the right button or dont_click=True to submit without the automatic click when needed.

Steps to submit a form in a Scrapy spider:

  1. Inspect the live form in scrapy shell so the spider uses the correct selector, field names, and action URL.
    $ scrapy shell "https://app.internal.example/search" --nolog
    >>> response.css('form#search-form::attr(action)').get()
    '/search'
    >>> response.css('form#search-form::attr(method)').get()
    'post'
    >>> response.css('form#search-form [name]::attr(name)').getall()
    ['csrf_token', 'q', 'category', 'search']
  2. Replace the contents of the spider module that should submit the form, such as productsearch/spiders/catalog.py, with a spider that builds the request from the downloaded form.
    import scrapy


    class CatalogSpider(scrapy.Spider):
        name = "catalog"
        start_urls = ["https://app.internal.example/search"]

        def parse(self, response):
            # Build the request from the downloaded form so its action,
            # method, and existing fields (such as csrf_token) carry over.
            yield scrapy.FormRequest.from_response(
                response,
                formcss="form#search-form",
                # Override only the fields that need new values.
                formdata={
                    "q": "laptop",
                    "category": "all",
                },
                callback=self.parse_results,
            )

        def parse_results(self, response):
            # Emit one item per product card in the results page.
            for product in response.css("article.product"):
                yield {
                    "name": product.css("h2 a::text").get(default="").strip(),
                    "price": product.css(".price::text").get(default="").strip(),
                    # Resolve relative product links against the response URL.
                    "url": response.urljoin(
                        product.css("h2 a::attr(href)").get(default="")
                    ),
                }

    FormRequest.from_response() keeps the existing field values from the selected form, including hidden inputs, and overrides only the fields listed in formdata. Use formid, formname, formxpath, or formcss to target the correct form, set clickdata when a specific submit button value must be sent, and pass None as a field value to exclude a pre-filled form field from the request.

    Submitting login, checkout, unsubscribe, or other state-changing forms can alter remote data, so start with search or filter forms on a test or read-only target.

  3. Update the formcss selector, formdata keys, and parse_results() selectors so they match the target page's actual HTML.

    The request body is built from the form control name attributes, not from visible labels, placeholder text, or nearby headings.

  4. Run the spider and overwrite the JSON export file with items from the submitted response.
    $ scrapy crawl catalog -O results.json
    2026-04-22 08:14:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://app.internal.example/search>
    {'name': 'Laptop Starter', 'price': '$499', 'url': 'https://app.internal.example/products/starter'}
    2026-04-22 08:14:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://app.internal.example/search>
    {'name': 'Laptop Team', 'price': '$899', 'url': 'https://app.internal.example/products/team'}
    2026-04-22 08:14:21 [scrapy.extensions.feedexport] INFO: Stored json feed (2 items) in: results.json
  5. Open the export file and confirm the items came from the form response.
    $ cat results.json
    [
    {"name": "Laptop Starter", "price": "$499", "url": "https://app.internal.example/products/starter"},
    {"name": "Laptop Team", "price": "$899", "url": "https://app.internal.example/products/team"}
    ]