Scrapy shell is the fastest way to test a request and extraction logic before that logic is committed to a spider. It exposes empty selectors, unexpected redirects, missing fields, and URL cleanup problems on one response instead of after a full crawl.

The scrapy shell command downloads one URL and opens an interactive session with the resulting Response object already bound to the name response. That makes it practical to try response.css(), response.xpath(), and response.urljoin() against the same response object a spider callback receives, while shortcut helpers such as fetch() load a different page without restarting the shell.

The shell still uses the current project settings when it is started inside a Scrapy project, so custom headers, cookies, middleware, throttling, and other overrides can change what the response looks like. It also works against the downloaded response body rather than a browser-rendered DOM, so JavaScript-heavy pages can appear empty even though the request succeeded. Two smaller caveats: redirect debugging sometimes needs --no-redirect or fetch(url, redirect=False), and local files should be opened as ./page.html or ../page.html, because a bare filename is treated as a domain name rather than a file path.
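
For reference, the two invocation forms from those last caveats look like this; example.com/old-path and page.html are placeholders rather than pages used later in this walkthrough.

    $ scrapy shell 'https://example.com/old-path' --no-redirect
    $ scrapy shell ./page.html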

Steps to use Scrapy shell:

  1. Start scrapy shell against a predictable page and suppress crawler log noise while testing selectors.
    $ scrapy shell 'https://docs.scrapy.org/en/latest/_static/selectors-sample1.html' --nolog
    [s] Available Scrapy objects:
    [s]   request    <GET https://docs.scrapy.org/en/latest/_static/selectors-sample1.html>
    [s]   response   <200 https://docs.scrapy.org/en/latest/_static/selectors-sample1.html>
    [s] Useful shortcuts:
    [s]   fetch(url[, redirect=True]) Fetch URL and update local objects (by default, redirects are followed)
    ##### snipped #####
    >>>

    Start the shell from the Scrapy project directory when the request should reuse that project's settings, middleware, headers, and cookies.
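
    Inside the shell, the settings object (listed in the full banner alongside request and response) shows which values were actually applied, and a single value can be overridden at launch with -s NAME=VALUE. The keys below are only examples, and their output depends on the project, so it is not shown.

    >>> settings.get("USER_AGENT")
    >>> settings.getbool("ROBOTSTXT_OBEY")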

  2. Confirm the downloaded response before building selectors around it.
    >>> response.css("title::text").get()
    'Example website'

    The interactive prompt can appear as >>> or In [1]: depending on which Python shell backend is available; Scrapy prefers IPython when it is installed, then bpython, then the standard Python shell.
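
    Two more quick checks help confirm that the body is the one the selectors will run against; view() is another shell shortcut that opens the downloaded body in the default browser, which makes missing JavaScript-rendered content easy to spot.

    >>> response.status
    200
    >>> view(response)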

  3. Extract repeated attribute values with a CSS selector.
    >>> response.css("a::attr(href)").getall()
    ['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']

    Scrapy adds the non-standard ::text and ::attr(name) pseudo-elements, so these selectors work in Scrapy but not in normal browser CSS.
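
    Those href values are relative to the page, so response.urljoin() is the usual follow-up check before a spider yields them.

    >>> response.urljoin(response.css("a::attr(href)").get())
    'https://docs.scrapy.org/en/latest/_static/image1.html'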

  4. Cross-check the same response with XPath when the selector depends on structure rather than CSS classes.
    >>> response.xpath('//a[contains(@href, "image")]/img/@src').getall()
    ['image1_thumb.jpg', 'image2_thumb.jpg', 'image3_thumb.jpg', 'image4_thumb.jpg', 'image5_thumb.jpg']
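
    The same elements can be selected with the CSS form the spider in step 7 uses, which is a quick way to confirm that both selector styles agree.

    >>> response.css("a[href*=image] img::attr(src)").getall()
    ['image1_thumb.jpg', 'image2_thumb.jpg', 'image3_thumb.jpg', 'image4_thumb.jpg', 'image5_thumb.jpg']
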
  5. Fetch another page without leaving the shell when the next selector depends on a follow-up response.
    >>> fetch("https://docs.scrapy.org/en/latest/topics/selectors.html")

    Each fetch() sends a real request, so repeated trials against live sites can trigger rate limits, bans, or unwanted state changes.
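
    fetch() also accepts a prepared Request when the follow-up page needs different headers or a non-GET method; the Accept-Language header below is only an illustration.

    >>> from scrapy import Request
    >>> fetch(Request("https://docs.scrapy.org/en/latest/topics/selectors.html", headers={"Accept-Language": "en"}))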

  6. Check the new response URL after the fetch completes.
    >>> response.url
    'https://docs.scrapy.org/en/latest/topics/selectors.html'
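
    When the final URL differs from the one passed to fetch(), the redirect middleware records the intermediate URLs under the redirect_urls request meta key, which shows where the chain started.

    >>> response.status
    200
    >>> response.request.meta.get("redirect_urls")   # None here, because this fetch was not redirected
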
  7. Move the verified selector into a spider callback once the interactive result is correct.
    import scrapy


    class ShellSelectorsSpider(scrapy.Spider):
        name = "shell-selectors"
        start_urls = [
            "https://docs.scrapy.org/en/latest/_static/selectors-sample1.html",
        ]

        def parse(self, response):
            # The same CSS selectors that were verified interactively in the shell.
            for link in response.css("a[href*=image]"):
                href = link.css("::attr(href)").get()
                thumb = link.css("img::attr(src)").get()
                # urljoin() resolves the page-relative paths against response.url.
                yield {
                    "href": response.urljoin(href) if href else None,
                    "thumb": response.urljoin(thumb) if thumb else None,
                }
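
    Assuming this file sits in a Scrapy project's spiders/ directory, the spider can be run and its output compared with the interactive results; items.json is only an example output name.

    $ scrapy crawl shell-selectors -o items.json --nolog
    $ scrapy parse --spider=shell-selectors -c parse 'https://docs.scrapy.org/en/latest/_static/selectors-sample1.html'

    scrapy parse runs a single callback against a single URL and prints the scraped items, which keeps the shell-to-spider feedback loop short.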