Creating a Scrapy project gives a crawler a predictable home for spiders, settings, item definitions, and middleware, which matters once a scrape outgrows a single experimental file. A proper project layout also keeps crawling logic, output rules, and site-specific settings together as the crawl grows.

The scrapy startproject command creates a top-level working directory with scrapy.cfg plus a Python package that contains settings.py, items.py, pipelines.py, middlewares.py, and the spiders module. After that scaffold exists, project-aware commands such as scrapy settings, scrapy genspider, and scrapy crawl use the new directory as their configuration root.

The project name becomes both the directory name and the Python package name, so it must be importable: letters, numbers, and underscores only, beginning with a letter, with no spaces or hyphens. Scrapy must already be installed before starting, and the generated settings.py enables ROBOTSTXT_OBEY, which means new projects respect target-site robots.txt rules until that setting is changed.
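The naming rule can be checked up front. This is a hypothetical helper, not part of Scrapy; it sketches the "importable package name" requirement using Python's own identifier rules:

```python
import keyword

def is_valid_project_name(name: str) -> bool:
    # Hypothetical helper: a Scrapy project name becomes a Python
    # package name, so it must be a valid identifier and not a
    # reserved keyword.
    return name.isidentifier() and not keyword.iskeyword(name)

print(is_valid_project_name("catalogbot"))     # True
print(is_valid_project_name("price-monitor"))  # False: hyphen
print(is_valid_project_name("2fast"))          # False: leading digit
```

Note that scrapy startproject applies its own validation on top of this, so a name passing this check is necessary but not a guarantee.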

Steps to create a Scrapy project:

  1. Change to the directory that will hold the new Scrapy project.
    $ cd /home/user/sg-work
  2. Create the project scaffold with the project name that will become the working directory and Python package.
    $ scrapy startproject catalogbot
    New Scrapy project 'catalogbot', created in:
        /home/user/sg-work/catalogbot
    
    You can start your first spider with:
        cd catalogbot
        scrapy genspider example example.com

    Project names become importable package names, so spaces and hyphens create invalid or awkward module names.

  3. Change into the new project directory before running project-aware Scrapy commands.
    $ cd catalogbot

    Commands such as scrapy settings, scrapy genspider, and scrapy crawl expect to run from the project root where scrapy.cfg exists.
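When a command fails because it was run from the wrong directory, the fix is to locate the directory containing scrapy.cfg. A minimal sketch of that search, walking upward through parent directories (an illustrative helper using only the standard library, not Scrapy's own API):

```python
from pathlib import Path
from typing import Optional

def find_project_root(start: str = ".") -> Optional[Path]:
    """Walk upward from `start` until a directory containing
    scrapy.cfg is found; return None if none exists."""
    path = Path(start).resolve()
    for candidate in [path, *path.parents]:
        if (candidate / "scrapy.cfg").is_file():
            return candidate
    return None
```

Running this from anywhere inside the project tree returns the project root; outside it, None signals that project-aware commands will not work.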

  4. List the generated files to confirm that the project scaffold includes the main package, settings, and spider module.
    $ find . -maxdepth 2 -print
    .
    ./scrapy.cfg
    ./catalogbot
    ./catalogbot/spiders
    ./catalogbot/__init__.py
    ./catalogbot/middlewares.py
    ./catalogbot/settings.py
    ./catalogbot/items.py
    ./catalogbot/pipelines.py

    scrapy.cfg points the CLI at the project settings package, while the inner catalogbot directory holds the code that will be imported during crawls.
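scrapy.cfg itself is a small INI file. A sketch of its typical contents for this project (the deploy section and any generated comments vary by Scrapy version):

```ini
[settings]
default = catalogbot.settings

[deploy]
project = catalogbot
```

The [settings] section is what lets the CLI resolve catalogbot.settings when commands run from this directory.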

  5. Read the configured bot name to confirm that Scrapy is loading the new project's settings.
    $ scrapy settings --get BOT_NAME
    catalogbot

    If this command fails outside the project root, change back to the directory that contains scrapy.cfg.

  6. Read the spider module path to confirm where new spider files will be created.
    $ scrapy settings --get NEWSPIDER_MODULE
    catalogbot.spiders

    scrapy genspider uses this module path when it writes a new spider skeleton. Related: How to create a Scrapy spider
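The mapping from that dotted module path to a file location is straightforward. This illustrative helper (hypothetical, not Scrapy code) shows where a new spider file would land relative to the project root:

```python
from pathlib import Path

def spider_file_path(newspider_module: str, spider_name: str) -> Path:
    # Illustrative: map a dotted module path such as
    # "catalogbot.spiders" to the file a new spider named
    # `spider_name` would occupy, relative to the project root.
    return Path(*newspider_module.split(".")) / f"{spider_name}.py"

print(spider_file_path("catalogbot.spiders", "example").as_posix())
# catalogbot/spiders/example.py
```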

  7. Check the generated robots policy before adding crawl targets or request logic.
    $ scrapy settings --get ROBOTSTXT_OBEY
    True

    New projects enable ROBOTSTXT_OBEY in the generated settings.py file even though Scrapy's historical fallback default is False.
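What obeying robots.txt means in practice can be illustrated with the standard library's urllib.robotparser. This is a stdlib approximation of the allow/deny check, not Scrapy's middleware itself (Scrapy uses the Protego parser by default):

```python
from urllib.robotparser import RobotFileParser

# Parse a tiny robots.txt policy that blocks one path prefix.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("catalogbot", "https://example.com/catalog"))    # True
print(rp.can_fetch("catalogbot", "https://example.com/private/x"))  # False
```

With ROBOTSTXT_OBEY enabled, requests equivalent to the second call are dropped before they reach the site.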

Notes

  • Use a project name that can be imported cleanly in Python, such as catalogbot or price_monitor, because the same name appears in BOT_NAME, NEWSPIDER_MODULE, and SPIDER_MODULES.
  • Keep one project directory per crawler or site family when settings, pipelines, middleware, or item models differ, and use separate spiders inside that project only when they can reasonably share the same configuration base.