Scrapy is a Python-based web scraping and crawling framework available from the Python Package Index (PyPI). This means you can install Scrapy with pip on any operating system that has Python and pip installed.

Some operating systems also provide their own Scrapy package, built for that specific operating system and version, which can be installed without pip.

These packages are usually not as up to date as the version distributed through pip, though they are typically better tested and integrated with that operating system release.
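If both an operating system package and a pip-installed copy of Scrapy are present, it can be unclear which one Python actually uses. The following minimal sketch uses only the standard library (Python 3.8+) to report the version and file location of the copy that would be imported. `install_info` is a hypothetical helper name and is not part of Scrapy. The version lookup assumes the installed distribution metadata is found in the same `sys.path` order as the module itself, which is normally the case.

```python
import importlib.metadata
import importlib.util


def install_info(dist_name="scrapy", module_name="scrapy"):
    """Hypothetical helper: return (version, file) for the copy Python would import.

    Returns None if the module cannot be found at all. The version is None
    when the module exists but no distribution metadata is installed for it.
    """
    # find_spec() locates the module without importing it
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    try:
        # Reads the version from the installed distribution's metadata
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return version, spec.origin


if __name__ == "__main__":
    info = install_info()
    if info is None:
        print("Scrapy is not installed for this interpreter")
    else:
        version, origin = info
        print(f"Scrapy {version} imported from {origin}")
```

On Debian and Ubuntu, for example, the reported path can distinguish a system package under `/usr/lib/python3/dist-packages` from a per-user pip install under `~/.local/lib/python3.X/site-packages`.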

Steps to install Scrapy using pip:

  1. Install pip for your operating system if you don't already have it installed.
  2. Install the scrapy package using pip.
    $ pip3 install scrapy
    Collecting scrapy
      Downloading Scrapy-2.1.0-py2.py3-none-any.whl (239 kB)
         |████████████████████████████████| 239 kB 585 kB/s
    Requirement already satisfied: queuelib>=1.4.2 in /usr/lib/python3/dist-packages (from scrapy) (1.5.0)
    Requirement already satisfied: cssselect>=0.9.1 in /usr/lib/python3/dist-packages (from scrapy) (1.1.0)
    Requirement already satisfied: cryptography>=2.0 in /usr/lib/python3/dist-packages (from scrapy) (2.8)
    Requirement already satisfied: parsel>=1.5.0 in /usr/lib/python3/dist-packages (from scrapy) (1.5.2)
    Requirement already satisfied: zope.interface>=4.1.3 in /usr/lib/python3/dist-packages (from scrapy) (4.7.1)
    Requirement already satisfied: lxml>=3.5.0 in /usr/lib/python3/dist-packages (from scrapy) (4.5.0)
    Requirement already satisfied: Twisted>=17.9.0 in /usr/lib/python3/dist-packages (from scrapy) (18.9.0)
    Collecting protego>=0.1.15
      Downloading Protego-0.1.16.tar.gz (3.2 MB)
         |████████████████████████████████| 3.2 MB 1.7 MB/s
    Requirement already satisfied: w3lib>=1.17.0 in /usr/lib/python3/dist-packages (from scrapy) (1.21.0)
    Requirement already satisfied: pyOpenSSL>=16.2.0 in /usr/lib/python3/dist-packages (from scrapy) (19.0.0)
    Requirement already satisfied: PyDispatcher>=2.0.5 in /usr/lib/python3/dist-packages (from scrapy) (2.0.5)
    Requirement already satisfied: service-identity>=16.0.0 in /usr/lib/python3/dist-packages (from scrapy) (18.1.0)
    Requirement already satisfied: six in /usr/lib/python3/dist-packages (from protego>=0.1.15->scrapy) (1.14.0)
    Building wheels for collected packages: protego
      Building wheel for protego (setup.py) ... done
      Created wheel for protego: filename=Protego-0.1.16-py3-none-any.whl size=7765 sha256=626b937f054376178cf6ae215e4b07133deb9ce861478a8a3a2682428cfb8a4e
      Stored in directory: /home/user/.cache/pip/wheels/91/64/36/bd0d11306cb22a78c7f53d603c7eb74ebb6c211703bc40b686
    Successfully built protego
    Installing collected packages: protego, scrapy
      WARNING: The script scrapy is installed in '/home/user/.local/bin' which is not on PATH.
      Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
    Successfully installed protego-0.1.16 scrapy-2.1.0

    The installation output warns that the scrapy script was installed in /home/user/.local/bin, which is not in the PATH environment variable. PATH is the list of directories your operating system searches for a program when you run it without specifying its full path.

  3. Run scrapy using its full path.
    $ /home/user/.local/bin/scrapy
    Scrapy 2.1.0 - no active project
    
    Usage:
      scrapy <command> [options] [args]
    
    Available commands:
      bench         Run quick benchmark test
      fetch         Fetch a URL using the Scrapy downloader
      genspider     Generate new spider using pre-defined templates
      runspider     Run a self-contained spider (without creating a project)
      settings      Get settings values
      shell         Interactive scraping console
      startproject  Create new project
      version       Print Scrapy version
      view          Open URL in browser, as seen by Scrapy
    
      [ more ]      More commands available when run from project directory
    
    Use "scrapy <command> -h" to see more info about a command
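  4. Add the pip installation directory to the PATH environment variable. The single quotes keep $PATH from being expanded until ~/.bashrc is run.
    $ echo 'export PATH="$PATH:/home/user/.local/bin"' >> ~/.bashrc # Linux (bash)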
  5. Start a new terminal session and run scrapy again without specifying the full path.
    $ bash
    $ scrapy
    Scrapy 2.1.0 - no active project
    
    Usage:
      scrapy <command> [options] [args]
    
    Available commands:
      bench         Run quick benchmark test
      fetch         Fetch a URL using the Scrapy downloader
      genspider     Generate new spider using pre-defined templates
      runspider     Run a self-contained spider (without creating a project)
      settings      Get settings values
      shell         Interactive scraping console
      startproject  Create new project
      version       Print Scrapy version
      view          Open URL in browser, as seen by Scrapy
    
      [ more ]      More commands available when run from project directory
    
    Use "scrapy <command> -h" to see more info about a command
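The PATH check from the steps above can also be done from Python. The following minimal sketch uses only the standard library. It assumes pip placed the script in its default per-user location, as it did in the output above (`/home/user/.local/bin`). The helper names `user_scripts_dir` and `dir_on_path` are hypothetical. `shutil.which()` searches the PATH of the running Python process, so run the script from a new terminal session for an updated ~/.bashrc to take effect.

```python
import os
import shutil
import sysconfig


def user_scripts_dir():
    """Hypothetical helper: directory where pip's per-user installs put console scripts."""
    try:
        # Python 3.10+: ask sysconfig which user scheme this interpreter prefers
        scheme = sysconfig.get_preferred_scheme("user")
    except AttributeError:
        # Older Pythons: fall back to the generic scheme, e.g. "posix_user" on Linux
        scheme = f"{os.name}_user"
    return sysconfig.get_path("scripts", scheme)


def dir_on_path(directory, path_env=None):
    """Hypothetical helper: True if `directory` is an entry in a PATH-style string."""
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    target = os.path.normpath(os.path.expanduser(directory))
    # Empty entries are skipped; normpath drops trailing slashes so "/x/bin/" matches "/x/bin"
    entries = [os.path.normpath(entry) for entry in path_env.split(os.pathsep) if entry]
    return target in entries


if __name__ == "__main__":
    scripts = user_scripts_dir()
    print("pip user scripts directory:", scripts)
    print("directory is on PATH:", dir_on_path(scripts))
    # shutil.which() searches PATH for an executable, much like the shell does
    print("scrapy found at:", shutil.which("scrapy"))
```

If the directory is reported as not on PATH but `scrapy found at:` still prints a path, another copy of the script, such as one from an operating system package, is being found first.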