Scrapy, a web scraping and crawling framework, is written in Python and is available from the Python Package Index (PyPI). This means that you can install scrapy on any operating system that has pip installed.

Some operating systems provide a scrapy package built specifically for their distribution and version, which therefore doesn't require pip for installation.

These distribution packages, however, are usually not as up to date as the one distributed through pip, though they are better tested against and integrated with the specific operating system and version.
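For example, on Debian and Ubuntu the distribution package can be installed through apt. The package name python3-scrapy is an assumption here; check your distribution's repository for the exact name and version:

```shell
# Install scrapy from the distribution's own repository instead of pip.
# The package name (python3-scrapy) may differ per distribution/version.
sudo apt update
sudo apt install --assume-yes python3-scrapy
```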

Steps to install scrapy using pip:

  1. Install pip for your operating system if you don't already have it installed.
    $ sudo apt update && sudo apt install --assume-yes python-pip #Ubuntu and Debian 
  2. Install scrapy using pip.
    $ pip install scrapy
    Collecting scrapy
      Downloading https://files.pythonhosted.org/packages/3b/e4/69b87d7827abf03dea2ea984230d50f347b00a7a3897bc93f6ec3dafa494/Scrapy-1.8.0-py2.py3-none-any.whl (238kB)
        100% |████████████████████████████████| 245kB 1.4MB/s
    Requirement already satisfied: six>=1.10.0 in /usr/lib/python2.7/dist-packages (from scrapy) (1.12.0)
    Collecting lxml>=3.5.0 (from scrapy)
      Downloading https://files.pythonhosted.org/packages/e4/f4/65d145cd6917131826050b0479be35aaccba2847b7f80fc4afc6bec6616b/lxml-4.4.1-cp27-cp27mu-manylinux1_x86_64.whl (5.7MB)
        100% |████████████████████████████████| 5.7MB 145kB/s
    Collecting Twisted>=16.0.0; python_version == "2.7" (from scrapy)
      Downloading https://files.pythonhosted.org/packages/18/0f/0df34ad9161861d5b629a54f5fe8941f1ef9b73425923aeac1861fefa94d/Twisted-19.7.0-cp27-cp27mu-manylinux1_x86_64.whl (3.2MB)
        100% |████████████████████████████████| 3.2MB 324kB/s
    Collecting queuelib>=1.4.2 (from scrapy)
      Downloading https://files.pythonhosted.org/packages/4c/85/ae64e9145f39dd6d14f8af3fa809a270ef3729f3b90b3c0cf5aa242ab0d4/queuelib-1.5.0-py2.py3-none-any.whl
    Collecting pyOpenSSL>=16.2.0 (from scrapy)
      Downloading https://files.pythonhosted.org/packages/01/c8/ceb170d81bd3941cbeb9940fc6cc2ef2ca4288d0ca8929ea4db5905d904d/pyOpenSSL-19.0.0-py2.py3-none-any.whl (53kB)
        100% |████████████████████████████████| 61kB 2.0MB/s
    Collecting parsel>=1.5.0 (from scrapy)
      Downloading https://files.pythonhosted.org/packages/86/c8/fc5a2f9376066905dfcca334da2a25842aedfda142c0424722e7c497798b/parsel-1.5.2-py2.py3-none-any.whl
    Collecting service-identity>=16.0.0 (from scrapy)
      Downloading https://files.pythonhosted.org/packages/e9/7c/2195b890023e098f9618d43ebc337d83c8b38d414326685339eb024db2f6/service_identity-18.1.0-py2.py3-none-any.whl
    Collecting w3lib>=1.17.0 (from scrapy)
      Downloading https://files.pythonhosted.org/packages/6a/45/1ba17c50a0bb16bd950c9c2b92ec60d40c8ebda9f3371ae4230c437120b6/w3lib-1.21.0-py2.py3-none-any.whl
    Collecting PyDispatcher>=2.0.5 (from scrapy)
      Downloading https://files.pythonhosted.org/packages/cd/37/39aca520918ce1935bea9c356bcbb7ed7e52ad4e31bff9b943dfc8e7115b/PyDispatcher-2.0.5.tar.gz
    Collecting zope.interface>=4.1.3 (from scrapy)
      Downloading https://files.pythonhosted.org/packages/a2/a2/e68c37eb2ef9bf942e0ace19f4cf6fe3e7c650932fb587bfde3c608f7d77/zope.interface-4.6.0-cp27-cp27mu-manylinux1_x86_64.whl (164kB)
        100% |████████████████████████████████| 174kB 3.8MB/s
    Collecting protego>=0.1.15 (from scrapy)
      Downloading https://files.pythonhosted.org/packages/e8/4b/c72e7d801facc2f519824680b65d76373e6bb289df668dbf8758ea21ff10/Protego-0.1.15.tar.gz (3.2MB)
        100% |████████████████████████████████| 3.2MB 218kB/s
    Requirement already satisfied: cryptography>=2.0 in /usr/lib/python2.7/dist-packages (from scrapy) (2.6.1)
    Collecting cssselect>=0.9.1 (from scrapy)
      Downloading https://files.pythonhosted.org/packages/3b/d4/3b5c17f00cce85b9a1e6f91096e1cc8e8ede2e1be8e96b87ce1ed09e92c5/cssselect-1.1.0-py2.py3-none-any.whl
    Collecting Automat>=0.3.0 (from Twisted>=16.0.0; python_version == "2.7"->scrapy)
      Downloading https://files.pythonhosted.org/packages/e5/11/756922e977bb296a79ccf38e8d45cafee446733157d59bcd751d3aee57f5/Automat-0.8.0-py2.py3-none-any.whl
    Collecting constantly>=15.1 (from Twisted>=16.0.0; python_version == "2.7"->scrapy)
      Downloading https://files.pythonhosted.org/packages/b9/65/48c1909d0c0aeae6c10213340ce682db01b48ea900a7d9fce7a7910ff318/constantly-15.1.0-py2.py3-none-any.whl
    Collecting PyHamcrest>=1.9.0 (from Twisted>=16.0.0; python_version == "2.7"->scrapy)
      Downloading https://files.pythonhosted.org/packages/9a/d5/d37fd731b7d0e91afcc84577edeccf4638b4f9b82f5ffe2f8b62e2ddc609/PyHamcrest-1.9.0-py2.py3-none-any.whl (52kB)
        100% |████████████████████████████████| 61kB 6.9MB/s
    ##### Snipped 
    Building wheels for collected packages: PyDispatcher, protego, functools32
      Running setup.py bdist_wheel for PyDispatcher ... done
      Stored in directory: /home/user/.cache/pip/wheels/88/99/96/cfef6665f9cb1522ee6757ae5955feedf2fe25f1737f91fa7f
      Running setup.py bdist_wheel for protego ... done
      Stored in directory: /home/user/.cache/pip/wheels/72/d1/f2/4e0a2e6d0179c201952b1b3e086a736548605386193cd312f6
      Running setup.py bdist_wheel for functools32 ... done
      Stored in directory: /home/user/.cache/pip/wheels/b5/18/32/77a1030457155606ba5e3ec3a8a57132b1a04b1c4f765177b2
    Successfully built PyDispatcher protego functools32
    Installing collected packages: lxml, attrs, Automat, constantly, PyHamcrest, idna, hyperlink, zope.interface, incremental, Twisted, queuelib, pyOpenSSL, cssselect, functools32, w3lib, parsel, pyasn1, pyasn1-modules, service-identity, PyDispatcher, protego, scrapy
      The script automat-visualize is installed in '/home/user/.local/bin' which is not on PATH.
      Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
      The scripts cftp, ckeygen, conch, mailmail, pyhtmlizer, tkconch, trial, twist and twistd are installed in '/home/user/.local/bin' which is not on PATH.
      Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
      The script scrapy is installed in '/home/user/.local/bin' which is not on PATH.
      Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
    Successfully installed Automat-0.8.0 PyDispatcher-2.0.5 PyHamcrest-1.9.0 Twisted-19.7.0 attrs-19.3.0 constantly-15.1.0 cssselect-1.1.0 functools32-3.2.3.post2 hyperlink-19.0.0 idna-2.8 incremental-17.5.0 lxml-4.4.1 parsel-1.5.2 protego-0.1.15 pyOpenSSL-19.0.0 pyasn1-0.4.7 pyasn1-modules-0.2.7 queuelib-1.5.0 scrapy-1.8.0 service-identity-18.1.0 w3lib-1.21.0 zope.interface-4.6.0

    The installation output mentions the installation directory, which in this case is /home/user/.local/bin, and warns that this directory is not in the PATH environment variable. PATH is where your operating system looks for a program when you run it without specifying its full path.

  3. Run scrapy using its full path.
    $ /home/user/.local/bin/scrapy
    Scrapy 1.8.0 - no active project
    
    Usage:
      scrapy <command> [options] [args]
    
    Available commands:
      bench         Run quick benchmark test
      fetch         Fetch a URL using the Scrapy downloader
      genspider     Generate new spider using pre-defined templates
      runspider     Run a self-contained spider (without creating a project)
      settings      Get settings values
      shell         Interactive scraping console
      startproject  Create new project
      version       Print Scrapy version
      view          Open URL in browser, as seen by Scrapy
    
      [ more ]      More commands available when run from project directory
    
    Use "scrapy <command> -h" to see more info about a command
  4. Add the pip installation directory to the PATH environment variable. Note the single quotes: they keep $PATH from being expanded when the line is written, so it is expanded each time ~/.bashrc is read instead.
    $ echo 'export PATH="$PATH:/home/user/.local/bin"' >> ~/.bashrc #Linux
  5. Start a new terminal session and run scrapy again without specifying full path.
    $ bash
    $ scrapy
    Scrapy 1.8.0 - no active project
    
    Usage:
      scrapy <command> [options] [args]
    
    Available commands:
      bench         Run quick benchmark test
      fetch         Fetch a URL using the Scrapy downloader
      genspider     Generate new spider using pre-defined templates
      runspider     Run a self-contained spider (without creating a project)
      settings      Get settings values
      shell         Interactive scraping console
      startproject  Create new project
      version       Print Scrapy version
      view          Open URL in browser, as seen by Scrapy
    
      [ more ]      More commands available when run from project directory
    
    Use "scrapy <command> -h" to see more info about a command
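The PATH handling in steps 3 to 5 can be sketched as a quick check: before (or after) editing ~/.bashrc you can test whether pip's install directory is already on PATH. A minimal sketch, assuming the default Linux user install directory ~/.local/bin:

```shell
# Report whether pip's user install directory is on PATH.
BIN_DIR="$HOME/.local/bin"   # default user install directory on Linux
case ":$PATH:" in
  *":$BIN_DIR:"*) echo "$BIN_DIR is on PATH" ;;
  *)              echo "$BIN_DIR is NOT on PATH" ;;
esac
```

If the check reports that the directory is on PATH, plain `scrapy` will resolve without the full path.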