Scrapy is a Python-based web scraping and crawling framework, available from the Python Package Index (PyPI). This means you can install Scrapy on any operating system, as long as you have pip installed.
Some operating systems provide a Scrapy package built specifically for that operating system and version, which doesn't require pip for installation. These packages, however, are usually not as up to date as the one distributed via pip, though they are better tested against and integrated with the specific operating system and version.
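On Debian or Ubuntu, for example, the distribution package can be installed through the package manager instead of pip. The package name python3-scrapy is what those distributions use; other distributions may name it differently:

```
# Debian/Ubuntu (package name may differ on other distributions)
$ sudo apt install python3-scrapy
```
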
Install pip for your operating system if you don't already have it installed, then install the scrapy package using pip:

$ pip3 install scrapy
Collecting scrapy
  Downloading Scrapy-2.1.0-py2.py3-none-any.whl (239 kB)
     |████████████████████████████████| 239 kB 585 kB/s
Requirement already satisfied: queuelib>=1.4.2 in /usr/lib/python3/dist-packages (from scrapy) (1.5.0)
Requirement already satisfied: cssselect>=0.9.1 in /usr/lib/python3/dist-packages (from scrapy) (1.1.0)
Requirement already satisfied: cryptography>=2.0 in /usr/lib/python3/dist-packages (from scrapy) (2.8)
Requirement already satisfied: parsel>=1.5.0 in /usr/lib/python3/dist-packages (from scrapy) (1.5.2)
Requirement already satisfied: zope.interface>=4.1.3 in /usr/lib/python3/dist-packages (from scrapy) (4.7.1)
Requirement already satisfied: lxml>=3.5.0 in /usr/lib/python3/dist-packages (from scrapy) (4.5.0)
Requirement already satisfied: Twisted>=17.9.0 in /usr/lib/python3/dist-packages (from scrapy) (18.9.0)
Collecting protego>=0.1.15
  Downloading Protego-0.1.16.tar.gz (3.2 MB)
     |████████████████████████████████| 3.2 MB 1.7 MB/s
Requirement already satisfied: w3lib>=1.17.0 in /usr/lib/python3/dist-packages (from scrapy) (1.21.0)
Requirement already satisfied: pyOpenSSL>=16.2.0 in /usr/lib/python3/dist-packages (from scrapy) (19.0.0)
Requirement already satisfied: PyDispatcher>=2.0.5 in /usr/lib/python3/dist-packages (from scrapy) (2.0.5)
Requirement already satisfied: service-identity>=16.0.0 in /usr/lib/python3/dist-packages (from scrapy) (18.1.0)
Requirement already satisfied: six in /usr/lib/python3/dist-packages (from protego>=0.1.15->scrapy) (1.14.0)
Building wheels for collected packages: protego
  Building wheel for protego (setup.py) ... done
  Created wheel for protego: filename=Protego-0.1.16-py3-none-any.whl size=7765 sha256=626b937f054376178cf6ae215e4b07133deb9ce861478a8a3a2682428cfb8a4e
  Stored in directory: /home/user/.cache/pip/wheels/91/64/36/bd0d11306cb22a78c7f53d603c7eb74ebb6c211703bc40b686
Successfully built protego
Installing collected packages: protego, scrapy
  WARNING: The script scrapy is installed in '/home/user/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed protego-0.1.16 scrapy-2.1.0
The installation output mentions the installation directory, which in this case is /home/user/.local/bin, and warns that this directory is not in the PATH environment variable. PATH is where your operating system looks for a program when you run it without specifying its full path.
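To check programmatically whether a directory is on PATH, here is a small Python sketch. The directory below is the one reported by the pip installation output above; adjust it for your own system:

```python
import os

def on_path(directory: str) -> bool:
    """Return True if `directory` is listed in the PATH environment variable."""
    return directory in os.environ.get("PATH", "").split(os.pathsep)

# Directory reported by the pip installation output above
print(on_path("/home/user/.local/bin"))
```

Alternatively, running echo $PATH in the shell shows the same list of directories separated by colons.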
Run scrapy using its full path:

$ /home/user/.local/bin/scrapy
Scrapy 2.1.0 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command
Add the pip installation directory to the PATH environment variable:

$ echo 'export PATH=$PATH:/home/user/.local/bin' >> ~/.bashrc #Linux

Note the single quotes: without them the shell would expand $PATH immediately and write its current value into ~/.bashrc, rather than appending to whatever PATH is at login.
Start a new shell session and run scrapy again, this time without specifying the full path:

$ bash
$ scrapy
Scrapy 2.1.0 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command