Scrapy is a Python-based web scraping and crawling framework that is usually installed as a pip package. Some Linux distributions such as Ubuntu, however, ship Scrapy in their default package repositories, so it can also be installed via apt.

The Ubuntu package is more tightly integrated with the operating system: it installs to the default application path, and you don't need additional tools such as pip just to install Scrapy.

The packaged version is, however, normally tied to the Ubuntu release, so you won't get the latest version of Scrapy unless you also upgrade Ubuntu.
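To see which version a given Ubuntu release would install, the output of `apt-cache policy python3-scrapy` can be inspected. The following is a minimal sketch, assuming the usual `Installed:` and `Candidate:` lines in that output; the helper names `parse_policy` and `apt_policy` are made up for this illustration and are not part of apt or Scrapy.

```python
import shutil
import subprocess
from typing import Optional


def parse_policy(output: str, field: str) -> Optional[str]:
    """Return the value of a field such as 'Installed' or 'Candidate'.

    Assumes `apt-cache policy` prints lines of the form 'Field: value'.
    """
    prefix = field + ":"
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            return stripped[len(prefix):].strip()
    return None  # field not present in the output


def apt_policy(package: str = "python3-scrapy") -> Optional[str]:
    """Run `apt-cache policy <package>`; return its output, or None without apt."""
    if shutil.which("apt-cache") is None:
        return None  # probably not a Debian/Ubuntu system
    result = subprocess.run(
        ["apt-cache", "policy", package],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    output = apt_policy()
    if output is None:
        print("apt-cache not found on this system")
    else:
        print("Installed:", parse_policy(output, "Installed"))
        print("Candidate:", parse_policy(output, "Candidate"))
```

If the candidate version is older than what you need, installing through pip instead is the usual alternative.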

Scrapy can be installed on Ubuntu from the terminal using apt.

Steps to install Scrapy on Ubuntu:

  1. Launch the terminal application.
  2. Update apt's package list from the repository.
    $ sudo apt update
    [sudo] password for user:
    Hit:1 http://jp.archive.ubuntu.com/ubuntu eoan InRelease
    Hit:2 http://jp.archive.ubuntu.com/ubuntu eoan-updates InRelease
    Hit:3 http://jp.archive.ubuntu.com/ubuntu eoan-backports InRelease
    Hit:4 http://jp.archive.ubuntu.com/ubuntu eoan-security InRelease
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    All packages are up to date.
  3. Install the python3-scrapy package using apt.
    $ sudo apt install --assume-yes python3-scrapy
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    The following additional packages will be installed:
      ipython3 libimagequant0 libjbig0 libjpeg-turbo8 libjpeg8 liblcms2-2
      libmysqlclient21 libtiff5 libwebp6 libwebpdemux2 libwebpmux3 mysql-common
      python3-boto python3-bs4 python3-cssselect python3-decorator
      python3-html5lib python3-ipython python3-ipython-genutils python3-libxml2
      python3-lxml python3-mysqldb python3-olefile python3-parsel python3-pexpect
      python3-pickleshare python3-pil python3-prompt-toolkit python3-ptyprocess
      python3-pydispatch python3-pygments python3-queuelib python3-simplegeneric
      python3-soupsieve python3-traitlets python3-w3lib python3-wcwidth
      python3-webencodings
    Suggested packages:
      liblcms2-utils python3-genshi python3-lxml-dbg python-lxml-doc
      default-mysql-server | virtual-mysql-server python-egenix-mxdatetime
      python3-mysqldb-dbg python-pexpect-doc python-pil-doc python3-pil-dbg
      python-pydispatch-doc python-pygments-doc ttf-bitstream-vera
      python-scrapy-doc
    ##### snipped
  4. Start using Scrapy by running the scrapy command at the terminal.
    $ scrapy
    Scrapy 1.7.3 - no active project
    
    Usage:
      scrapy <command> [options] [args]
    
    Available commands:
      bench         Run quick benchmark test
      fetch         Fetch a URL using the Scrapy downloader
      genspider     Generate new spider using pre-defined templates
      runspider     Run a self-contained spider (without creating a project)
      settings      Get settings values
      shell         Interactive scraping console
      startproject  Create new project
      version       Print Scrapy version
      view          Open URL in browser, as seen by Scrapy
    
      [ more ]      More commands available when run from project directory
    
    Use "scrapy <command> -h" to see more info about a command
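The installation can also be checked from Python itself. The sketch below assumes it is run with the system interpreter (`python3`), since a package installed through apt is normally not visible inside a virtual environment; `installed_version` is a hypothetical helper written for this example.

```python
import importlib
import importlib.util
from typing import Optional


def installed_version(module_name: str = "scrapy") -> Optional[str]:
    """Return the module's __version__, or None if it cannot be found."""
    if importlib.util.find_spec(module_name) is None:
        return None  # module is not installed for this interpreter
    module = importlib.import_module(module_name)
    # Not every module defines __version__, so fall back to None.
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("Scrapy is not importable from this Python interpreter")
    else:
        print("Scrapy", version)
```

The reported version should match the one printed in the first line of the scrapy command's output.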