If you use Python for tasks such as web scraping, you will sometimes need to break a full URL into its parts and keep only some of them, such as the protocol (http or https), the host or domain name, the subdomain, or the request path.

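urllib is a package in the Python standard library for working with URLs. Its urllib.parse module provides the urlparse function, which splits a URL into six components: scheme (such as http or https), netloc (the network location, typically subdomain, domain, and TLD, plus any port or credentials), path, params, query, and fragment.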
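As a quick illustration of that split, the following minimal sketch parses a made-up URL and reads a few components by attribute name. The host blog.example.com and its path, query, and fragment are assumptions chosen only for the example.

```python
from urllib.parse import urlparse

# Hypothetical example URL, chosen only for illustration.
parts = urlparse("https://blog.example.com/articles/page.html?lang=en#intro")

# Each component is available as a named attribute of the result.
print(parts.scheme)    # https
print(parts.netloc)    # blog.example.com
print(parts.path)      # /articles/page.html
print(parts.query)     # lang=en
print(parts.fragment)  # intro
```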

Steps to extract domain name from URL using Python:

  1. Launch your preferred Python shell.
    $ ipython3
    Python 3.8.2 (default, Apr 27 2020, 15:53:34)
    Type 'copyright', 'credits' or 'license' for more information
    IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.
  2. Import urllib.parse module.
    In [1]: import urllib.parse
  3. Parse URL using urlparse function from urllib.parse module.
    In [2]: parsed_url = urllib.parse.urlparse('https://www.example.com/page.html')
  4. Print out parsed URL output.
    In [3]: print(parsed_url)
    ParseResult(scheme='https', netloc='www.example.com', path='/page.html', params='', query='', fragment='')
  5. Select the component you need, such as netloc for the host name.
    In [4]: print(parsed_url.netloc)
    www.example.com
  6. Create a Python script that accepts a URL as an argument and prints the parsed URL along with its host name.
    get-host-name-from-url.py
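    #!/usr/bin/env python3

    import sys
    import urllib.parse

    # Expect exactly one argument: the URL to parse.
    if len(sys.argv) != 2:
        sys.exit("Usage: get-host-name-from-url.py URL")

    url = sys.argv[1]
    parsed_url = urllib.parse.urlparse(url)

    # Show every component, then the network location on its own.
    print(parsed_url)
    print("Host name: ", parsed_url.netloc)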
  7. Run the script from the command line with a URL as the argument.
    $ python3 get-host-name-from-url.py https://www.example.com/page.html
    ParseResult(scheme='https', netloc='www.example.com', path='/page.html', params='', query='', fragment='')
    Host name:  www.example.com
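
Note that netloc is the whole network location, so it also carries any port number or credentials present in the URL. If only the host name is wanted, the hostname attribute of the parsed result is one option. The sketch below is a hedged example rather than part of the steps above: get_host is a hypothetical helper name, the sample URLs are made up, and prepending // to scheme-less input is an assumption about how such input should be treated.

```python
from urllib.parse import urlparse


def get_host(url):
    """Return the lowercased host name of url, or None if none is found."""
    # Without a scheme or a leading "//", urlparse places the host in path
    # instead of netloc. Prepending "//" is an assumed fallback for bare
    # inputs such as "www.example.com/page.html".
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    # hostname drops any user:password@ prefix and :port suffix.
    return urlparse(url).hostname


print(get_host("https://user:secret@www.Example.com:8443/page.html"))  # www.example.com
print(get_host("www.example.com/page.html"))  # www.example.com
```

Splitting the host further into subdomain, domain, and TLD is not something urlparse does on its own, so that step would need separate handling.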