If you use Python for tasks such as web scraping, there will be times when you need to take a full URL and extract only certain parts of it. These could be the protocol (http or https), the domain name, the subdomain, or just the request path.

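Python's standard library includes the urllib package for working with URLs. You can dissect and process a URL using the urlparse function in the urllib.parse module. It splits the URL into components, including the scheme (http or https), the netloc (the network location: subdomain, domain and TLD, plus any port or credentials present in the URL) and the path.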
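
As a quick illustration of that split, the sketch below parses a URL that also carries a port, a query string and a fragment. The URL is a made-up example chosen for illustration, not one taken from this article.

```python
from urllib.parse import urlparse

# Hypothetical URL, chosen only to exercise every component.
parts = urlparse("https://blog.example.com:8443/posts/index.html?page=2#comments")

print(parts.scheme)    # https
print(parts.netloc)    # blog.example.com:8443
print(parts.path)      # /posts/index.html
print(parts.query)     # page=2
print(parts.fragment)  # comments
```

Note that netloc keeps the port attached to the host, which matters if you only want the host name.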

Steps to get the host name from a URL using Python:

  1. Launch your preferred Python shell.
    $ ipython3
    Python 3.8.2 (default, Apr 27 2020, 15:53:34)
    Type 'copyright', 'credits' or 'license' for more information
    IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.
  2. Import the urllib.parse module.
    In [1]: import urllib.parse
  3. Parse the URL using the urlparse function from the urllib.parse module.
    In [2]: parsed_url = urllib.parse.urlparse('https://www.example.com/page.html')
  4. Print the parsed URL.
    In [3]: print(parsed_url)
    ParseResult(scheme='https', netloc='www.example.com', path='/page.html', params='', query='', fragment='')
  5. Select the component you need (here, netloc for the host name) and process it accordingly.
    In [4]: print(parsed_url.netloc)
    www.example.com
  6. Create a Python script that accepts a URL as a parameter and outputs the corresponding parsed URL.
    get-host-name-from-url.py
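    #!/usr/bin/env python3
    """Print the parsed components and host name of a URL."""

    import sys
    import urllib.parse

    # Exit with a usage message if no URL was given.
    if len(sys.argv) != 2:
        sys.exit("Usage: get-host-name-from-url.py URL")

    url = sys.argv[1]
    parsed_url = urllib.parse.urlparse(url)

    print(parsed_url)
    # print() inserts a space between its arguments, hence the
    # double space after the colon in the output.
    print("Host name: ", parsed_url.netloc)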
  7. Run the script from the command line with the URL as a parameter.
    $ python3 get-host-name-from-url.py https://www.example.com/page.html
    ParseResult(scheme='https', netloc='www.example.com', path='/page.html', params='', query='', fragment='')
    Host name:  www.example.com
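
One caveat on step 5: netloc is the whole network location, so it can also contain credentials and a port. The object returned by urlparse additionally exposes hostname and port attributes, which isolate those parts. Below is a minimal sketch using a made-up URL and a hypothetical helper named get_host_name. The fallback for input without a scheme is an assumption about what you might want, not something urlparse does on its own.

```python
from urllib.parse import urlparse


def get_host_name(url):
    """Return the lowercased host name in url, or None if none is found.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(url)
    if not parsed.netloc and "://" not in url:
        # Assumption: input such as "www.example.com/page.html" is meant as
        # a web address, so prefix "//" to make urlparse see a netloc.
        parsed = urlparse("//" + url)
    return parsed.hostname


parsed = urlparse("https://user:secret@WWW.Example.com:8080/page.html")
print(parsed.netloc)    # user:secret@WWW.Example.com:8080
print(parsed.hostname)  # www.example.com
print(parsed.port)      # 8080

print(get_host_name("www.example.com/page.html"))  # www.example.com
```

The fallback is deliberately simple and will misread non-web inputs such as mailto: addresses, so treat it as a starting point rather than a general-purpose validator.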