If you're using Python for tasks like web scraping, there will be times when you want to take a full URL and extract only some of its parts: the protocol (http or https), the domain name, the subdomain, or just the request path.
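Python's standard library has the urllib package for all things URL. Its urllib.parse module contains the urlparse function, which dissects a URL into its components: scheme (http or https), netloc (the network location, i.e. subdomain, domain and TLD, plus a port or user credentials if the URL includes them), path, params, query and fragment.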
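To see every field at once, here is a minimal sketch using a made-up URL (an assumption for illustration) that also carries credentials, a port, a query string and a fragment. Note that netloc keeps all of the network-location text verbatim, while the hostname and port attributes of the result pull out just the host (lowercased) and the port number.

```python
from urllib.parse import urlparse

# Hypothetical example URL with credentials, an explicit port, a query and a fragment
url = "https://user:secret@WWW.Example.com:8443/docs/page.html?lang=en#intro"
parts = urlparse(url)

print(parts.scheme)    # https
print(parts.netloc)    # user:secret@WWW.Example.com:8443
print(parts.hostname)  # www.example.com  (lowercased, no credentials or port)
print(parts.port)      # 8443 (an int, or None when the URL has no port)
print(parts.username)  # user
print(parts.path)      # /docs/page.html
print(parts.query)     # lang=en
print(parts.fragment)  # intro
```

If all you want is the bare host name, hostname is usually the safer attribute; netloc is what you want when the port or credentials matter too.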
```
$ ipython3
Python 3.8.2 (default, Apr 27 2020, 15:53:34)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import urllib.parse

In [2]: parsed_url = urllib.parse.urlparse('https://www.example.com/page.html')

In [3]: print(parsed_url)
ParseResult(scheme='https', netloc='www.example.com', path='/page.html', params='', query='', fragment='')

In [4]: print(parsed_url.netloc)
www.example.com
```
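One thing to watch for: urlparse only fills in netloc when the network location is introduced by "//". A bare input like the first one below (a hypothetical example) has no scheme, so the host name ends up in path and netloc is empty. A quick sketch of the difference:

```python
from urllib.parse import urlparse

# No scheme and no leading "//": urlparse cannot tell where the host starts,
# so everything is treated as the path.
no_scheme = urlparse("www.example.com/page.html")
print(repr(no_scheme.netloc))  # ''
print(no_scheme.path)          # www.example.com/page.html

# A scheme-relative URL ("//host/path") is enough for netloc to be filled in.
scheme_relative = urlparse("//www.example.com/page.html")
print(scheme_relative.netloc)  # www.example.com
print(scheme_relative.path)    # /page.html
```

If your input may lack a scheme, checking for an empty netloc before trusting the result is a cheap safeguard.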
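The same thing as a standalone script, get-host-name-from-url.py, which takes the URL as its first command-line argument:

```python
#!/usr/bin/env python3
import sys
import urllib.parse

# The URL to parse is passed as the first command-line argument
url = sys.argv[1]

# Split the URL into scheme, netloc, path, params, query and fragment
parsed_url = urllib.parse.urlparse(url)
print(parsed_url)

# netloc holds the host name (plus any port or credentials)
print("Host name:", parsed_url.netloc)
```

Running it:

```
$ python3 get-host-name-from-url.py https://www.example.com/page.html
ParseResult(scheme='https', netloc='www.example.com', path='/page.html', params='', query='', fragment='')
Host name: www.example.com
```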
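netloc gives you the whole host, but pulling out just the subdomain takes one more step, and the standard library has no function for that. The sketch below uses a hypothetical helper, split_host, that naively treats the last two dot-separated labels of hostname as the domain and TLD. That assumption breaks for multi-part suffixes such as co.uk; handling those correctly requires the Public Suffix List.

```python
from urllib.parse import urlparse


def split_host(url):
    """Naively split a URL's host into (subdomain, domain, tld).

    Hypothetical helper. Assumes the registered domain is the last two
    dot-separated labels, which is wrong for suffixes such as "co.uk".
    """
    # hostname drops any port/credentials and may be None if there is no host
    hostname = urlparse(url).hostname or ""
    labels = hostname.split(".")
    if len(labels) < 2:
        # e.g. "localhost": no dot, so there is no TLD to separate out
        return "", hostname, ""
    subdomain = ".".join(labels[:-2])
    return subdomain, labels[-2], labels[-1]


print(split_host("https://www.example.com/page.html"))      # ('www', 'example', 'com')
print(split_host("https://api.v2.example.com:8080/users"))  # ('api.v2', 'example', 'com')
print(split_host("https://example.com"))                    # ('', 'example', 'com')
```

Using hostname rather than netloc here means a port such as :8080 is dropped before the host is split.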