If you use Python for tasks such as web scraping, there will be times when you need to take a full URL and extract just some of its parts: the protocol (http or https), the host or domain name, the subdomain, or the request path.
urllib is a Python standard-library package for working with URLs. Its urllib.parse module provides the urlparse function, which splits a URL into components, including the scheme (http or https), the netloc (subdomain, domain, and TLD), and the path.
$ ipython3
Python 3.8.2 (default, Apr 27 2020, 15:53:34)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import urllib.parse
In [2]: parsed_url = urllib.parse.urlparse('https://www.example.com/page.html')
In [3]: print(parsed_url)
ParseResult(scheme='https', netloc='www.example.com', path='/page.html', params='', query='', fragment='')
In [4]: print(parsed_url.netloc)
www.example.com
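Note that netloc is the whole network-location part of the URL, so it also carries credentials or a port when the URL has them. The following is a minimal sketch of that difference, using the hostname and port attributes that the parse result also exposes. The URL, including its user name, password, and port, is made up for illustration.

```python
from urllib.parse import urlparse

# Hypothetical URL with credentials and an explicit port, for illustration only.
parsed = urlparse("https://user:secret@www.example.com:8443/page.html")

print(parsed.netloc)    # user:secret@www.example.com:8443
print(parsed.hostname)  # www.example.com
print(parsed.port)      # 8443
```

If you only ever need the bare host, hostname is usually the safer attribute to read, since it stays the same whether or not the URL carries a port.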
#!/usr/bin/env python3
import sys
import urllib.parse

# The URL to parse is passed as the first command-line argument.
url = sys.argv[1]
parsed_url = urllib.parse.urlparse(url)

print(parsed_url)
print("Host name:", parsed_url.netloc)
$ python3 get-host-name-from-url.py https://www.example.com/page.html
ParseResult(scheme='https', netloc='www.example.com', path='/page.html', params='', query='', fragment='')
Host name: www.example.com
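One thing to watch for when the URL comes from user input: urlparse only recognizes a netloc when it is introduced by "//", so an input such as www.example.com/page.html ends up entirely in path and netloc is empty. The sketch below shows one possible way to handle that case. The helper name host_from_url is hypothetical, and the fallback is a simple heuristic rather than a complete solution. It reads the hostname attribute, which omits any port or credentials.

```python
from urllib.parse import urlparse


def host_from_url(url):
    """Return the host name of url, or None if none can be found.

    host_from_url is a hypothetical helper written for this sketch.
    """
    parsed = urlparse(url)
    if not parsed.netloc:
        # No "//" in the input, so urlparse put the host into the path.
        # Retry with "//" prepended so the host is parsed as the netloc.
        parsed = urlparse("//" + url)
    return parsed.hostname


print(host_from_url("https://www.example.com/page.html"))  # www.example.com
print(host_from_url("www.example.com/page.html"))          # www.example.com
```

This keeps well-formed URLs on the normal path and only applies the fallback when no netloc was found.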