If you use Python for tasks such as web scraping, you will sometimes need to break a full URL apart and keep only certain pieces. These could include the protocol (http or https), the host or domain name, the subdomain, or the request path.
urllib is a Python package for working with URLs. Its urlparse function, found in the urllib.parse module, splits a URL into components such as the scheme (http or https), the netloc (subdomain, domain and TLD, plus the port if one is given), and the path.
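As a rough illustration of that split, the sketch below parses a made-up URL (the user name, port, query string and fragment are assumptions chosen only to populate every field) and prints each component. Note that netloc keeps the user info and port, while the hostname attribute holds the host name alone.

```python
from urllib.parse import urlparse

# Hypothetical URL, chosen only so that every component is non-empty.
url = "https://user@www.example.com:8443/docs/page.html?lang=en#intro"
parts = urlparse(url)

# Collect the pieces most often needed when dissecting a URL.
components = {
    "scheme": parts.scheme,      # protocol
    "netloc": parts.netloc,      # user info, host and port together
    "hostname": parts.hostname,  # host name only, lowercased
    "port": parts.port,          # int, or None when no port is given
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name:9} {value!r}")
```

If only the host name is wanted, hostname is usually a safer choice than netloc, since it drops any port or credentials embedded in the URL.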
Steps to extract domain name from URL using Python:
- Launch your preferred Python shell.
$ ipython3
Python 3.8.2 (default, Apr 27 2020, 15:53:34)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.
- Import the urllib.parse module.
In [1]: import urllib.parse
- Parse the URL using the urlparse function from the urllib.parse module.
In [2]: parsed_url = urllib.parse.urlparse('https://www.example.com/page.html')
- Print the parsed URL.
In [3]: print(parsed_url)
ParseResult(scheme='https', netloc='www.example.com', path='/page.html', params='', query='', fragment='')
- Select the component you need and process it accordingly.
In [4]: print(parsed_url.netloc)
www.example.com
- Create a Python script that accepts a URL as a parameter and outputs the corresponding parsed URL.
- get-host-name-from-url.py
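#!/usr/bin/env python3
import sys
import urllib.parse

# The URL to parse is the first command-line argument.
url = sys.argv[1]
parsed_url = urllib.parse.urlparse(url)

print(parsed_url)
# print() already separates its arguments with a space.
print("Host name:", parsed_url.netloc)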
- Run the script from the command line with a URL as the parameter.
$ python3 get-host-name-from-url.py https://www.example.com/page.html
ParseResult(scheme='https', netloc='www.example.com', path='/page.html', params='', query='', fragment='')
Host name: www.example.com
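One thing to watch for when feeding real-world input to the steps above: urlparse only fills in netloc when the URL has a scheme or starts with //. A bare string such as www.example.com/page.html is treated entirely as a path. The sketch below shows one possible way to cope with that. The function name extract_host is hypothetical, not part of urllib, and the fallback of prefixing // is an assumption that suits simple host/path strings only (it is not meant for scheme-less input containing a port, such as host:8080).

```python
from urllib.parse import urlparse


def extract_host(url):
    """Return the lowercased host name of url, or None if none is found.

    Hypothetical helper for illustration; not part of urllib.
    """
    parsed = urlparse(url)
    if not parsed.scheme and not parsed.netloc:
        # No scheme and no leading "//": urlparse put everything in path.
        # Prefixing "//" makes it treat the first segment as the netloc.
        parsed = urlparse("//" + url)
    # hostname excludes any port or user info and is None when absent.
    return parsed.hostname


print(extract_host("https://www.example.com/page.html"))
print(extract_host("www.example.com/page.html"))
print(extract_host("mailto:user@example.com"))
```

The first two calls should both yield www.example.com, while the mailto address has no network location and yields None.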
Author: Mohd Shakir Zakaria
Mohd Shakir Zakaria is an experienced cloud architect with a strong development and open-source advocacy background. He boasts multiple certifications in AWS, Red Hat, VMware, ITIL, and Linux, underscoring his expertise in cloud architecture and system administration.