A robots.txt file tells compliant crawlers which URL paths on a website they may fetch. Create one when the site has low-value or repetitive areas such as cart flows, internal search results, preview URLs, faceted filters, or other sections that should not consume routine crawler requests.
The file must be served as UTF-8 plain text from the exact root URL for the origin that needs the rules, such as https://www.example.com/robots.txt. Google currently documents support for four directives: User-agent, Allow, Disallow, and Sitemap. It applies the rules only to the same protocol, host, and port as the file, and it treats path values as case-sensitive.
Robots.txt controls crawling, not secrecy or guaranteed removal from search results. A blocked URL can still appear as a bare result if it was discovered elsewhere. Google treats a missing /robots.txt file as no crawl restriction for that host. Blocking shared CSS, JavaScript, or image paths can make crawler rendering less accurate.
Good robots.txt candidates include internal search results, cart and checkout paths, faceted filter URLs, preview paths, and other low-value sections that should not absorb routine crawler requests.
Do not use robots.txt to protect private or staging content, because the file is public and blocked URLs can still be requested directly or appear as bare URLs in search results.
    User-agent: *
    Disallow: /search/
    Disallow: /cart/
    Disallow: /checkout/

    Sitemap: https://www.example.com/sitemap.xml
Keep the syntax simple: one directive per line, path values that start with /, and a fully qualified Sitemap: URL when the site publishes an XML sitemap.
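To sanity-check rules like the ones above before publishing them, a minimal sketch using Python's standard urllib.robotparser module might look like the following. The user agent name ExampleBot and the test URLs are hypothetical, and the standard-library parser is only an approximation of how any particular crawler evaluates the file.

```python
from urllib.robotparser import RobotFileParser

# The sample rules from this guide, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /cart/
Disallow: /checkout/

Sitemap: https://www.example.com/sitemap.xml
"""


def build_parser(text):
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a hypothetical crawler name; it falls under "User-agent: *".
blocked = parser.can_fetch("ExampleBot", "https://www.example.com/search/widgets")
allowed = parser.can_fetch("ExampleBot", "https://www.example.com/products/widgets")
print(blocked, allowed)

# site_maps() is available in Python 3.8 and later.
print(parser.site_maps())
```

The first check is expected to report the /search/ URL as blocked and the second to report the /products/ URL as fetchable, because only the three listed prefixes are disallowed.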
    User-agent: *
    Disallow: /private/
    Allow: /private/help-center/
Allow: is useful only for a narrower exception inside a blocked parent path; if there is no exception, skip it.
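The way a narrower Allow: exception overrides a broader Disallow: can be illustrated with a small sketch. It assumes longest-match precedence, with ties going to the allow rule, as described in RFC 9309 and Google's documentation. The function is_allowed is a hypothetical helper written for illustration; it handles plain path prefixes only and ignores the * and $ wildcards.

```python
def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, prefix) pairs, where directive is
    "allow" or "disallow". The longest matching prefix wins; on a tie,
    "allow" wins. A path that matches no rule is allowed.
    """
    best = None  # (prefix length, is_allow) of the strongest match so far
    for directive, prefix in rules:
        if path.startswith(prefix):
            candidate = (len(prefix), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


RULES = [
    ("disallow", "/private/"),
    ("allow", "/private/help-center/"),
]

print(is_allowed("/private/reports/", RULES))          # blocked by the parent rule
print(is_allowed("/private/help-center/faq", RULES))   # covered by the exception
print(is_allowed("/about/", RULES))                    # matches no rule
```

Python's built-in urllib.robotparser applies rules in file order rather than by longest match, so it is not a reliable stand-in for this particular check.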
Rules apply only to the origin that serves them, so https://example.com/robots.txt does not control https://www.example.com/, http://example.com/, or https://example.com:8443/. Publish a separate file on every origin whose crawl policy matters.
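As a rough illustration of this per-origin scope, the sketch below derives the robots.txt location that governs a given page URL. The helper name robots_url is hypothetical; it simply keeps the scheme, host, and port and replaces everything else with /robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL for the origin that serves `page_url`."""
    parts = urlsplit(page_url)
    # netloc carries both host and port, so a non-default port is preserved.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


pages = [
    "https://example.com/products/widgets?color=blue",
    "https://www.example.com/products/widgets",
    "http://example.com/products/widgets",
    "https://example.com:8443/products/widgets",
]

for page in pages:
    print(page, "->", robots_url(page))
```

Each of the four pages resolves to a different robots.txt URL, which is why a rule published on one origin has no effect on the others.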
    $ curl -i https://www.example.com/robots.txt
    HTTP/2 200
    content-type: text/plain; charset=utf-8

    User-agent: *
    Disallow: /search/
    Disallow: /cart/
    Disallow: /checkout/

    Sitemap: https://www.example.com/sitemap.xml
A direct fetch exposes wrong filenames such as robots.txt.txt, uploads to the wrong document root, HTML error pages, and redirects that point at another host.
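The checks a direct fetch makes possible could be automated along these lines. This is a sketch under stated assumptions: check_robots_response is a hypothetical helper, it receives values already collected from a fetch (for example the status, headers, and body printed by curl -i), and the heuristics are deliberately simple.

```python
from urllib.parse import urlsplit


def check_robots_response(requested_url, final_url, status, content_type, body):
    """Return a list of human-readable problems found in a robots.txt fetch."""
    problems = []
    if status != 200:
        problems.append(f"unexpected status {status}")
    requested_host = urlsplit(requested_url).netloc.lower()
    final_host = urlsplit(final_url).netloc.lower()
    if final_host != requested_host:
        problems.append(f"redirected to another host: {final_host}")
    if not content_type.lower().startswith("text/plain"):
        problems.append(f"content type is {content_type!r}, expected text/plain")
    # An HTML error page served with status 200 is a common failure mode.
    if body.lstrip().lower().startswith(("<!doctype", "<html")):
        problems.append("body looks like an HTML page, not robots.txt rules")
    return problems


good = check_robots_response(
    "https://www.example.com/robots.txt",
    "https://www.example.com/robots.txt",
    200,
    "text/plain; charset=utf-8",
    "User-agent: *\nDisallow: /search/\n",
)
bad = check_robots_response(
    "https://www.example.com/robots.txt",
    "https://example.com/robots.txt",
    200,
    "text/html",
    "<!DOCTYPE html><html><body>Not found</body></html>",
)
print(good)
print(bad)
```

The first call should produce an empty list, while the second should flag the host change, the content type, and the HTML body.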
The robots.txt report in Google Search Console shows the fetched file, parsing issues, fetch history, and a recrawl action for urgent fixes after a broken fetch or a critical rule change.
Blocking a shared asset directory or publishing Disallow: / on the wrong origin can suppress discovery and weaken crawler rendering until the corrected file is fetched again.
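A simple pre-publish guard against an accidental site-wide block might look like the sketch below. The helper find_blanket_disallow is hypothetical and only flags lines whose value is exactly /; it does not decide whether the block is intentional for that origin, and it does not inspect asset paths.

```python
def find_blanket_disallow(robots_txt):
    """Return the 1-based line numbers of every `Disallow: /` rule."""
    hits = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "disallow" and value.strip() == "/":
            hits.append(number)
    return hits


staging_rules = "User-agent: *\nDisallow: /\n"
production_rules = "User-agent: *\nDisallow: /cart/\nDisallow: /checkout/\n"

print(find_blanket_disallow(staging_rules))
print(find_blanket_disallow(production_rules))
```

Running a check like this against the file destined for the production origin can catch a staging configuration before crawlers fetch it.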