A file that tells search engine crawlers which pages or sections of a site they should not crawl.
Robots.txt is a plain text file hosted at the root of a website (e.g., example.com/robots.txt) that uses the Robots Exclusion Protocol to tell search engine crawlers which parts of the site they may and may not crawl.
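For illustration, a minimal robots.txt might look like the sketch below; the /private/ path and the sitemap URL are placeholders, not recommendations.

    # Applies to all crawlers
    User-agent: *
    # Don't crawl anything under /private/ (placeholder path)
    Disallow: /private/

    # Optional: point crawlers at the XML sitemap
    Sitemap: https://example.com/sitemap.xml

Each record starts with one or more User-agent lines naming the crawlers it applies to (* matches all), followed by Disallow and Allow rules matched against URL paths.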
Common legitimate uses include: blocking crawlers from admin pages, staging environments, internal search result pages, and URL parameters that produce duplicate content. Blocking these saves crawl budget and reduces index bloat.
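As a sketch, rules covering those cases might look like the following; every path and parameter name here is hypothetical and depends entirely on the site's URL structure.

    User-agent: *
    # Admin pages
    Disallow: /admin/
    # Internal search result pages
    Disallow: /search
    # Parameterized URLs that produce duplicate content
    # (Google supports the * wildcard in paths)
    Disallow: /*?sort=

Staging environments are usually handled on the staging host itself, whose own robots.txt can simply contain "Disallow: /" under "User-agent: *" to block everything.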
A critical point: robots.txt blocks crawling, not indexing. If external sites link to a blocked URL, Google may still index it, typically showing the bare URL without a snippet, because it cannot read the page's content. To keep a page out of the index entirely, use a noindex meta tag instead (the page must remain crawlable for the tag to be seen).
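For reference, the noindex directive is a meta tag in the page's head; for non-HTML resources, the X-Robots-Tag HTTP response header carries the same directive.

    <head>
      <!-- Tells compliant crawlers not to index this page -->
      <meta name="robots" content="noindex">
    </head>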
Errors in robots.txt can have severe consequences. Accidentally blocking Googlebot from a site's CSS and JavaScript stops Google from rendering pages properly, and blocking key sections removes them from the crawl entirely; either can cause catastrophic ranking drops. Always test robots.txt changes carefully before deployment.
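One lightweight way to test a rule set before deploying it is Python's standard-library robotparser. This is a minimal sketch with placeholder rules and URLs; note that the standard-library parser implements the original exclusion protocol, not Google's wildcard extensions.

    from urllib import robotparser

    # Parse the proposed rules directly; no network fetch needed
    rp = robotparser.RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /admin/",  # placeholder path
    ])

    # Spot-check URLs that must stay crawlable, and ones that must not
    print(rp.can_fetch("Googlebot", "https://example.com/"))        # expect: True
    print(rp.can_fetch("Googlebot", "https://example.com/admin/"))  # expect: False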
Want expert help applying this in B2B?
Indexed works with B2B companies on SEO strategy, content, and link building — built around how B2B buyers actually search today.