Definition
What this term means
A plain text file placed at the root of a website that tells web crawlers which pages and directories they may and may not access. Robots.txt is the primary mechanism for signaling how both traditional search engine crawlers and AI-specific user agents (like GPTBot, ClaudeBot, and the Google-Extended control token) should interact with your website content. Compliance is voluntary: the file states a request that well-behaved crawlers honor, rather than enforcing access control.
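To make the definition concrete, here is a minimal sketch in Python that uses the standard library's `urllib.robotparser` to evaluate a robots.txt file the way a compliant crawler might. The file contents, the `example.com` URLs, and the `is_allowed` helper are all hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /staging/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot"):
        for path in ("/blog/post", "/staging/draft"):
            url = "https://example.com" + path
            print(agent, path, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

In this sample, GPTBot has its own group and is blocked from the whole site, while ClaudeBot has no dedicated group and so falls back to the `*` rules, which block only `/staging/`.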
Why it matters
The business impact
Your robots.txt configuration determines whether compliant AI crawlers may fetch your content. Blocking AI crawlers, whether intentionally or accidentally, means those platforms cannot crawl your pages, so your content is far less likely to be discovered, retrieved, or cited by them. Conversely, allowing access while blocking staging, duplicate, or low-value pages steers AI systems toward your best content.
Used in context
How you might use this term
“An audit revealed that a company's robots.txt was inadvertently blocking GPTBot and ClaudeBot due to an overly broad disallow rule. After updating the configuration to explicitly allow major AI crawlers while keeping staging environments blocked, their AI visibility began improving within weeks as content was re-indexed.”
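An audit like the one described could be sketched roughly as follows, again with Python's `urllib.robotparser`. The `BEFORE` and `AFTER` configurations, the agent list, the URLs, and the `blocked_agents` helper are assumptions made up for this example, not the actual files from any real audit. Note that Python's parser applies the first matching rule in a group, so the narrower `Disallow: /staging/` line is placed before `Allow: /`; crawlers that use longest-match precedence would reach the same result here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical list of AI crawler user agents to audit.
AI_AGENTS = ["GPTBot", "ClaudeBot"]

# Hypothetical "before" file: one overly broad rule blocks every crawler.
BEFORE = """
User-agent: *
Disallow: /
"""

# Hypothetical "after" file: AI crawlers are allowed explicitly,
# while the staging area stays blocked for them.
AFTER = """
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /staging/
Allow: /

User-agent: *
Disallow: /
"""


def blocked_agents(robots_txt: str, agents: list[str], url: str) -> list[str]:
    """Return the agents that the given rules forbid from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    page = "https://example.com/blog/post"
    staging = "https://example.com/staging/index.html"
    print("before, public page:", blocked_agents(BEFORE, AI_AGENTS, page))
    print("after, public page:", blocked_agents(AFTER, AI_AGENTS, page))
    print("after, staging page:", blocked_agents(AFTER, AI_AGENTS, staging))
```

A check of this kind only tells you what the rules say. Whether and when a given crawler revisits the site afterward is up to that crawler.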