Definition
What this term means
A plain text file placed at the root of a website that gives web crawlers instructions about which pages and directories they may access. Robots.txt is the primary mechanism for controlling how both traditional search engine crawlers and AI-specific crawlers (such as GPTBot, Google-Extended, and ClaudeBot) interact with your website content. Compliance is voluntary: well-behaved crawlers honor the directives, but the file does not technically enforce access.
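A minimal robots.txt sketch showing the directive syntax; the paths here are illustrative, not a recommendation:

```text
# Rules for all crawlers that have no more specific group below
User-agent: *
Disallow: /admin/

# Rules applying only to OpenAI's GPTBot
User-agent: GPTBot
Allow: /
```

Each `User-agent` line opens a group of rules, and a crawler follows the most specific group that matches its name, ignoring the `*` group entirely if a named group applies to it.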
Why it matters
The business impact
Your robots.txt configuration directly determines whether AI systems can access and index your content. Blocking AI crawlers, either intentionally or accidentally, means your content cannot be discovered, retrieved, or cited by AI platforms. Conversely, strategically allowing access while blocking staging, duplicate, or low-value pages ensures AI systems focus on your best content.
Used in context
How you might use this term
“An audit revealed that a company's robots.txt was inadvertently blocking GPTBot and ClaudeBot due to an overly broad disallow rule. After updating the configuration to explicitly allow major AI crawlers while keeping staging environments blocked, their AI visibility began improving within weeks as content was re-indexed.”
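A configuration like the fix described above might look as follows. This is a sketch under the assumption that only GPTBot and ClaudeBot need explicit allowance and that staging lives under a /staging/ path; note that once a crawler matches a named group, it ignores the `*` group, so the staging block must be repeated in each group:

```text
# Default: block staging for all crawlers
User-agent: *
Disallow: /staging/

# Explicitly allow OpenAI's crawler, except staging
User-agent: GPTBot
Allow: /
Disallow: /staging/

# Explicitly allow Anthropic's crawler, except staging
User-agent: ClaudeBot
Allow: /
Disallow: /staging/
```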
Related terms
Explore connected concepts
Crawl Budget
The total number of pages that search engine and AI crawlers will fetch from your website within a given time period. Crawl budget is determined by a combination of your site's perceived authority, server performance, URL structure, and content freshness signals. Crawlers allocate their budget based on these factors, spending more time on sites they consider valuable and efficient to crawl.
ai.txt
An emerging web standard that allows website owners to declare their preferences for how AI systems may use their content. Similar to robots.txt but specifically designed for AI use cases, ai.txt communicates whether content may be used for AI training, summarisation, or citation, and under what conditions. The specification is still evolving, with growing adoption among publishers and AI companies.
AI Crawler
An automated bot operated by an AI company to discover, access, and index web content, whether for model training, real-time retrieval, or both. Major AI crawlers and control tokens include GPTBot (OpenAI), Google-Extended (a Google token that governs AI training use and is honored by Googlebot rather than operated as a separate crawler), ClaudeBot (Anthropic), PerplexityBot (Perplexity), and CCBot (Common Crawl). Each serves a different purpose and can be individually controlled through robots.txt directives.
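Because each crawler reads its own named group, policies can differ per bot. A sketch of a differentiated policy, assuming the user-agent tokens listed above are current (they should be verified against each vendor's documentation):

```text
# Allow OpenAI's crawler site-wide
User-agent: GPTBot
Allow: /

# Opt out of Google AI training; ordinary Googlebot search crawling is unaffected
User-agent: Google-Extended
Disallow: /

# Block Common Crawl's dataset crawler
User-agent: CCBot
Disallow: /
```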