Definition
What this term means
An emerging web convention that allows website owners to declare their preferences for how AI systems may use their content. Similar to robots.txt but designed specifically for AI use cases, ai.txt communicates whether content may be used for AI training, summarisation, or citation, and under what conditions. The specification is still evolving and is not yet a formal standard, though adoption is growing among publishers and AI companies.
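Because the specification is still evolving, there is no single canonical syntax. The robots.txt-style sketch below is purely illustrative: the directive names (Allow, Disallow, Attribution) and the use-case keywords are assumptions, not a ratified format.

```text
# Illustrative ai.txt sketch — directive names and keywords are
# hypothetical, since the specification has not been finalised.
User-agent: *
Allow: summarisation
Allow: citation
Disallow: training
Attribution: required
```

In practice, the file would be served from the site root (e.g. /ai.txt), mirroring the placement convention of robots.txt.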
Why it matters
The business impact
As AI systems increasingly scrape and use web content, ai.txt gives brands a mechanism to proactively communicate their preferences. Declaring your content as available for AI citation and summarisation (while potentially restricting training use) can encourage AI platforms to include your content in their retrieval systems. It is a forward-looking investment in controlling how your content participates in the AI ecosystem.
Used in context
How you might use this term
“A publisher implemented ai.txt to explicitly permit AI citation and summarisation of their articles while requesting attribution. They noticed an increase in AI citations with proper source links across the platforms that honour ai.txt declarations, capturing referral traffic they had previously missed.”
Related terms
Explore connected concepts
Robots.txt
A plain text file placed at the root of a website that provides instructions to web crawlers about which pages and directories they may access. Robots.txt is the primary mechanism for controlling how both traditional search engine crawlers and AI-specific crawlers (such as GPTBot, Google-Extended, and ClaudeBot) interact with your website content.
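For example, a site that wants to remain visible in traditional search but opt out of AI training crawlers might serve a robots.txt like the following. The user-agent tokens are real crawler names; the blanket paths are illustrative:

```text
# Block known AI training crawlers site-wide
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Allow all other crawlers (e.g. Googlebot, Bingbot)
User-agent: *
Allow: /
```

Note that Google-Extended controls only AI training use; blocking it does not affect normal Google Search indexing.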
AI Crawler
Automated bots operated by AI companies to discover, access, and index web content for model training, real-time retrieval, or both. Major AI crawlers include GPTBot (OpenAI), Google-Extended (Google), ClaudeBot (Anthropic), PerplexityBot (Perplexity), and CCBot (Common Crawl). Each crawler serves a different purpose and can be controlled individually through robots.txt directives.
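A quick way to check how such directives resolve for a given crawler is Python's standard-library robots.txt parser. This is a minimal sketch: the robots.txt content is invented for the example, and example.com is a placeholder domain.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks AI training crawlers
# while leaving the site open to everything else.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# GPTBot matches its own group and is denied everywhere.
print(parser.can_fetch("GPTBot", "https://example.com/article"))     # False
# Googlebot falls through to the wildcard group and is allowed.
print(parser.can_fetch("Googlebot", "https://example.com/article"))  # True
```

Because matching is per user-agent group, each crawler listed above can be granted or denied access independently without affecting the others.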
Training Data
The massive datasets of text, code, and other content used to teach AI models during their initial training phase. Training data shapes the foundational knowledge of models like GPT, Gemini, and Claude, including what they know about brands, products, and industries. Sources include web crawls (such as Common Crawl), books, academic papers, Wikipedia, and publicly available databases.