Definition
What this term means
The total number of pages that search engine and AI crawlers will fetch from your website within a given time period. Crawl budget is determined by a combination of your site's perceived authority, server performance, URL structure, and content freshness signals. Crawlers allocate their budget based on these factors, spending more time on sites they consider valuable and efficient to crawl.
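One rough way to see crawl budget in practice is to count crawler requests in your server access logs over a period. A minimal sketch in Python, assuming a standard combined log format; the log path and the list of bot tokens are illustrative assumptions, not an exhaustive inventory:

    # Count requests per crawler in a server access log.
    # The log path and bot tokens below are illustrative assumptions.
    from collections import Counter

    BOT_TOKENS = ["Googlebot", "GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]

    counts = Counter()
    with open("/var/log/nginx/access.log") as log:
        for line in log:
            for token in BOT_TOKENS:
                if token in line:  # user-agent string appears in the log line
                    counts[token] += 1
                    break

    for token, requests in counts.most_common():
        print(f"{token}: {requests} requests")

Dividing each count by the number of days the log covers gives a rough fetches-per-day figure for each crawler.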
Why it matters
The business impact
When crawl budget is wasted on duplicate pages, redirect chains, or low-value URLs, your most important content may not be crawled frequently enough to be accurately indexed by AI systems. Optimising crawl budget by removing duplicates, fixing broken links, and prioritising key pages helps ensure that AI crawlers spend their time on the content that matters most for your visibility.
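One common tactic, sketched below, is to keep crawlers away from low-value URL patterns via robots.txt. The paths and parameters here are hypothetical examples, not a recommendation for any specific site:

    User-agent: *
    # Internal search results and endless parameter combinations rarely merit crawling
    Disallow: /search
    Disallow: /*?sessionid=
    Disallow: /*?sort=

Wildcard matching (*) is an extension honoured by Googlebot and most major crawlers, though it was not part of the original robots.txt specification.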
Used in context
How you might use this term
“A large e-commerce site discovered that 40% of its crawl budget was being consumed by faceted navigation pages with duplicate content. After implementing proper canonical tags and noindex directives on filter pages, crawl efficiency improved dramatically, with AI crawlers visiting priority product pages 3x more frequently.”
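The two fixes mentioned in that example are page-level HTML directives. A minimal sketch with hypothetical URLs; a duplicate variant would typically carry a canonical tag pointing at the preferred URL, while a genuinely low-value filter page would carry a noindex directive instead (the two are normally not combined on the same page):

    <!-- On a duplicate variant such as /shoes?colour=red, point to the preferred URL -->
    <link rel="canonical" href="https://example.com/shoes">

    <!-- On a filter page that should stay out of the index entirely -->
    <meta name="robots" content="noindex, follow">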
Related terms
Explore connected concepts
Robots.txt
A plain text file placed at the root of a website that provides instructions to web crawlers about which pages and directories they may and may not access. Robots.txt is the primary mechanism for controlling how both traditional search engine crawlers and AI-specific crawlers (like GPTBot, Google-Extended, and ClaudeBot) interact with your website content.
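As a sketch of the format, rules are grouped under User-agent lines, with Disallow and Allow directives beneath each group; the paths here are hypothetical:

    # Applies to any crawler without a more specific group
    User-agent: *
    Disallow: /admin/

    # AI crawlers can be addressed by name
    User-agent: GPTBot
    Disallow: /drafts/

    # Most crawlers also honour a pointer to your sitemap
    Sitemap: https://example.com/sitemap.xml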
Sitemap
An XML file that provides search engines and AI crawlers with a structured list of all important URLs on a website, along with metadata about each page, including when it was last modified, how frequently it changes, and its relative priority. Sitemaps serve as a roadmap that helps crawlers discover, prioritise, and efficiently index your content.
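A minimal sitemap sketch containing a single hypothetical URL and the metadata fields mentioned above:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://example.com/products/widget</loc>
        <lastmod>2024-01-15</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.8</priority>
      </url>
    </urlset>

Google has said it largely ignores changefreq and priority and leans on lastmod, so treat those two fields as hints rather than guarantees.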
AI Crawler
Automated bots operated by AI companies to discover, access, and index web content, whether for model training, real-time retrieval, or both. Major AI crawlers include GPTBot (OpenAI), Google-Extended (Google), ClaudeBot (Anthropic), PerplexityBot (Perplexity), and CCBot (Common Crawl). Each crawler serves a different purpose and can be individually controlled through robots.txt directives.
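Because each bot identifies itself with its own user-agent token, robots.txt policies can differ per crawler. A sketch of one illustrative policy, opting out of training-oriented crawlers while leaving a retrieval-oriented one unrestricted; this is an example, not a recommendation:

    # Opt out of model-training crawlers
    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    # Leave a retrieval-oriented crawler unrestricted
    User-agent: PerplexityBot
    Allow: /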