Definition
What this term means
The process by which a trained AI model generates an output, such as a text response, recommendation, or summary, based on input it receives. Every time you ask ChatGPT a question or Perplexity runs a search, the model is performing inference: processing your input through billions of parameters to produce a relevant response.
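The loop inside an inference call can be sketched in miniature. The toy below is purely illustrative, assuming a tiny hand-made bigram table in place of the billions of learned parameters a real model would use; the score-then-sample structure is the part that mirrors actual inference.

```python
import math
import random

# Toy "inference" sketch (illustrative only): a real model runs your input
# through billions of learned parameters; here a tiny hand-made bigram table
# stands in for those parameters so the generation loop stays readable.
BIGRAMS = {
    "what": {"is": 2.0, "are": 1.0},
    "is": {"inference": 3.0, "a": 1.0},
    "inference": {"<end>": 2.0},
}

def softmax(logits):
    """Convert raw next-token scores into a probability distribution."""
    m = max(logits.values())
    exps = {tok: math.exp(s - m) for tok, s in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def generate(prompt, max_tokens=10, seed=0):
    """One inference call: repeatedly score candidate next tokens and sample."""
    rng = random.Random(seed)
    tokens = prompt.lower().split()
    for _ in range(max_tokens):
        logits = BIGRAMS.get(tokens[-1])
        if not logits:
            break
        probs = softmax(logits)
        # Sample the next token from the predicted distribution.
        next_tok = rng.choices(list(probs), weights=list(probs.values()))[0]
        if next_tok == "<end>":
            break
        tokens.append(next_tok)
    return " ".join(tokens)

print(generate("what"))
```

Each pass through the loop is one forward step: the current context is scored, the scores become probabilities, and one token is drawn, which is why the same prompt can yield different outputs run to run.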
Why it matters
The business impact
Understanding inference helps brands optimise for how AI systems actually process and respond to queries. During inference, the model draws on its training data, any retrieved context (via RAG), and the specific wording of the user's prompt. Content structured to align with common inference patterns, such as clear claims, supporting evidence, and entity-rich language, is more likely to appear in the output.
Used in context
How you might use this term
“A brand ran inference tests across five major AI models using 100 category-relevant prompts. The results revealed that their brand was consistently cited when prompts included specific technical terminology, but absent for broader queries, informing a content strategy that addressed both.”
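A test like the one described can be scripted in a few lines. In this sketch, `query_model` is a hypothetical stub standing in for a real API call to each provider; the model names, prompt, and brand are invented for illustration.

```python
from collections import defaultdict

def query_model(model: str, prompt: str) -> str:
    # Hypothetical stub: swap in a real API call (OpenAI, Anthropic, etc.).
    return "Acme Corp is a popular choice for widget analytics."

def citation_rate(models, prompts, brand):
    """Fraction of responses per model that mention the brand at all."""
    hits = defaultdict(int)
    for model in models:
        for prompt in prompts:
            answer = query_model(model, prompt)
            # Case-insensitive substring match; real audits may also check
            # for links, partial names, or competitor mentions.
            if brand.lower() in answer.lower():
                hits[model] += 1
    return {m: hits[m] / len(prompts) for m in models}

rates = citation_rate(["model-a", "model-b"],
                      ["best widget analytics tool?"],
                      "Acme Corp")
print(rates)
```

Running the same prompt set on a schedule turns this into a simple visibility benchmark, making gaps like the broad-query blind spot in the example measurable over time.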
Related terms
Explore connected concepts
LLM
A Large Language Model is a type of artificial intelligence model trained on vast datasets of text to understand, generate, and reason about human language. LLMs power AI assistants and generative search tools such as ChatGPT, Google Gemini, Claude, and Perplexity, which are rapidly becoming the primary way people discover products, services, and information online.
Prompt Engineering
The discipline of crafting effective inputs (prompts) to AI systems to elicit desired, accurate, and useful outputs. Prompt engineering involves understanding how AI models interpret instructions, what context improves response quality, and how to structure requests for optimal results. It applies to both end users writing queries and developers building AI-powered applications.
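The structuring described above can be illustrated with a simple before-and-after. The template below is a hypothetical example, not a documented best practice; the role, task, and product details are invented.

```python
# A vague prompt leaves the model to guess the task, audience, and format.
vague = "Tell me about our product."

# A structured prompt spells each of those out explicitly.
structured = (
    "You are a product marketing analyst.\n"                   # role
    "Task: summarise the product below in 3 bullet points.\n"  # explicit task
    "Audience: first-time buyers.\n"                           # context
    "Product: Acme Widget, a self-hosted analytics tool.\n"    # input data
    "Format: plain bullet points, no preamble."                # output constraints
)

print(structured)
```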
Context Window
The maximum amount of text, measured in tokens, that an AI model can process in a single interaction. The context window determines how much information the model can consider when generating a response. Modern models like GPT-4o and Claude support context windows of 128,000+ tokens, but RAG-retrieved snippets are typically much shorter, making concise content crucial for citation.
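For a quick sense of how much of a context window a piece of content consumes, a common rule of thumb for English text is roughly four characters per token. The sketch below uses that heuristic; exact counts require the model's own tokenizer (for example, tiktoken for OpenAI models).

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate via the ~4-characters-per-token rule of thumb
    for English; use the model's own tokenizer when precision matters."""
    return max(1, round(len(text) / chars_per_token))

snippet = "Inference is the process by which a trained model generates output."
print(estimate_tokens(snippet))
```

Because RAG-retrieved snippets are far shorter than the full window, an estimate like this helps check whether a key claim fits in the passage a retriever is likely to pull.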