Agent Builder is now generally available (GA). Get started with an Elastic Cloud trial, and check out the Agent Builder documentation here.
With the recent release of models like Claude Sonnet 4.5, Gemini 3 series and GPT 5 featuring million-token context windows, there’s a growing misconception that context management is becoming simpler. If a large language model (LLM) can process millions of tokens at once, does it really matter what information we provide?
The reality is precisely the opposite. Context engineering, the practice of managing what information reaches your LLM, is more critical than ever. Large context windows don’t eliminate the need for precision; they amplify it. With more context comes exponentially more opportunities for errors, hallucinations, and irrelevant information to contaminate your LLM’s reasoning process.
Whether you’re using retrieval-augmented generation (RAG), tool outputs, or memory systems, effective context engineering isn’t about providing more information but about providing the right information. That’s where Elasticsearch comes in, serving as your context engineering platform.
In this article, we’ll explore what context poisoning is, how it manifests across different types of memory, and how Elasticsearch’s RAG capabilities provide defense at every stage of the retrieval pipeline, from ingestion to composition, ensuring your LLM receives clean, relevant, and reliable context.

What is context poisoning?
Context poisoning occurs when compromised, outdated, or irrelevant information enters an LLM’s context window, leading to degraded responses, hallucinations, or perpetuated errors. Once corrupted or incorrect information enters the context window, it propagates into answers. The LLM references it as truth, creating cascading errors across the conversation.
This poisoning can happen at multiple stages of the LLM lifecycle (like in training), but our focus is on the retrieval and composition stages. Although adversarial attacks, like prompt injection, also pose risks, this article focuses on the operational patterns that teams encounter most frequently in production environments.
Operational understanding
Context poisoning often happens for reasons like:
- Context rot: Information becomes outdated but remains in your knowledge base without being updated or deleted.
- Context overflow: Too much information overwhelms the LLM's attention, diluting the truly important and relevant context and causing answers to miss key information.
- Conflicting information: Multiple sources provide contradictory data, confusing the model.
- Semantic noise: Vector-similar but contextually irrelevant content dilutes relevance.
- Malicious injection: Content deliberately inserted by attackers into knowledge bases, including prompt injections or manipulated data.
Understanding these patterns is the first step toward building robust defenses. Let’s examine each pattern and how Elasticsearch helps you address them. You can follow along with the supporting notebook.
Types of context poisoning
Temporal degradation
Over time, information in your knowledge base becomes outdated, and without proper management, stale content continues to be retrieved and presented to your LLM as current truth. This is especially problematic in industries where information changes frequently, like product documentation, pricing, regulations, or news.
Impact
Your LLM provides outdated advice, references deprecated features, or contradicts current reality, eroding user trust.
Solutions: Temporal filtering in hybrid search
Elasticsearch’s date-based query capabilities ensure your RAG system prioritizes recent and relevant information through explicit temporal filters.
Example: Product documentation search with time filtering
A user asks your chatbot about authentication setup. Six months ago, the authentication had a significant change, so it’s important to return only documents updated within the last six months.
Without temporal filtering
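To make this concrete, here is a minimal sketch of what the unfiltered request body might look like, written as a Python dict. The query text is illustrative, and the `content_semantic` and boosted `title` fields follow the fields discussed in this example; the rest of the index layout is an assumption.

```python
# Illustrative user question; not the article's original query text.
query = "How do I configure OAuth authentication?"

# Hybrid query with NO date restriction: every version of the OAuth docs
# (6.x shield plugin, 7.x realm settings, 9.x security API) can be retrieved.
unfiltered_search = {
    "retriever": {
        "rrf": {
            "retrievers": [
                {   # Lexical branch: exact keyword matching, title boosted
                    "standard": {
                        "query": {
                            "multi_match": {
                                "query": query,
                                "fields": ["title^2", "content"],
                            }
                        }
                    }
                },
                {   # Semantic branch: conceptual matching on the
                    # semantic_text field
                    "standard": {
                        "query": {
                            "semantic": {
                                "field": "content_semantic",
                                "query": query,
                            }
                        }
                    }
                },
            ]
        }
    }
}
```

Because nothing constrains recency, all three generations of documentation compete on relevance alone, and the oldest guidance can outrank the current one.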
Response without filtering: Contradictory results
The LLM receives three different methods for OAuth configuration: the current security API (9.x), legacy realm settings (7.x), and the deprecated shield plugin (6.x). This contradictory context leads to confused or misleading responses:
With temporal filtering
Add a filter to restrict results to documents updated within the last six months:
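A sketch of the same hybrid query with the temporal and status filters applied to both retrieval branches. The `last_updated` date field name is an assumption; the `status: "published"` filter mirrors the status restriction this example describes.

```python
query = "How do I configure OAuth authentication?"

# Filters shared by both branches: recent and published documents only.
freshness_filters = [
    {"range": {"last_updated": {"gte": "now-6M"}}},  # last six months
    {"term": {"status": "published"}},               # no drafts or deprecated docs
]

filtered_search = {
    "retriever": {
        "rrf": {
            "retrievers": [
                {   # Lexical branch with the freshness filters applied
                    "standard": {
                        "query": {
                            "bool": {
                                "must": [{
                                    "multi_match": {
                                        "query": query,
                                        "fields": ["title^2", "content"],
                                    }
                                }],
                                "filter": freshness_filters,
                            }
                        }
                    }
                },
                {   # Semantic branch with the same filters
                    "standard": {
                        "query": {
                            "bool": {
                                "must": [{
                                    "semantic": {
                                        "field": "content_semantic",
                                        "query": query,
                                    }
                                }],
                                "filter": freshness_filters,
                            }
                        }
                    }
                },
            ]
        }
    }
}
```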
This hybrid search query:
- Semantic search (`semantic`) captures related concepts and context using the `content_semantic` field.
- Lexical search (`multi_match`) matches exact keywords like “OAuth” with field boosting (`title^2`).
- Reciprocal rank fusion (RRF) combines both result sets with balanced reranking, retrieving the most relevant results.
- Temporal filter ensures only documents updated within the last six months are retrieved.
- Status filter restricts results to published documents, excluding drafts or deprecated content.
Response with temporal filtering: Consistent results
The temporal filtering eliminated outdated documents, leaving only current documentation for version 9.x. The LLM now receives consistent context and generates confident, accurate responses:
Relative versus absolute time filters
Relative filtering (recommended for most use cases):
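For example, assuming a `last_updated` date field:

```python
# Relative date filter using Elasticsearch date math: `now-6M` is
# recomputed at query time, so the six-month window slides automatically
# as time passes.
relative_filter = {"range": {"last_updated": {"gte": "now-6M"}}}
```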
Absolute filtering (for specific time ranges):
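A sketch of the absolute variant; the dates are placeholder values, not taken from this example:

```python
# Absolute date filter pinned to a fixed window (placeholder dates).
absolute_filter = {
    "range": {
        "last_updated": {
            "gte": "2025-01-01",
            "lte": "2025-06-30",
            "format": "yyyy-MM-dd",  # explicit date format for the bounds
        }
    }
}
```

Absolute ranges are useful when answers must reflect a specific release cycle or audit period; unlike `now-6M`, they never drift as time passes.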
Impact on LLM response quality

- Without filtering: LLM receives contradictory guidance from 2023–2025, producing uncertain responses mixing deprecated and current methods.
- With temporal filtering: LLM receives only recent documentation, generating confident responses based on current best practices.
Information conflicts
When your RAG system retrieves documentation for features that behave differently across deployment types, versions, or configurations, conflicting information can confuse the LLM about which guidance applies to the user’s specific context.
Impact
The LLM spends extra tokens and reasoning effort determining which information is correct, making it more prone to errors and hallucinations.
Solutions: Hybrid search with metadata boosting
Elasticsearch’s bool query with a should clause allows you to apply boosts that prioritize documents matching specific metadata, ensuring deployment-specific or version-specific documentation appears first in the context window. For query syntax details, refer to the Bool query reference.
Example: Deployment-specific feature documentation
A user asks, “How do I configure custom users in serverless?” Your knowledge base contains information about serverless, cloud, self-hosted, and managed deployments. With proper metadata prioritization, the LLM receives clear signals about feature availability and provides correct guidance:
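A sketch of such a request body in Python. The `deployment_type` and `doc_status` fields and their 3x/2x boosts match the metadata described in this example; the rest of the index layout is assumed.

```python
query = "How do I configure custom users in serverless?"

boosted_search = {
    "retriever": {
        "rrf": {
            "retrievers": [
                {   # Lexical branch with metadata boosting
                    "standard": {
                        "query": {
                            "bool": {
                                "must": [{
                                    "multi_match": {
                                        "query": query,
                                        "fields": ["title^2", "content"],
                                    }
                                }],
                                # `should` boosts rank matching docs higher
                                # without excluding other deployment types.
                                "should": [
                                    {"term": {"deployment_type": {
                                        "value": "serverless", "boost": 3}}},
                                    {"term": {"doc_status": {
                                        "value": "current", "boost": 2}}},
                                ],
                            }
                        }
                    }
                },
                {   # Semantic branch for conceptual matches
                    "standard": {
                        "query": {
                            "semantic": {
                                "field": "content_semantic",
                                "query": query,
                            }
                        }
                    }
                },
            ]
        }
    }
}
```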
What this query does
- `must` clause: All documents must match “How do I configure custom users in serverless?”
- `should` clauses with explicit boosting:
  - Documents with `deployment_type: "serverless"` receive a 3x boost.
  - Documents with `doc_status: "current"` receive a 2x boost.
- Semantic search runs in parallel to capture conceptual matches.
- RRF combines lexical (with metadata boosting) and semantic results to get the best of both approaches.
Expected response:
How metadata boosting resolves conflicts

Impact on LLM response quality
- Without metadata boosting: The context window receives equal-weight documents from all deployment types. The LLM produces vague responses that hedge between possibilities, failing to clearly state deployment-specific limitations.
- With metadata boosting (3x): Serverless-specific documentation dominates the top results. The LLM generates direct answers about feature unavailability and provides actionable alternatives while maintaining the cross-deployment context for follow-up questions.
Semantic noise
Vector similarity search can retrieve documents that are semantically related but contextually irrelevant to the user’s need. This “semantic drift” occurs when embeddings capture surface similarity without understanding the query intent. When your context window fills with irrelevant information, the LLM's ability to generate precise answers declines.
Impact
The LLM receives correct information that doesn’t answer the question, wasting context window space and lowering answer quality.
Solution: Hybrid search
Elasticsearch hybrid search combines lexical precision with semantic understanding, using explicit product filters to eliminate cross-product drift while maintaining conceptual recall.
Example: Technical documentation search
A developer searches for “Elastic Agent configuration,” and your knowledge base contains both the Elastic Agent (Elastic Observability) and the Elastic Agent Builder documentation. Both use the word "agent" prominently, making them semantically similar.
Let’s search for agent configuration documentation:
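One way to express this search as a Python request body (a sketch: the `product`, `category`, `tags`, and `content_semantic` field names follow this example's description, while the rest is assumed):

```python
query = "Elastic Agent configuration"

# Explicit domain filters eliminate Agent Builder docs before ranking.
domain_filters = [
    {"terms": {"product": ["Elastic Observability", "Elastic Agent"]}},
    {"terms": {"category": ["observability", "elastic-agent"]}},
]

agent_search = {
    "retriever": {
        "rrf": {
            "rank_constant": 20,  # balanced fusion of both branches
            "retrievers": [
                {   # Lexical branch with field boosting and filters
                    "standard": {
                        "query": {
                            "bool": {
                                "must": [{
                                    "multi_match": {
                                        "query": query,
                                        "fields": ["title^3", "tags^2", "content"],
                                    }
                                }],
                                "filter": domain_filters,
                            }
                        }
                    }
                },
                {   # Semantic branch, constrained by the same filters
                    "standard": {
                        "query": {
                            "bool": {
                                "must": [{
                                    "semantic": {
                                        "field": "content_semantic",
                                        "query": query,
                                    }
                                }],
                                "filter": domain_filters,
                            }
                        }
                    }
                },
            ],
        }
    }
}
```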
This hybrid query:
- Lexical component (`multi_match`) ensures exact keyword matches for "agent", "configuration", "logs", "metrics", and “collection”.
- Field boosting (`title^3`, `tags^2`) prioritizes documents where terms appear in important fields.
- Semantic component captures conceptual relationships and the intent behind “configuring data collection agents”.
- RRF merges both result sets with balanced ranking using `rank_constant: 20`.
- Product filter restricts results to Elastic Observability and Elastic Agent documentation, eliminating Agent Builder docs entirely.
- Category filter restricts results to the "observability" and "elastic-agent" categories, eliminating semantic drift to other domains.
Expected response:
Why hybrid search works
| Search type | Strengths |
|---|---|
| Lexical only | Precise keyword matching |
| Semantic only | Captures semantic meaning |
| Hybrid search | Precision and recall, intent understanding |
Before and after: LLM response comparison

Elasticsearch RAG best practices
Following these best practices optimizes your context engineering and significantly reduces the risk of context poisoning in your RAG systems. By implementing the following strategies, you ensure that every token in your context window contributes to relevant, accurate, and trustworthy LLM responses.
- Choose the right search strategy for your data: Select your search approach based on your data characteristics and query patterns, choosing between lexical, semantic, or hybrid search. For more details, refer to Search approaches | Elastic Docs.
- Implement temporal awareness: Time-sensitive information requires active management to prevent outdated content from contaminating your context window. Use range queries with relative time filters (like `now-6M` or `now-1y`) for content that changes frequently, ensuring your RAG system prioritizes recent content. For more details, refer to Range query | Reference.
- Use metadata boosting: When your knowledge base contains similar content across different contexts, such as multiple product versions, deployment types, or user roles, metadata boosting helps prioritize contextually relevant results. For more details, refer to Boolean query | Reference.
- Apply reranking when needed: For complex or high-priority queries where precision is critical, consider a reranking model, which can significantly improve search result quality by reordering results based on semantic understanding of queries and documents. For more details, refer to Ranking and reranking | Elastic Docs.
- Optimize chunking strategies: Chunking is the process of breaking large text into smaller “chunks.” Your chunking strategy affects both semantic representation and retrieval precision: smaller chunks provide more granularity but may lose context, while larger chunks preserve more context but reduce retrieval precision. For more details, refer to Understanding chunking strategies in Elasticsearch.
- Filter the data before it reaches the LLM: Vector similarity search can retrieve semantically related but contextually irrelevant documents. Apply explicit filters on product, category, or domain fields to constrain results to the appropriate context before delivering them to the LLM. For more details, refer to RAG pipelines in production: Operationalize your GenAI project - Elasticsearch Labs.
- Calibrate your retrieval volume (k): Finding the "Goldilocks zone" for the number of documents retrieved is essential. Too few results lead to incomplete answers, while too many can cause the LLM to miss key facts. Balance retrieval depth against your token budget. For more details, refer to kNN search in Elasticsearch | Elastic Docs.
- Consider summarization for large documents: When retrieved content exceeds your context budget, summarization techniques help retain essential information while reducing token count. For more details, refer to Adding AI summaries to your site with Elastic - Elasticsearch Labs.
- Monitor and iterate: As knowledge bases grow and content evolves, implement monitoring to track relevance score distributions, temporal patterns in retrieved results, and user feedback signals. Watch for signs like outdated documents, declining user satisfaction scores, or a growing number of “no relevant results” queries. For more details, refer to Elastic Observability: Streams Data Quality and Failure Store Insights.
Conclusion
The new era of million-token context windows has not made context management obsolete; it has made context engineering more critical than ever. As context windows grow, so does the potential for poisoning from any source: retrieval, tools, or memory.
The patterns shown in this article apply beyond just RAG. Temporal filtering, metadata boosting, and hybrid search are foundational techniques that improve context quality, regardless of source.
By implementing these strategies, you maintain control over what information reaches your LLM, ensuring relevance, accuracy, and trust at scale.




