Retrieval-Augmented Generation (RAG) Prompting
LLMs have a fundamental limitation: they work from training data with a knowledge cutoff, so they can't answer questions about private documents, recent events, or specialized domains without hallucinating. Retrieval-Augmented Generation (RAG) solves this by combining document retrieval with prompt engineering.
This lesson teaches you to build RAG systems that let LLMs answer questions based on retrieved context while maintaining accuracy and attribution.
RAG Architecture Overview
RAG follows a simple pattern:
User Query → Retrieve (find relevant documents)
           → Augment (build prompt with context)
           → Generate (answer with sources)
           → Response
A practical example:
User: "What are the terms of service changes in Q4?"
1. Retrieve: Search documents, find latest TOS
2. Augment: Build prompt: "Based on this TOS document: [content], answer: What are Q4 changes?"
3. Generate: LLM answers using the retrieved document
4. Response: "Based on section 3.2, the main changes are..."
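The four stages can be wired together in a few lines. In this sketch, retrieve() is a toy keyword-overlap scorer and generate() is a placeholder standing in for an actual LLM call; the document contents are made up for illustration.

```python
def retrieve(query: str, documents: list, top_k: int = 2) -> list:
    """Retrieve: score documents by keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d["content"].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def augment(question: str, docs: list) -> str:
    """Augment: build a prompt that embeds the retrieved context."""
    context = "\n".join(f"[{d['source']}] {d['content']}" for d in docs)
    return f"Based on these documents:\n{context}\n\nAnswer: {question}"

def generate(prompt: str) -> str:
    """Generate: placeholder for a real LLM call."""
    return f"(LLM response to a {len(prompt)}-character prompt)"

docs = [
    {"source": "TOS v3", "content": "Q4 changes: section 3.2 updates billing terms."},
    {"source": "FAQ", "content": "Billing questions go to support."},
]
question = "What are the Q4 terms of service changes?"
print(generate(augment(question, retrieve(question, docs))))
```

In a real system the retriever would use embedding similarity and generate() would call an LLM, but the data flow between the stages stays the same.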
Crafting Prompts for RAG Effectiveness
The way you format retrieved context in prompts dramatically affects quality:
# Bad RAG prompt: Too much context, unclear expectations
POOR_RAG_PROMPT = """You are helpful. Here is information:
{context}
Now answer this: {question}"""
# Good RAG prompt: Clear roles, structure, explicit instructions
GOOD_RAG_PROMPT = """You are a helpful assistant answering questions about company policies.
You have access to the following reference documents:
<documents>
{context}
</documents>
Instructions:
1. Answer the question based ONLY on the provided documents
2. If the answer is not in the documents, say "I cannot find this information in the provided documents"
3. Always cite which document section supports your answer
4. Be concise but complete
Question: {question}
Answer:"""
# Even better: Different prompts for different scenarios
EXTRACT_ANSWER_PROMPT = """Based on these documents:
<documents>
{context}
</documents>
Extract a direct answer to: {question}
Format your response as:
ANSWER: [direct answer]
SOURCE: [document section that supports this]
CONFIDENCE: [low/medium/high]"""
SYNTHESIS_PROMPT = """Synthesize information from these sources to answer: {question}
<documents>
{context}
</documents>
Requirements:
- Combine information across multiple sources if relevant
- Clearly distinguish between directly stated information and logical inferences
- Flag any contradictions between sources
- Provide supporting citations"""
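These templates are plain Python format strings, so assembling the final prompt is a single str.format call. A minimal sketch; the template below is a shortened stand-in for GOOD_RAG_PROMPT, and the context string is invented for illustration:

```python
# Shortened stand-in for GOOD_RAG_PROMPT; a real template would carry
# the full instruction list shown above.
TEMPLATE = """You are a helpful assistant answering questions about company policies.

<documents>
{context}
</documents>

Answer based ONLY on the documents above. Question: {question}"""

context = "[Employee Handbook] Remote work is allowed 2 days per week."
prompt = TEMPLATE.format(
    context=context,
    question="How many remote days are allowed?"
)
print(prompt)
```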
Handling Irrelevant, Contradictory, or Missing Context
Real RAG systems face messy situations:
class RobustRAGPromptBuilder:
    """Build RAG prompts that handle edge cases."""

    @staticmethod
    def build_prompt(question: str,
                     retrieved_docs: list,
                     system_context: str = "") -> tuple:
        """
        Build a RAG prompt and note its quality.

        Returns:
            (prompt, quality_assessment)
        """
        quality = {
            "has_context": len(retrieved_docs) > 0,
            "context_relevance": "unknown",
            "context_conflicts": False,
            "warnings": []
        }

        # Case 1: No documents retrieved
        if not retrieved_docs:
            quality["warnings"].append("No context retrieved")
            return (
                f"The knowledge base has no information about: {question}\n\n"
                f"Based on your general knowledge, how would you answer this?\n"
                f"State any assumptions you're making.",
                quality
            )

        # Case 2: Single document
        if len(retrieved_docs) == 1:
            quality["context_relevance"] = "single_source"
            context = retrieved_docs[0]["content"]
            prompt = f"""Answer this question using ONLY the provided document.

Document: {retrieved_docs[0]['source']}
{context}

Question: {question}

If the document doesn't contain the answer, say so explicitly."""

        # Case 3: Multiple documents - check for conflicts
        else:
            context_snippets = []
            sources_list = []
            for doc in retrieved_docs:
                context_snippets.append(doc["content"])
                sources_list.append(doc["source"])
                # Crude conflict detection: look for contradictory keywords
                # (a real system would use semantic comparison)
                if "not" in doc["content"].lower() and \
                   any("yes" in d.get("content", "").lower()
                       for d in retrieved_docs if d != doc):
                    quality["context_conflicts"] = True
                    quality["warnings"].append("Potential conflicts in sources")

            context = "\n\n---\n\n".join(context_snippets)
            sources = ", ".join(sources_list)
            prompt = f"""Answer the question below using the provided sources.

Sources: {sources}
{context}

Question: {question}

Instructions:
1. Use information from the sources above
2. If sources conflict, note the difference and explain
3. Cite the source for each claim
4. If information is not in the sources, clearly state this

Answer:"""
            quality["context_relevance"] = f"multiple_sources_({len(retrieved_docs)})"

        return prompt, quality
# Usage
docs = [
{
"source": "Company Policy v2.1",
"content": "Remote work is allowed 2 days per week"
},
{
"source": "Recent FAQ Update",
"content": "Due to Q4 initiatives, all staff must be in-office Mon-Wed"
}
]
prompt, quality = RobustRAGPromptBuilder.build_prompt(
"How many days per week can I work remotely?",
docs
)
print(prompt)
print("Quality assessment:", quality)
# Note: the naive keyword heuristic may not flag this conflict; the
# prompt's instructions still direct the model to note conflicting sources
Citation and Source Attribution
Users trust answers more when sources are cited. Enforce citations in prompts:
import re

class CitationEnforcer:
    """
    Ensure responses include proper citations.
    """

    @staticmethod
    def build_citation_prompt(question: str, documents: list) -> str:
        """
        Build prompt that enforces citations in response.
        """
        doc_list = ""
        for i, doc in enumerate(documents, 1):
            doc_list += f"\n[DOC{i}] {doc['source']}: {doc['content'][:300]}..."

        return f"""Answer the following question based on the provided documents.

Documents:
{doc_list}

Question: {question}

CRITICAL INSTRUCTIONS:
1. Every factual claim must be supported by a citation
2. Use format: "[DOC1: Quote that supports this claim]"
3. If information is not in documents, write "[NOT IN DOCUMENTS]"
4. Never make up sources

Example format:
"The policy was updated in Q4 [DOC2: 'Remote work policy updated October 2024']"

Now answer:"""

    @staticmethod
    def verify_citations(response: str, documents: list) -> dict:
        """
        Check if citations in response actually exist in documents.
        """
        # Match both bare "[DOC1]" and quoted "[DOC1: ...]" citations
        citation_pattern = r'\[DOC(\d+)[:\]]'
        found_citations = re.findall(citation_pattern, response)

        valid_citations = []
        invalid_citations = []
        for doc_num_str in found_citations:
            doc_num = int(doc_num_str)
            if 1 <= doc_num <= len(documents):
                valid_citations.append(doc_num)
            else:
                invalid_citations.append(doc_num)

        return {
            "total_citations": len(found_citations),
            "valid_citations": len(valid_citations),
            "invalid_citations": invalid_citations,
            "has_unsupported_claims": len(invalid_citations) > 0,
            "citation_coverage": len(valid_citations) / max(len(found_citations), 1)
        }
# Usage
docs = [
{"source": "Policy A", "content": "Remote work allowed 2 days/week"},
{"source": "Policy B", "content": "On-site required Mon-Wed for Q4"}
]
prompt = CitationEnforcer.build_citation_prompt(
"What's the remote work policy?",
docs
)
# Simulate LLM response
response = """The remote work policy has two versions:
1. Standard policy allows 2 days per week remote [DOC1]
2. Q4 requires on-site Mon-Wed [DOC2]
This creates a temporary change [DOC3]""" # DOC3 doesn't exist
citation_check = CitationEnforcer.verify_citations(response, docs)
print("Citation verification:", citation_check)
# Output: has_unsupported_claims = True (DOC3 invalid)
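verify_citations only checks that cited document numbers exist. When the prompt asks for quoted snippets (format 2 above), a stricter check can confirm that each quoted string actually appears in the cited document. A sketch of that idea; verify_quotes is a new helper, not part of the class above, and the sample response is invented:

```python
import re

def verify_quotes(response: str, documents: list) -> dict:
    """Check that quoted text in [DOCn: '...'] citations appears verbatim
    (case-insensitively) in the cited document's content."""
    results = []
    for num_str, quote in re.findall(r"\[DOC(\d+): '([^']+)'\]", response):
        idx = int(num_str) - 1  # DOC numbers are 1-based
        supported = (
            0 <= idx < len(documents)
            and quote.lower() in documents[idx]["content"].lower()
        )
        results.append({"doc": int(num_str), "quote": quote, "supported": supported})
    return {
        "quotes": results,
        "all_supported": all(r["supported"] for r in results),
    }

docs = [{"source": "Policy A", "content": "Remote work allowed 2 days/week"}]
resp = ("Remote work is limited [DOC1: 'allowed 2 days/week'] "
        "but not unlimited [DOC1: 'five days']")  # second quote is fabricated
print(verify_quotes(resp, docs))
```

This substring check is deliberately simple; production systems often use fuzzy matching to tolerate minor paraphrasing in quotes.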
Advanced RAG Patterns
Query Rewriting
Sometimes user queries aren’t optimal for retrieval:
class QueryRewriter:
    """Rewrite queries to improve retrieval."""

    @staticmethod
    def build_rewrite_prompt(original_query: str) -> str:
        """Generate prompt to rewrite query."""
        return f"""You are a query optimization expert. Rewrite this question to be clearer for a search engine while preserving the intent.

Original: {original_query}

Rewritten versions (3-5 alternatives):
1. [Alternative 1]
2. [Alternative 2]
...

Format each as a complete question."""

    @staticmethod
    def build_expansion_prompt(original_query: str) -> str:
        """Generate prompt to expand query with related terms."""
        return f"""Expand this query with related terms that might retrieve better documents.

Original: {original_query}

Related queries:
1. [Expansion 1]
2. [Expansion 2]
3. [Expansion 3]

Format each as a complete question."""
Hypothetical Document Embeddings (HyDE)
Generate hypothetical documents to improve retrieval:
import re
import numpy as np

class HyDERetriever:
    """
    Use LLM to generate hypothetical documents
    matching the query, then use embeddings to find similar real docs.
    """

    def __init__(self, llm_client, embedding_client, document_store):
        self.llm = llm_client
        self.embeddings = embedding_client
        self.documents = document_store

    def generate_hypothetical_docs(self, query: str, num_docs: int = 3) -> list:
        """Have LLM write hypothetical documents answering the query."""
        # Note the doubled braces: {{n}} keeps the literal "{n}" out of
        # the f-string substitution
        prompt = f"""Write {num_docs} realistic documents that would answer this question:

{query}

Each document should be a paragraph (2-3 sentences) that directly answers the question.
Format each as: "DOCUMENT {{n}}: [content]"
Don't reference being hypothetical."""

        response = self.llm.complete(prompt)

        # Parse response to extract generated documents
        docs = re.findall(r'DOCUMENT \d+: (.+?)(?=DOCUMENT|\Z)', response, re.DOTALL)
        return docs

    def retrieve_with_hyde(self, query: str) -> list:
        """
        Retrieve by:
        1. Generate hypothetical documents
        2. Embed them
        3. Find real documents similar to hypothetical ones
        """
        hypothetical = self.generate_hypothetical_docs(query)

        # Embed hypothetical documents
        hyp_embeddings = [
            self.embeddings.embed(doc) for doc in hypothetical
        ]

        # Average the embeddings
        query_embedding = np.mean(hyp_embeddings, axis=0)

        # Find most similar real documents
        similar_docs = self.documents.find_similar(
            query_embedding,
            top_k=5
        )
        return similar_docs
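HyDERetriever assumes a document_store object exposing find_similar(query_embedding, top_k). A minimal in-memory version, sketched under the assumption that embeddings are plain vectors and similarity is cosine:

```python
import numpy as np

class InMemoryDocumentStore:
    """Minimal stand-in for the document_store that HyDERetriever expects:
    stores (document, embedding) pairs and ranks them by cosine similarity."""

    def __init__(self):
        self._docs = []     # document dicts
        self._vectors = []  # parallel embedding vectors

    def add(self, doc: dict, embedding) -> None:
        self._docs.append(doc)
        self._vectors.append(np.asarray(embedding, dtype=float))

    def find_similar(self, query_embedding, top_k: int = 5) -> list:
        q = np.asarray(query_embedding, dtype=float)
        q = q / (np.linalg.norm(q) or 1.0)  # guard against zero vectors
        scores = [float(v @ q / (np.linalg.norm(v) or 1.0)) for v in self._vectors]
        ranked = sorted(zip(self._docs, scores), key=lambda p: p[1], reverse=True)
        return [doc for doc, _ in ranked[:top_k]]

# Usage with toy 2-d embeddings
store = InMemoryDocumentStore()
store.add({"source": "Remote work policy"}, [1.0, 0.0])
store.add({"source": "Expense policy"}, [0.0, 1.0])
print(store.find_similar([0.9, 0.1], top_k=1))
```

A real deployment would swap this for a vector database, but the interface HyDE relies on is just this one ranked-similarity lookup.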
Building a Complete RAG System
Assemble all pieces into a production RAG system:
from typing import List, Dict
from openai import OpenAI

class RAGSystem:
    """
    Complete RAG system for question answering
    over a document collection.
    """

    def __init__(self, document_store, embedding_model: str = "text-embedding-3-small"):
        self.documents = document_store
        self.embedding_model = embedding_model
        self.citation_enforcer = CitationEnforcer()
        self.client = OpenAI()

    def retrieve_documents(self, query: str, top_k: int = 3) -> List[Dict]:
        """
        Retrieve relevant documents for a query.
        """
        # In a real implementation, use embedding similarity;
        # for now, score documents by keyword overlap
        query_words = set(query.lower().split())
        scored_docs = []
        for doc in self.documents:
            doc_words = set(doc["content"].lower().split())
            overlap = len(query_words & doc_words)
            if overlap > 0:
                scored_docs.append((doc, overlap))

        scored_docs.sort(key=lambda x: x[1], reverse=True)
        return [doc for doc, score in scored_docs[:top_k]]

    def answer_question(self, question: str) -> Dict:
        """
        Answer a question using the RAG approach.
        """
        # Retrieve documents
        docs = self.retrieve_documents(question)

        # Build prompt with citations
        prompt = self.citation_enforcer.build_citation_prompt(question, docs)

        # Call LLM (openai>=1.0 client API)
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3  # Lower temperature for factuality
        )
        answer = response.choices[0].message.content

        # Verify citations
        citation_check = self.citation_enforcer.verify_citations(answer, docs)

        # Assess response quality
        quality_score = "needs_review" if citation_check["has_unsupported_claims"] else "good"

        return {
            "question": question,
            "answer": answer,
            "sources": [doc["source"] for doc in docs],
            "citation_analysis": citation_check,
            "quality": quality_score,
            "retrieved_docs": len(docs)
        }
# Usage
sample_docs = [
{
"source": "Employee Handbook 2024",
"content": "Remote work is allowed 2 days per week under the standard policy."
},
{
"source": "Q4 Special Notice",
"content": "Due to project requirements, all staff must be in-office Mon-Wed through December."
},
{
"source": "Compensation Policy",
"content": "Remote work allowance is $50/month for approved remote days."
}
]
rag = RAGSystem(sample_docs)
result = rag.answer_question("What is the remote work policy and allowance?")
print(f"Question: {result['question']}")
print(f"Answer: {result['answer']}")
print(f"Sources: {result['sources']}")
print(f"Quality: {result['quality']}")
Key Takeaway: RAG effectiveness depends on crafting prompts that clearly use retrieved context while handling edge cases like missing or conflicting information. Always enforce citations to maintain user trust.
Exercise: Build a RAG Prompt System for Knowledge Base Q&A
Create a RAG system that answers questions about a knowledge base:
- Retrieve relevant documents by keyword and embedding similarity
- Build prompts that handle 3 cases: no context, single source, multiple sources
- Enforce citations in responses
- Verify that citations actually exist
- Generate quality scores for responses
Requirements:
- Support 3+ different RAG prompt templates
- Handle conflicting information gracefully
- Verify all citations match documents
- Return structured output with quality metrics
- Support query rewriting for better retrieval
Starter code:
class KnowledgeBaseRAG:
    """RAG system for company knowledge base."""

    def __init__(self, kb_documents: list):
        self.documents = kb_documents
        self.builder = RobustRAGPromptBuilder()
        self.enforcer = CitationEnforcer()

    def answer(self, question: str) -> dict:
        """
        Answer question with RAG approach.

        Returns:
            Dict with answer, sources, quality metrics
        """
        # TODO: Retrieve relevant documents
        # TODO: Build RAG prompt (handle edge cases)
        # TODO: Call LLM
        # TODO: Verify citations
        # TODO: Return structured response with quality
        pass
# Load knowledge base
kb = [
{"source": "FAQ", "content": "..."},
{"source": "Policies", "content": "..."},
# ... more documents
]
rag = KnowledgeBaseRAG(kb)
result = rag.answer("How do I request time off?")
Extension challenges:
- Implement HyDE for better retrieval
- Add query rewriting before retrieval
- Support multi-hop questions (split a question into sub-questions)
- Track citation accuracy over time
- Build a feedback loop to improve document indexing
By completing this exercise, you’ll build systems that combine the power of LLMs with the reliability of knowledge bases.