Advanced

Retrieval-Augmented Generation (RAG) Prompting

Lesson 1 of 4 · Estimated Time: 55 min

A limitation of LLMs: they work from training data with a knowledge cutoff. They can’t answer questions about private documents, recent events, or specialized domains without hallucinating. Retrieval-Augmented Generation (RAG) solves this by combining document retrieval with prompt engineering.

This lesson teaches you to build RAG systems that let LLMs answer questions based on retrieved context while maintaining accuracy and attribution.

RAG Architecture Overview

RAG follows a simple pattern:

User Query → Retrieve → Augment → Generate → Response
            (Find relevant   (Build prompt    (Answer
             documents)      with context)    with sources)

A practical example:

User: "What are the terms of service changes in Q4?"

1. Retrieve: Search documents, find latest TOS
2. Augment: Build prompt: "Based on this TOS document: [content], answer: What are Q4 changes?"
3. Generate: LLM answers using the retrieved document
4. Response: "Based on section 3.2, the main changes are..."
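The four steps above can be reduced to a minimal pipeline. In this sketch, `search` and `generate` are hypothetical stand-ins for your retriever and LLM client, injected as callables:

```python
def answer_with_rag(query: str, search, generate) -> str:
    """Minimal RAG loop: retrieve, augment, generate.

    `search` and `generate` are placeholders for a real
    retriever and LLM client.
    """
    # 1. Retrieve: find documents relevant to the query
    docs = search(query)
    # 2. Augment: embed the retrieved context in the prompt
    context = "\n\n".join(docs)
    prompt = f"Based on these documents:\n{context}\n\nAnswer: {query}"
    # 3. Generate: the LLM answers from the provided context
    return generate(prompt)

# Toy usage with stub callables standing in for real components
docs_store = ["Section 3.2: Q4 TOS changes limit data retention to 90 days."]
answer = answer_with_rag(
    "What are the Q4 TOS changes?",
    search=lambda q: docs_store,
    generate=lambda p: f"(LLM sees {len(p)} chars of prompt)"
)
```

The point is the shape, not the stubs: every RAG system you build in this lesson follows this retrieve → augment → generate flow.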

Crafting Prompts for RAG Effectiveness

The way you format retrieved context in prompts dramatically affects quality:

# Bad RAG prompt: vague role, no structure, unclear expectations
POOR_RAG_PROMPT = """You are helpful. Here is information:
{context}

Now answer this: {question}"""

# Good RAG prompt: Clear roles, structure, explicit instructions
GOOD_RAG_PROMPT = """You are a helpful assistant answering questions about company policies.

You have access to the following reference documents:
<documents>
{context}
</documents>

Instructions:
1. Answer the question based ONLY on the provided documents
2. If the answer is not in the documents, say "I cannot find this information in the provided documents"
3. Always cite which document section supports your answer
4. Be concise but complete

Question: {question}

Answer:"""

# Even better: Different prompts for different scenarios
EXTRACT_ANSWER_PROMPT = """Based on these documents:
<documents>
{context}
</documents>

Extract a direct answer to: {question}

Format your response as:
ANSWER: [direct answer]
SOURCE: [document section that supports this]
CONFIDENCE: [low/medium/high]"""

SYNTHESIS_PROMPT = """Synthesize information from these sources to answer: {question}

<documents>
{context}
</documents>

Requirements:
- Combine information across multiple sources if relevant
- Clearly distinguish between directly stated information and logical inferences
- Flag any contradictions between sources
- Provide supporting citations"""
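These templates are plain Python format strings. A quick sketch of filling one (the template here is a shortened version of GOOD_RAG_PROMPT above, and the document is illustrative):

```python
# Same shape as GOOD_RAG_PROMPT, shortened for the sketch
TEMPLATE = """You have access to the following reference documents:
<documents>
{context}
</documents>

Answer based ONLY on the documents.

Question: {question}

Answer:"""

docs = [
    {"source": "HR Policy", "content": "Remote work is allowed 2 days per week."}
]
# Label each snippet with its source so citations are possible
context = "\n\n".join(f"[{d['source']}]\n{d['content']}" for d in docs)

prompt = TEMPLATE.format(
    context=context,
    question="How many remote days are allowed?",
)
```

Keeping the template separate from the formatting code makes it easy to swap in EXTRACT_ANSWER_PROMPT or SYNTHESIS_PROMPT for different scenarios.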

Handling Irrelevant, Contradictory, or Missing Context

Real RAG systems face messy situations:

from typing import Optional

class RobustRAGPromptBuilder:
    """Build RAG prompts that handle edge cases."""

    @staticmethod
    def build_prompt(question: str,
                    retrieved_docs: list,
                    system_context: str = "") -> tuple:
        """
        Build a RAG prompt and note its quality.

        Returns:
            (prompt, quality_assessment)
        """
        quality = {
            "has_context": len(retrieved_docs) > 0,
            "context_relevance": "unknown",
            "context_conflicts": False,
            "warnings": []
        }

        # Case 1: No documents retrieved
        if not retrieved_docs:
            quality["warnings"].append("No context retrieved")
            return (
                f"The knowledge base has no information about: {question}\n\n"
                f"Based on your general knowledge, how would you answer this?\n"
                f"State any assumptions you're making.",
                quality
            )

        # Case 2: Single document
        if len(retrieved_docs) == 1:
            quality["context_relevance"] = "single_source"
            context = retrieved_docs[0]["content"]
            prompt = f"""Answer this question using ONLY the provided document.

Document: {retrieved_docs[0]['source']}
{context}

Question: {question}

If the document doesn't contain the answer, say so explicitly."""

        # Case 3: Multiple documents - check for conflicts
        else:
            context_snippets = []
            sources_list = []

            for doc in retrieved_docs:
                context_snippets.append(doc["content"])
                sources_list.append(doc["source"])

                # Naive conflict heuristic (keyword spotting); a production
                # system would use an NLI model or a second LLM pass
                if "not" in doc["content"].lower() and \
                   any("yes" in d.get("content", "").lower()
                       for d in retrieved_docs if d != doc):
                    if not quality["context_conflicts"]:
                        quality["context_conflicts"] = True
                        quality["warnings"].append("Potential conflicts in sources")

            context = "\n\n---\n\n".join(context_snippets)
            sources = ", ".join(sources_list)

            prompt = f"""Answer the question below using the provided sources.

Sources: {sources}

{context}

Question: {question}

Instructions:
1. Use information from the sources above
2. If sources conflict, note the difference and explain
3. Cite the source for each claim
4. If information is not in the sources, clearly state this

Answer:"""

            quality["context_relevance"] = f"multiple_sources_({len(retrieved_docs)})"

        return prompt, quality

# Usage
docs = [
    {
        "source": "Company Policy v2.1",
        "content": "Remote work is allowed 2 days per week"
    },
    {
        "source": "Recent FAQ Update",
        "content": "Due to Q4 initiatives, all staff must be in-office Mon-Wed"
    }
]

prompt, quality = RobustRAGPromptBuilder.build_prompt(
    "How many days per week can I work remotely?",
    docs
)

print(prompt)
print("Quality assessment:", quality)
# Note: the naive keyword heuristic may miss this conflict; check quality["warnings"]

Citation and Source Attribution

Users trust answers more when sources are cited. Enforce citations in prompts:

class CitationEnforcer:
    """
    Ensure responses include proper citations.
    """

    @staticmethod
    def build_citation_prompt(question: str, documents: list) -> str:
        """
        Build prompt that enforces citations in response.
        """
        doc_list = ""
        for i, doc in enumerate(documents, 1):
            doc_list += f"\n[DOC{i}] {doc['source']}: {doc['content'][:300]}..."

        return f"""Answer the following question based on the provided documents.

Documents:
{doc_list}

Question: {question}

CRITICAL INSTRUCTIONS:
1. Every factual claim must be supported by a citation
2. Use format: "[DOC1: Quote that supports this claim]"
3. If information is not in documents, write "[NOT IN DOCUMENTS]"
4. Never make up sources

Example format:
"The policy was updated in Q4 [DOC2: 'Remote work policy updated October 2024']"

Now answer:"""

    @staticmethod
    def verify_citations(response: str, documents: list) -> dict:
        """
        Check if citations in response actually exist in documents.
        """
        import re

        citation_pattern = r'\[DOC(\d+):'
        found_citations = re.findall(citation_pattern, response)

        valid_citations = []
        invalid_citations = []

        for doc_num_str in found_citations:
            doc_num = int(doc_num_str)
            if 1 <= doc_num <= len(documents):
                valid_citations.append(doc_num)
            else:
                invalid_citations.append(doc_num)

        return {
            "total_citations": len(found_citations),
            "valid_citations": len(valid_citations),
            "invalid_citations": invalid_citations,
            "has_unsupported_claims": len(invalid_citations) > 0,
            "citation_coverage": len(valid_citations) / max(len(found_citations), 1)
        }

# Usage
docs = [
    {"source": "Policy A", "content": "Remote work allowed 2 days/week"},
    {"source": "Policy B", "content": "On-site required Mon-Wed for Q4"}
]

prompt = CitationEnforcer.build_citation_prompt(
    "What's the remote work policy?",
    docs
)

# Simulate LLM response
response = """The remote work policy has two versions:
1. Standard policy allows 2 days per week remote [DOC1]
2. Q4 requires on-site Mon-Wed [DOC2]

This creates a temporary change [DOC3]"""  # DOC3 doesn't exist

citation_check = CitationEnforcer.verify_citations(response, docs)
print("Citation verification:", citation_check)
# Output: has_unsupported_claims = True (DOC3 invalid)
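When verification flags unsupported claims, a common follow-up is to ask the model to revise its own answer against the same documents. A hedged sketch, reusing the DOC-number convention (the retry prompt wording is illustrative, not a fixed API):

```python
def build_revision_prompt(response: str, documents: list) -> str:
    """Ask the model to drop or re-source claims whose citations
    don't match any provided document."""
    valid_ids = ", ".join(f"DOC{i}" for i in range(1, len(documents) + 1))
    return (
        "Your previous answer cited documents that do not exist.\n"
        f"Valid document IDs: {valid_ids}.\n\n"
        f"Previous answer:\n{response}\n\n"
        "Rewrite the answer, keeping only claims you can cite to a valid "
        "document. Mark anything else as [NOT IN DOCUMENTS]."
    )

# Hypothetical failing response with an invalid citation
docs = [
    {"source": "Policy A", "content": "Remote work allowed 2 days/week"},
    {"source": "Policy B", "content": "On-site required Mon-Wed for Q4"}
]
response = "The policy changed recently [DOC3: 'updated in Q4']"
retry_prompt = build_revision_prompt(response, docs)
```

Sending `retry_prompt` back through the LLM, then re-running verify_citations, gives a simple verify-and-retry loop.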

Advanced RAG Patterns

Query Rewriting

Sometimes user queries aren’t optimal for retrieval:

class QueryRewriter:
    """Rewrite queries to improve retrieval."""

    @staticmethod
    def build_rewrite_prompt(original_query: str) -> str:
        """Generate prompt to rewrite query."""
        return f"""You are a query optimization expert. Rewrite this question to be clearer for a search engine while preserving the intent.

Original: {original_query}

Rewritten versions (3-5 alternatives):
1. [Alternative 1]
2. [Alternative 2]
...

Format each as a complete question."""

    @staticmethod
    def build_expansion_prompt(original_query: str) -> str:
        """Generate prompt to expand query with related terms."""
        return f"""Expand this query with related terms that might retrieve better documents.

Original: {original_query}

Related queries:
1. [Expansion 1]
2. [Expansion 2]
3. [Expansion 3]

Format each as a complete question."""
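Both rewrite prompts ask for numbered lists, which the caller still has to parse out of the raw completion. One way to do that (the exact output format depends on the model, so the regex is an assumption):

```python
import re

def parse_numbered_queries(llm_response: str) -> list:
    """Extract '1. question' style lines from an LLM completion."""
    return [
        m.group(1).strip()
        for m in re.finditer(r'^\s*\d+\.\s*(.+)$', llm_response, re.MULTILINE)
    ]

# Hypothetical LLM output in the format the prompts request
sample = """Rewritten versions:
1. What is the current remote work policy?
2. How many remote days are employees allowed per week?"""
alternatives = parse_numbered_queries(sample)
```

Each alternative can then be sent to the retriever separately, and the result sets merged before building the final RAG prompt.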

Hypothetical Document Embeddings (HyDE)

Generate hypothetical documents to improve retrieval:

class HyDERetriever:
    """
    Use LLM to generate hypothetical documents
    matching the query, then use embeddings to find similar real docs.
    """

    def __init__(self, llm_client, embedding_client, document_store):
        self.llm = llm_client
        self.embeddings = embedding_client
        self.documents = document_store

    def generate_hypothetical_docs(self, query: str, num_docs: int = 3) -> list:
        """Have LLM write hypothetical documents answering the query."""
        prompt = f"""Write {num_docs} realistic documents that would answer this question:

{query}

Each document should be a paragraph (2-3 sentences) that directly answers the question.
Format each as: "DOCUMENT {n}: [content]"
Don't reference being hypothetical."""

        response = self.llm.complete(prompt)
        # Parse response to extract generated documents
        import re
        docs = re.findall(r'DOCUMENT \d+: (.+?)(?=DOCUMENT|\Z)', response, re.DOTALL)
        return docs

    def retrieve_with_hyde(self, query: str) -> list:
        """
        Retrieve by:
        1. Generate hypothetical documents
        2. Embed them
        3. Find real documents similar to hypothetical ones
        """
        hypothetical = self.generate_hypothetical_docs(query)

        # Embed hypothetical documents
        hyp_embeddings = [
            self.embeddings.embed(doc) for doc in hypothetical
        ]

        # Average the embeddings
        import numpy as np
        query_embedding = np.mean(hyp_embeddings, axis=0)

        # Find most similar real documents
        similar_docs = self.documents.find_similar(
            query_embedding,
            top_k=5
        )

        return similar_docs
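The `document_store.find_similar` call above is assumed rather than implemented. A minimal pure-Python sketch of what it does, ranking stored embeddings by cosine similarity to the query embedding:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_similar(query_embedding, doc_embeddings, top_k=5):
    """Return indices of the top_k embeddings most similar to the query."""
    scores = [(i, cosine(query_embedding, e))
              for i, e in enumerate(doc_embeddings)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in scores[:top_k]]

# Toy example: the second vector points the same way as the query
idx = find_similar([1.0, 0.0], [[0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]], top_k=2)
```

A real document store would use a vector index (FAISS, pgvector, etc.) instead of this linear scan, but the ranking logic is the same.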

Building a Complete RAG System

Assemble all pieces into a production RAG system:

from typing import List, Dict
import openai

class RAGSystem:
    """
    Complete RAG system for question answering
    over a document collection.
    """

    def __init__(self, document_store, embedding_model: str = "text-embedding-3-small"):
        self.documents = document_store
        self.embedding_model = embedding_model
        self.citation_enforcer = CitationEnforcer()

    def retrieve_documents(self, query: str, top_k: int = 3) -> List[Dict]:
        """
        Retrieve relevant documents for a query.
        """
        # In real implementation, use embeddings
        # For now, return documents matching keywords
        query_words = set(query.lower().split())
        scored_docs = []

        for doc in self.documents:
            doc_words = set(doc["content"].lower().split())
            overlap = len(query_words & doc_words)
            if overlap > 0:
                scored_docs.append((doc, overlap))

        scored_docs.sort(key=lambda x: x[1], reverse=True)
        return [doc for doc, score in scored_docs[:top_k]]

    def answer_question(self, question: str) -> Dict:
        """
        Answer a question using RAG approach.
        """
        # Retrieve documents
        docs = self.retrieve_documents(question)

        # Build prompt with citations
        prompt = self.citation_enforcer.build_citation_prompt(question, docs)

        # Call LLM (openai>=1.0 client interface)
        client = openai.OpenAI()
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3  # Lower temperature for factuality
        )

        answer = response.choices[0].message.content

        # Verify citations
        citation_check = self.citation_enforcer.verify_citations(answer, docs)

        # Assess response quality
        quality_score = "good" if citation_check["has_unsupported_claims"] == False else "needs_review"

        return {
            "question": question,
            "answer": answer,
            "sources": [doc["source"] for doc in docs],
            "citation_analysis": citation_check,
            "quality": quality_score,
            "retrieved_docs": len(docs)
        }

# Usage
sample_docs = [
    {
        "source": "Employee Handbook 2024",
        "content": "Remote work is allowed 2 days per week under the standard policy."
    },
    {
        "source": "Q4 Special Notice",
        "content": "Due to project requirements, all staff must be in-office Mon-Wed through December."
    },
    {
        "source": "Compensation Policy",
        "content": "Remote work allowance is $50/month for approved remote days."
    }
]

rag = RAGSystem(sample_docs)
result = rag.answer_question("What is the remote work policy and allowance?")

print(f"Question: {result['question']}")
print(f"Answer: {result['answer']}")
print(f"Sources: {result['sources']}")
print(f"Quality: {result['quality']}")

Key Takeaway: RAG effectiveness depends on crafting prompts that clearly use retrieved context while handling edge cases like missing or conflicting information. Always enforce citations to maintain user trust.

Exercise: Build a RAG Prompt System for Knowledge Base Q&A

Create a RAG system that answers questions about a knowledge base:

  1. Retrieve relevant documents by keyword and embedding similarity
  2. Build prompts that handle 3 cases: no context, single source, multiple sources
  3. Enforce citations in responses
  4. Verify that citations actually exist
  5. Generate quality scores for responses

Requirements:

  • Support 3+ different RAG prompt templates
  • Handle conflicting information gracefully
  • Verify all citations match documents
  • Return structured output with quality metrics
  • Support query rewriting for better retrieval

Starter code:

class KnowledgeBaseRAG:
    """RAG system for company knowledge base."""

    def __init__(self, kb_documents: list):
        self.documents = kb_documents
        self.builder = RobustRAGPromptBuilder()
        self.enforcer = CitationEnforcer()

    def answer(self, question: str) -> dict:
        """
        Answer question with RAG approach.

        Returns:
            Dict with answer, sources, quality metrics
        """
        # TODO: Retrieve relevant documents
        # TODO: Build RAG prompt (handle edge cases)
        # TODO: Call LLM
        # TODO: Verify citations
        # TODO: Return structured response with quality

        pass

# Load knowledge base
kb = [
    {"source": "FAQ", "content": "..."},
    {"source": "Policies", "content": "..."},
    # ... more documents
]

rag = KnowledgeBaseRAG(kb)
result = rag.answer("How do I request time off?")

Extension challenges:

  • Implement HyDE for better retrieval
  • Add query rewriting before retrieval
  • Support multi-hop questions (split a question into sub-questions, answer each, then combine)
  • Track citation accuracy over time
  • Build a feedback loop to improve document indexing

By completing this exercise, you’ll build systems that combine the power of LLMs with the reliability of knowledge bases.