Model-Specific Optimization Strategies
Introduction
You’ve now learned how to measure prompt quality and test prompts scientifically. But there’s a critical realization: the “best prompt” depends on which model you’re using. A prompt that works beautifully with Claude might confuse GPT-4. A technique that makes Gemini output perfect JSON might break Llama.
This lesson teaches you the most important secret of production prompt engineering: different models have different strengths, quirks, and optimal prompt structures. You’ll learn how to identify these differences, optimize for specific models, and build cross-model solutions.
Key Takeaway: There is no universally perfect prompt. Each model family has unique capabilities and sensitivities. Production systems must either optimize for a specific model or design prompts that work across multiple models, which usually means optimizing for the least capable one.
How Models Differ in Prompt Response
Let’s start with concrete examples:
Example 1: Instruction Clarity
Task: Extract the first name from a full name
Prompt for Claude:
Extract the first name from this name: John Smith
Please return only the first name, nothing else.
Claude: “John” ✓
Same prompt for GPT-3.5: “The first name is John” ✗ (included extra text)
Optimized for GPT-3.5:
Extract the first name from this name: John Smith
Return ONLY the first name in this format:
[FIRST_NAME]
Example:
John Smith -> [John]
GPT-3.5: “[John]” ✓
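The fix generalizes: when you anchor the output format with a bracketed template, pair it with defensive parsing so either style of response is handled. A minimal sketch (`parse_bracketed` is our own helper, not a library function):

```python
import re
from typing import Optional

def parse_bracketed(raw: str) -> Optional[str]:
    """Pull the value out of a [FIRST_NAME]-style bracketed response.

    Falls back to the stripped raw text if the model dropped the brackets;
    returns None for an empty response.
    """
    match = re.search(r"\[([^\]]+)\]", raw)
    if match:
        return match.group(1).strip()
    cleaned = raw.strip()
    return cleaned or None
```

This way the same extraction code works whether the model returns `[John]`, `Sure! [John]`, or just `John`.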
Example 2: XML vs Natural Language
Claude excels with XML tags:
<instruction>
Extract the sentiment from this review
</instruction>
<review>
This product is amazing!
</review>
<format>
Return JSON with keys: sentiment, confidence
</format>
GPT-4 prefers numbered lists:
1. Your task: Extract sentiment from the review
2. Input review: "This product is amazing!"
3. Format: Return JSON with keys: sentiment, confidence
Llama works better with imperative commands:
TASK: Extract sentiment from this review
REVIEW: "This product is amazing!"
OUTPUT: JSON with sentiment and confidence
Your response:
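These three prompts differ only in surface form, so it is easy to render all of them from one task definition. A sketch (the `render_prompt` helper and its style names are illustrative, not a standard API):

```python
def render_prompt(task: str, review: str, style: str) -> str:
    """Render one extraction task in a model-appropriate prompt style."""
    if style == "xml":          # Claude-friendly
        return (
            f"<instruction>\n{task}\n</instruction>\n"
            f"<review>\n{review}\n</review>\n"
            "<format>\nReturn JSON with keys: sentiment, confidence\n</format>"
        )
    if style == "numbered":     # GPT-4-friendly
        return (
            f"1. Your task: {task}\n"
            f'2. Input review: "{review}"\n'
            "3. Format: Return JSON with keys: sentiment, confidence"
        )
    if style == "imperative":   # Llama-friendly
        return (
            f"TASK: {task}\n"
            f'REVIEW: "{review}"\n'
            "OUTPUT: JSON with sentiment and confidence\n"
            "Your response:"
        )
    raise ValueError(f"Unknown style: {style}")
```

Keeping the task definition separate from its rendering means a new model only needs a new style branch, not a rewritten prompt.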
Understanding Model Capabilities and Limitations
Model Size and Capability Tiers
from dataclasses import dataclass

@dataclass
class ModelProfile:
    """Profile of a model's strengths and weaknesses"""
    name: str
    provider: str
    parameter_count: str
    context_window: int
    strengths: list            # What it's good at
    weaknesses: list           # What it struggles with
    optimal_prompt_style: str  # How it responds best
    cost_per_1k_tokens: float
# Example profiles
MODELS = {
    "claude-3-opus": ModelProfile(
        name="Claude 3 Opus",
        provider="Anthropic",
        parameter_count="~100B (estimated)",
        context_window=200000,
        strengths=[
            "Long-form reasoning",
            "XML/structured markup",
            "Constitutional AI compliance",
            "Code generation and analysis"
        ],
        weaknesses=[
            "Can be verbose",
            "Slower than some competitors"
        ],
        optimal_prompt_style="detailed_with_xml_or_markdown",
        cost_per_1k_tokens=0.015
    ),
    "gpt-4": ModelProfile(
        name="GPT-4",
        provider="OpenAI",
        parameter_count="Unknown (estimated >100B)",
        context_window=128000,
        strengths=[
            "Superior reasoning",
            "Multimodal (vision)",
            "Extremely reliable format following",
            "Consistent behavior across domains"
        ],
        weaknesses=[
            "More expensive",
            "Slower inference"
        ],
        optimal_prompt_style="numbered_lists_and_json_schema",
        cost_per_1k_tokens=0.03
    ),
    "llama-2-70b": ModelProfile(
        name="Llama 2 70B",
        provider="Meta (via API providers)",
        parameter_count="70B",
        context_window=4096,
        strengths=[
            "Fast inference",
            "Open source (can self-host)",
            "Low cost",
            "Good at instruction following"
        ],
        weaknesses=[
            "Less reliable for complex tasks",
            "Smaller context window",
            "Struggles with very specific formatting"
        ],
        optimal_prompt_style="clear_imperative_instructions",
        cost_per_1k_tokens=0.001
    ),
    "gemini-pro": ModelProfile(
        name="Google Gemini Pro",
        provider="Google",
        parameter_count="Unknown",
        context_window=32000,
        strengths=[
            "Excellent summarization",
            "Natural conversation flow",
            "Good at following complex logic"
        ],
        weaknesses=[
            "Sometimes over-confident",
            "Can hallucinate facts"
        ],
        optimal_prompt_style="conversational_narrative",
        cost_per_1k_tokens=0.005
    )
}
Model-Specific Prompt Optimization
Optimizing for Claude
Claude responds exceptionally well to:
- XML tags for structured input/output
- Detailed explanations of what you want
- Constitutional AI framing (asking models to be harmless, helpful, honest)
import json

def claude_optimized_extraction(text: str) -> dict:
    """Extract data optimally for Claude"""
    prompt = """<document>
{text}
</document>

Please extract the following information from the document:

<extraction_task>
- Company name (full legal name)
- Industry (primary classification)
- Founded year (numeric, e.g., 2015)
- Key products (comma-separated list)
</extraction_task>

Return your response as JSON with these exact keys:
- company_name
- industry
- founded_year
- key_products

Be precise and extract only information explicitly stated in the document.
If information is not available, use null for that field."""

    # claude_client is an Anthropic client instance, e.g. anthropic.Anthropic()
    response = claude_client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=500,
        messages=[
            {"role": "user", "content": prompt.format(text=text)}
        ]
    )
    return json.loads(response.content[0].text)
Optimizing for GPT-4
GPT-4 responds best to:
- JSON Schema for output format specification
- Step-by-step reasoning prompts (chain-of-thought)
- Explicit format examples
def gpt4_optimized_extraction(text: str) -> dict:
    """Extract data optimally for GPT-4"""
    prompt = f"""Extract information from this text:

TEXT:
{text}

TASK:
1. Identify the company name (must be exact legal name)
2. Classify the industry (choose from: Technology, Healthcare, Finance, Retail, Other)
3. Find the founding year (format: YYYY)
4. List all products mentioned (format: product1, product2, product3)

OUTPUT FORMAT (STRICT JSON):
{{
    "company_name": "string",
    "industry": "string",
    "founded_year": "integer or null",
    "key_products": ["string"],
    "confidence": "high/medium/low"
}}

Return ONLY valid JSON. No other text."""

    # client is an OpenAI client instance, e.g. openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )
    return json.loads(response.choices[0].message.content)
Optimizing for Llama
Llama responds best to:
- Clear, concise instructions (it doesn’t need verbose explanations)
- Simpler formats (sometimes struggles with complex JSON)
- Direct imperative commands
def llama_optimized_extraction(text: str) -> dict:
    """Extract data optimally for Llama"""
    prompt = f"""Extract information from this text:
{text}

Extract:
COMPANY: [company name]
INDUSTRY: [industry]
YEAR: [founding year]
PRODUCTS: [product list]

Return results in this exact format only."""

    # llama_client is whatever client wraps your Llama deployment
    response = llama_client.generate(
        prompt=prompt,
        temperature=0,
        max_tokens=200
    )

    # Parse the labeled lines back into a dict
    lines = response.strip().split('\n')
    result = {}
    for line in lines:
        if line.startswith('COMPANY:'):
            result['company_name'] = line.replace('COMPANY:', '').strip()
        elif line.startswith('INDUSTRY:'):
            result['industry'] = line.replace('INDUSTRY:', '').strip()
        elif line.startswith('YEAR:'):
            result['founded_year'] = line.replace('YEAR:', '').strip()
        elif line.startswith('PRODUCTS:'):
            result['key_products'] = line.replace('PRODUCTS:', '').strip()
    return result
Cost Optimization Strategies
Different models have vastly different costs. Sometimes using multiple smaller/cheaper calls is better than one expensive call:
Token Efficiency
def estimate_token_cost(prompt: str, output_tokens: int, model: str) -> float:
    """Estimate cost of a single API call"""
    token_costs = {
        'gpt-4': {'input': 0.03, 'output': 0.06},
        'gpt-3.5-turbo': {'input': 0.0005, 'output': 0.0015},
        'claude-3-opus': {'input': 0.015, 'output': 0.075},
        'claude-3-sonnet': {'input': 0.003, 'output': 0.015},
        'llama-2-70b': {'input': 0.001, 'output': 0.001},
    }
    # Rough token count (better to use the actual tokenizer)
    input_tokens = len(prompt.split()) * 1.3  # ~1.3 tokens per word
    rates = token_costs.get(model, {'input': 0.01, 'output': 0.01})
    cost = (input_tokens * rates['input'] + output_tokens * rates['output']) / 1000
    return cost
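To see why splitting work across a cheaper model can pay off, plug the per-1K-token rates into the arithmetic directly (the token counts here are illustrative, not measured):

```python
# Illustrative comparison: one large GPT-4 call vs. three smaller
# GPT-3.5 calls covering the same total work. Rates are per 1K tokens,
# matching the table above.
RATES = {
    "gpt-4":         {"input": 0.03, "output": 0.06},
    "gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Price one call from input/output token counts."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1000

# One big GPT-4 call: 1,500 input tokens, 600 output tokens
big = call_cost("gpt-4", 1500, 600)
# Three smaller GPT-3.5 calls: 500 input + 200 output tokens each
small = 3 * call_cost("gpt-3.5-turbo", 500, 200)

print(f"GPT-4: ${big:.4f} vs. 3x GPT-3.5: ${small:.4f}")
```

With these rates the split is roughly 50x cheaper, which is why routing simple subtasks to smaller models matters at scale; whether quality holds up is exactly what the testing below measures.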
def choose_optimal_model(task: str, budget_per_call: float = 0.01):
    """Choose model that meets quality and budget requirements"""
    task_profiles = {
        'complex_reasoning': ['gpt-4', 'claude-3-opus'],
        'classification': ['gpt-3.5-turbo', 'claude-3-sonnet'],
        'simple_extraction': ['llama-2-70b', 'gpt-3.5-turbo'],
        'summarization': ['claude-3-sonnet', 'gpt-3.5-turbo']
    }
    candidates = task_profiles.get(task, ['gpt-3.5-turbo'])

    # Filter by budget
    affordable = []
    for model in candidates:
        estimated_cost = estimate_token_cost("", 200, model)
        if estimated_cost <= budget_per_call:
            affordable.append(model)
    return affordable[0] if affordable else candidates[0]
When to Use Smaller Models
def should_use_smaller_model(task: str) -> bool:
    """Determine if a smaller/cheaper model will suffice"""
    simple_tasks = [
        'sentiment_classification',
        'spam_detection',
        'language_detection',
        'simple_extraction',
        'formatting_conversion'
    ]
    complex_tasks = [
        'reasoning',
        'complex_analysis',
        'code_generation',
        'creative_writing',
        'edge_case_handling'
    ]
    if task in simple_tasks:
        return True
    if task in complex_tasks:
        return False
    # Default to larger model if uncertain
    return False

# Usage
if should_use_smaller_model('sentiment_classification'):
    model = 'gpt-3.5-turbo'  # 10x cheaper than GPT-4
else:
    model = 'gpt-4'  # More capable
Caching and Batching
Reduce API calls and costs with intelligent caching:
import hashlib
import json
from datetime import datetime, timedelta

class PromptCache:
    """Cache prompt responses to avoid redundant API calls"""

    def __init__(self, ttl_hours: int = 24):
        self.cache = {}
        self.ttl = timedelta(hours=ttl_hours)

    def _hash_key(self, prompt: str, model: str) -> str:
        """Create cache key from prompt + model"""
        combined = f"{prompt}:{model}"
        return hashlib.md5(combined.encode()).hexdigest()

    def get(self, prompt: str, model: str):
        """Retrieve cached response if it exists and has not expired"""
        key = self._hash_key(prompt, model)
        if key in self.cache:
            response, timestamp = self.cache[key]
            if datetime.now() - timestamp < self.ttl:
                return response
            del self.cache[key]  # Expired
        return None

    def set(self, prompt: str, model: str, response: str):
        """Cache a response"""
        key = self._hash_key(prompt, model)
        self.cache[key] = (response, datetime.now())

    def save(self, filepath: str):
        """Persist cache to disk"""
        cacheable = {
            k: (v[0], v[1].isoformat())
            for k, v in self.cache.items()
        }
        with open(filepath, 'w') as f:
            json.dump(cacheable, f)

cache = PromptCache()

def call_model_with_cache(prompt: str, model: str, api_client):
    """Call API with caching"""
    # Check cache first
    cached = cache.get(prompt, model)
    if cached is not None:
        print("Cache hit!")
        return cached

    # Call API and store the result
    response = api_client.call(model, prompt)
    cache.set(prompt, model, response)
    return response
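The batching half of the heading can be as simple as packing several inputs into one prompt and mapping the numbered answers back to their items. A sketch, assuming the model honors the numbered-line convention (real outputs still need validation):

```python
def batch_prompt(items: list, task: str) -> str:
    """Pack several inputs into one prompt so a single API call
    handles them all; the model answers one numbered line per item."""
    numbered = "\n".join(f"{i + 1}. {item}" for i, item in enumerate(items))
    return (
        f"{task}\n\n"
        f"Items:\n{numbered}\n\n"
        "Answer with one line per item, in the form '<number>. <answer>'."
    )

def parse_batch_response(response: str, n_items: int) -> list:
    """Map numbered answer lines back to their items (None if missing)."""
    answers = [None] * n_items
    for line in response.strip().splitlines():
        number, _, answer = line.partition(".")
        if number.strip().isdigit():
            idx = int(number.strip()) - 1
            if 0 <= idx < n_items:
                answers[idx] = answer.strip()
    return answers
```

One batched call amortizes the fixed instruction tokens across all items, at the cost of needing the parsing step and a fallback (e.g. retry individually) for items the model skips.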
Cross-Model Prompt Portability
What if you need a prompt that works across multiple models? The answer is to write a base prompt pitched at the least capable model, then layer on lightweight model-specific adaptations:
class CrossModelPrompt:
    """Prompt that works across multiple models"""

    def __init__(self, base_prompt: str, model_adaptations: dict):
        """
        base_prompt: The prompt that works on most models
        model_adaptations: {model_name: adjustment_instructions}
        """
        self.base_prompt = base_prompt
        self.adaptations = model_adaptations

    def get_prompt_for_model(self, model: str) -> str:
        """Get prompt optimized for specific model"""
        if model in self.adaptations:
            return self.base_prompt + "\n" + self.adaptations[model]
        return self.base_prompt

# Example
multi_model_extraction = CrossModelPrompt(
    base_prompt="""Extract the following from the text:
- Company name
- Founded year
- Industry
Return as structured data.""",
    model_adaptations={
        'gpt-4': """
IMPORTANT: Return valid JSON format:
{"company_name": "...", "founded_year": 2024, "industry": "..."}""",
        'llama-2-70b': """
Format:
COMPANY: [name]
YEAR: [year]
INDUSTRY: [industry]""",
        'claude-3': """
<result>
<company_name>...</company_name>
<founded_year>...</founded_year>
<industry>...</industry>
</result>"""
    }
)

# Usage (call_api stands in for your provider dispatch function)
for model in ['gpt-4', 'llama-2-70b', 'claude-3']:
    prompt = multi_model_extraction.get_prompt_for_model(model)
    response = call_api(model, prompt)
Testing Across Models
When optimizing for multiple models, you need comprehensive testing:
import numpy as np

class MultiModelTester:
    """Test prompts across multiple models"""

    def __init__(self, models: list, test_cases: list):
        self.models = models
        self.test_cases = test_cases
        self.results = {}

    def test_all(self, prompt_variants: dict) -> dict:
        """Test each prompt variant on each model"""
        results = {}
        for variant_name, prompt_fn in prompt_variants.items():
            results[variant_name] = {}
            for model in self.models:
                model_results = []
                for test_case in self.test_cases:
                    # Get model-specific prompt
                    prompt = prompt_fn(model)
                    # Call API
                    output = self._call_model(model, prompt, test_case['input'])
                    # Evaluate
                    score = self._evaluate(output, test_case['expected'])
                    model_results.append(score)
                # Summary for this model
                results[variant_name][model] = {
                    'mean': np.mean(model_results),
                    'std': np.std(model_results),
                    'scores': model_results
                }
        return results

    def _call_model(self, model: str, prompt: str, user_input: str) -> str:
        """Dispatch to the appropriate provider API (helpers not shown)"""
        if model.startswith('gpt'):
            return self._call_openai(model, prompt, user_input)
        elif model.startswith('claude'):
            return self._call_anthropic(model, prompt, user_input)
        elif model.startswith('llama'):
            return self._call_llama(model, prompt, user_input)
        raise ValueError(f"Unknown model: {model}")

    def _evaluate(self, output: str, expected: str) -> float:
        """Simple exact-match evaluation metric"""
        return float(output.strip() == expected.strip())

    def summarize_results(self, results: dict) -> dict:
        """Create summary showing best model per variant"""
        summary = {}
        for variant, model_scores in results.items():
            best_model = max(model_scores.items(),
                             key=lambda x: x[1]['mean'])
            summary[variant] = {
                'best_model': best_model[0],
                'best_score': best_model[1]['mean']
            }
        return summary

# Usage
tester = MultiModelTester(
    models=['gpt-4', 'claude-3-opus', 'llama-2-70b'],
    test_cases=[
        {'input': 'text1', 'expected': 'output1'},
        {'input': 'text2', 'expected': 'output2'},
    ]
)

results = tester.test_all({
    'simple_prompt': lambda m: "Extract data from the input text.",
    'detailed_prompt': lambda m: f"Extract data from the input text (optimized for {m}).",
    'structured_prompt': multi_model_extraction.get_prompt_for_model
})

summary = tester.summarize_results(results)
print(summary)
Exercise: Optimize for Multiple Models
Choose a task (e.g., “Extract sentiment from product reviews”) and:
- Write a base prompt that works decently across models
- Create model-specific adaptations for at least 3 models (GPT-4, Claude, Llama)
- Build a test set of 20+ test cases
- Test each model+prompt combination
- Analyze:
- Which model performs best overall?
- Which model is cheapest while maintaining >80% accuracy?
- Which prompt adaptation helps the most?
Deliverables:
- Three prompt variants (one for each model)
- Test results showing accuracy per model
- A cost analysis (cost per correct classification)
- A recommendation for production: which model and prompt would you use and why?
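For the cost-analysis deliverable, a tiny helper like this (hypothetical, not part of any framework) captures the metric:

```python
def cost_per_correct(results: list, cost_per_call: float) -> float:
    """Cost per correct classification: total spend divided by the
    number of correct answers.

    results is a list of booleans (one per test case, True if correct).
    Returns infinity if nothing was correct, so a useless-but-cheap
    model never looks like a bargain.
    """
    correct = sum(results)
    if correct == 0:
        return float("inf")
    return len(results) * cost_per_call / correct

# e.g. 20 test cases at $0.002/call with 17 correct:
# cost_per_correct([True] * 17 + [False] * 3, 0.002)  # ~$0.00235 per correct answer
```

Comparing this number across your three model+prompt combinations usually makes the production recommendation obvious.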
Summary
In this lesson, you’ve learned:
- Different models have different strengths and respond better to different prompt structures
- How to profile and understand model capabilities
- Specific optimization techniques for Claude, GPT-4, Llama, and Gemini
- How to optimize for cost: choosing the right model tier for the task
- Techniques for caching and batching to reduce costs
- How to build prompts that work across multiple models
- Testing frameworks for comparing performance across models
Next, you’ll learn how to manage prompts in production: versioning, tracking, and monitoring.