Chain-of-Thought and Step-by-Step Reasoning
Sometimes the most powerful thing you can do is ask the model to show its work. Chain-of-Thought (CoT) prompting is a technique where you explicitly ask the model to reason through a problem step-by-step before giving a final answer. This simple change can dramatically improve performance on complex reasoning tasks. By asking the model to articulate intermediate reasoning steps, you leverage its ability to work through problems methodically, just as you would on paper.
What Is Chain-of-Thought Prompting?
Chain-of-Thought prompting is asking the model to explain its reasoning process before providing a final answer. Instead of jumping directly to a conclusion, the model traces through the logic, breaking the problem into manageable pieces.
Why Chain-of-Thought Works
LLMs are fundamentally sequential token predictors. When forced to generate intermediate reasoning steps, they:
- Allocate more computational capacity to the problem
- Create a “record” of their thinking that helps guide the final answer
- Reduce compounding errors by catching mistakes mid-reasoning
- Self-correct when they notice their logic doesn’t work
Think of it like the difference between a student who writes an answer on the spot versus one who works through it on paper. The paper trail helps catch errors.
Simple Example: With and Without CoT
Without Chain-of-Thought:
Question: If a rectangle has a width of 8 and a perimeter of 40, what's the height?
Direct answer: Height is 12
(Correct, but the model skipped the reasoning. On more complex questions,
direct answers often fail.)
With Chain-of-Thought:
Question: If a rectangle has a width of 8 and a perimeter of 40, what's the height?
Let me think through this step by step:
1. The perimeter of a rectangle is: P = 2(width + height)
2. We know: P = 40 and width = 8
3. So: 40 = 2(8 + height)
4. Simplify: 40 = 16 + 2·height
5. Solve: 24 = 2·height
6. Therefore: height = 12
Answer: 12
Same answer, but the model showed its work. For harder problems, this transparency helps the model reason correctly.
Zero-Shot Chain-of-Thought: “Let’s Think Step by Step”
The simplest form of CoT prompting requires almost no setup. Just ask the model to think step-by-step.
The Magic Phrase
The phrase “Let’s think step by step” (or similar variations) is surprisingly effective:
Question: Roger has 5 tennis balls. He buys 3 more. How many does he have?
Let's think step by step:
The model then generates step-by-step reasoning before answering.
Real-World Example: Zero-Shot CoT
Problem: A store sells apples for $0.50 each and oranges for $0.75 each.
If I buy 4 apples and 3 oranges, and I pay with a $10 bill, how much change do I get?
Let's think step by step:
Step 1: Calculate the cost of apples
- 4 apples × $0.50 = $2.00
Step 2: Calculate the cost of oranges
- 3 oranges × $0.75 = $2.25
Step 3: Calculate total cost
- $2.00 + $2.25 = $4.25
Step 4: Calculate change
- $10.00 - $4.25 = $5.75
Answer: I get $5.75 in change
Variations of Zero-Shot CoT
Different phrasings work similarly well:
"Let me think through this carefully..."
"Let's break this down step by step..."
"I need to work through this logically..."
"Let me reason through this..."
All of these trigger the same behavior: explicit step-by-step reasoning.
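Programmatically, the trigger phrase is just appended to the question. A minimal sketch (the function name and default phrase here are illustrative, not a standard API):

```python
def zero_shot_cot(question, trigger="Let's think step by step:"):
    """Append a reasoning trigger so the model reasons before answering."""
    return f"{question}\n\n{trigger}"

prompt = zero_shot_cot("Roger has 5 tennis balls. He buys 3 more. How many does he have?")
print(prompt)
```

Any of the phrasings above could be passed as `trigger`; the effect is the same.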
When Zero-Shot CoT Helps Most
Zero-shot CoT is most effective for:
- Math and arithmetic (calculations with multiple steps)
- Logic puzzles (require formal reasoning)
- Multi-step processes (instructions that build on each other)
- Cause-and-effect reasoning (understanding why something happens)
For simple factual questions (“What is the capital of France?”), CoT doesn’t add value.
Few-Shot Chain-of-Thought: Providing Reasoning Examples
While zero-shot CoT is powerful, you can boost performance further by providing examples of good reasoning.
Few-Shot CoT Pattern
Show 1-3 examples of problems with step-by-step reasoning, then ask the model to do the same for a new problem:
Example 1:
Question: Mary has twice as many books as John. John has 3 books.
How many books do they have together?
Solution:
- John has 3 books
- Mary has twice as many: 2 × 3 = 6 books
- Together: 3 + 6 = 9 books
Example 2:
Question: A train travels at 60 mph for 2 hours, then 80 mph for 1 hour.
What's the total distance?
Solution:
- First leg: 60 mph × 2 hours = 120 miles
- Second leg: 80 mph × 1 hour = 80 miles
- Total distance: 120 + 80 = 200 miles
Now solve this problem the same way:
Question: A recipe calls for 2 cups of flour and 1.5 cups of sugar.
If I want to make half the recipe, how much of each ingredient do I need?
Key Elements of Effective Few-Shot CoT
- Clear structure: Each example follows the same format
- Intermediate steps: Show calculations or reasoning clearly
- Explicit reasoning: Not just numbers, but logic
- Relevant examples: Problems similar in complexity to your target problem
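These elements are easiest to enforce by assembling the prompt from a fixed template. A small sketch, assuming examples are supplied as (question, solution) pairs (the function and variable names are illustrative):

```python
def few_shot_cot_prompt(examples, new_question):
    """Assemble a few-shot CoT prompt from (question, solution) pairs.

    Each solution string should show explicit step-by-step reasoning,
    so every example follows the same structure.
    """
    parts = []
    for i, (question, solution) in enumerate(examples, start=1):
        parts.append(f"Example {i}:\nQuestion: {question}\nSolution:\n{solution}")
    parts.append(f"Now solve this problem the same way:\nQuestion: {new_question}")
    return "\n\n".join(parts)

examples = [
    ("John has 3 books. Mary has twice as many. How many together?",
     "- John has 3 books\n- Mary has 2 x 3 = 6 books\n- Together: 3 + 6 = 9 books"),
]
prompt = few_shot_cot_prompt(examples, "A train travels 60 mph for 2 hours. How far does it go?")
print(prompt)
```

Because every example runs through the same template, the structure stays consistent no matter how many examples you add.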
Real-World Example: Few-Shot CoT for Code Analysis
You are analyzing code for bugs. For each code snippet, explain the bug
step-by-step, then provide a fixed version.
Example 1:
Buggy Code:
```python
def sum_list(items):
    total = 0
    for i in range(len(items)):
        total = items[i]  # BUG HERE
    return total
```
Analysis:
- The code iterates through the list with a for loop
- For each iteration, it assigns items[i] to total
- BUG: It uses = (assignment) instead of += (addition)
- This means total only holds the last item, not the sum
- Fix: Change total = items[i] to total += items[i]

Fixed Code:
```python
def sum_list(items):
    total = 0
    for i in range(len(items)):
        total += items[i]  # FIXED
    return total
```
Example 2:
Buggy Code:
```python
def find_max(items):
    max_val = items[0]
    for item in items:
        if item > max_val:
            max_val = item
    return max_val
```
Analysis:
- The code starts with the first item as max_val
- It loops through all items and compares to max_val
- When an item is larger, it updates max_val
- This logic is correct!
- Actually, there’s NO BUG in this code.
Verdict: No bugs found. This code correctly finds the maximum value.
Now analyze this code:
Buggy Code:
```python
def remove_duplicates(items):
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result
```
Analysis:
The model would then provide step-by-step analysis following the pattern.
## CoT Variations: Tree-of-Thought and Self-Consistency
While basic CoT is powerful, researchers have developed variations that work even better for complex problems.
### Tree-of-Thought (ToT)
Instead of one linear reasoning path, Tree-of-Thought explores multiple paths and picks the best one.
**Linear CoT:**

```
Question → Step 1 → Step 2 → Step 3 → Answer   (single path)
```

**Tree-of-Thought:**

```
              Question
                 |
       __________|__________
      |          |          |
   Path A     Path B     Path C
    /  \       /  \         |
  A1    A2   B1    B2      C1
   |     |    |     |       |
   ✓     ✗    ✗     ✓       ✗
```

The model explores multiple paths and keeps the valid ones.
**How to use ToT in practice:**
Question: I have a 5-liter bottle and a 3-liter bottle. How do I measure exactly 4 liters of water?
Generate 3 different approaches to solve this problem, evaluate each, then choose the best one.
Approach 1:
- Fill the 5-liter bottle
- Pour from it into the 3-liter bottle
- Now the 5-liter bottle has 2 liters
- … evaluate this path …
- Does this work? Evaluate.
Approach 2:
- Fill the 3-liter bottle
- Pour it into the 5-liter bottle
- … evaluate this path …
- Does this work? Evaluate.
Approach 3:
- … generate a third approach …
Which approach works? Why?
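Approach 1 can actually be carried to completion and checked mechanically. A short sketch that simulates the classic pouring sequence, where the state is the pair of liters in the two bottles:

```python
def pour(src, dst, dst_cap):
    """Pour from src into dst until dst is full or src is empty."""
    amount = min(src, dst_cap - dst)
    return src - amount, dst + amount

# Simulate Approach 1 extended to completion: fill the 5L bottle, pour into
# the 3L, empty the 3L, move the leftover 2L across, refill the 5L, top off.
five, three = 5, 0                  # fill the 5-liter bottle
five, three = pour(five, three, 3)  # (2, 3)
three = 0                           # empty the 3-liter bottle
five, three = pour(five, three, 3)  # (0, 2)
five = 5                            # refill the 5-liter bottle
five, three = pour(five, three, 3)  # (4, 3): exactly 4 liters in the 5L bottle
print(five)  # 4
```

This is the kind of evaluation step the ToT prompt asks the model to perform on each candidate path.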
### Self-Consistency
Generate multiple reasoning paths and take a vote on the answer (majority wins).
Question: What’s the value of (2 + 3) × 4 - 1?
Generate 3 different reasoning paths:
Path 1: [Reasoning] → Answer: 19
Path 2: [Reasoning] → Answer: 19
Path 3: [Reasoning] → Answer: 19

Majority answer: 19
Confidence: High (3/3 paths agree)
**When to use self-consistency:**
- When correct answers are rare in the probability distribution
- For complex reasoning where different paths might lead to different answers
- When you need higher confidence
```python
def self_consistency_prompt(question, num_paths=3):
    """Generate multiple reasoning paths and vote on the answer."""
    prompt = f"""Answer this question {num_paths} different ways.
For each approach, show your step-by-step reasoning.

Question: {question}

Path 1:
[Reasoning]
Answer:

Path 2:
[Reasoning]
Answer:

Path 3:
[Reasoning]
Answer:

Final Answer (most common among the paths):
"""
    return prompt
```
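Once the model responds, the voting step itself can be automated. A minimal sketch, assuming each path ends with a line like `Answer: 19` (the extraction pattern is an assumption about the output format, not a guaranteed one):

```python
import re
from collections import Counter

def majority_answer(response_text):
    """Pick the most common answer across reasoning paths.

    Assumes each path's conclusion appears on a line like 'Answer: 19'.
    """
    answers = [a.strip() for a in re.findall(r"Answer:\s*(.+)", response_text)]
    if not answers:
        return None, 0.0
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / len(answers)

sample = "Path 1: ... Answer: 19\nPath 2: ... Answer: 19\nPath 3: ... Answer: 18"
answer, confidence = majority_answer(sample)
print(answer, confidence)  # majority answer with its agreement ratio
```

In production you would make the answer format stricter (e.g. a required final line) so extraction is reliable.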
Code Examples: Applying CoT to Different Problem Types
Let’s see CoT in action for different kinds of problems:
Example 1: Math Problem with CoT
```python
import anthropic

client = anthropic.Anthropic()

math_problem = """
A company has 500 employees. In the last quarter:
- 10% of employees left the company
- 40 new employees were hired
How many employees does the company have now?

Let's solve this step by step:
"""

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=300,
    messages=[{"role": "user", "content": math_problem}],
)
print(response.content[0].text)
# Output will show the step-by-step calculation
```
Example 2: Logic Problem with Few-Shot CoT
```python
logic_problem_with_examples = """
You will solve logic puzzles step-by-step.

Example 1:
Puzzle: Three friends (Alice, Bob, Carol) have different favorite colors.
- Alice doesn't like red
- Bob likes blue
- Carol doesn't like blue
What colors do they like?

Solution:
1. Bob likes blue (given)
2. Carol doesn't like blue (given)
3. So Carol likes either red or green
4. Alice doesn't like red (given)
5. Alice likes either blue or green
6. But Bob already has blue
7. So Alice must like green
8. That leaves red for Carol
Assignments: Alice=green, Bob=blue, Carol=red

Now solve this:
Puzzle: Four people (Alex, Bailey, Casey, Dana) each own a different pet.
- Alex doesn't own a dog
- Alex doesn't own a bird
- Bailey owns either a cat or a bird
- Casey owns a fish
- Dana doesn't own a cat
If everyone owns a different pet, who owns what?
"""

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=400,
    messages=[{"role": "user", "content": logic_problem_with_examples}],
)
print(response.content[0].text)
```
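When you write few-shot reasoning examples like the one above, it's worth verifying the intended answer by brute force so the demonstration itself is correct. A quick check of the three-friend puzzle:

```python
from itertools import permutations

colors = ["red", "blue", "green"]
# Keep only assignments consistent with all three constraints.
solutions = [
    dict(zip(["Alice", "Bob", "Carol"], perm))
    for perm in permutations(colors)
    if perm[0] != "red"       # Alice doesn't like red
    and perm[1] == "blue"     # Bob likes blue
    and perm[2] != "blue"     # Carol doesn't like blue
]
print(solutions)  # [{'Alice': 'green', 'Bob': 'blue', 'Carol': 'red'}]
```

Exactly one assignment survives, matching the worked solution; a puzzle with zero or multiple solutions would make a poor few-shot example.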
Example 3: Multi-Step Decision with CoT
```python
decision_problem = """
You're recommending a database technology for a new project.
The project has these requirements:
- 1 million users
- Real-time analytics (queries need <1 second response)
- Global distribution needed
- Budget: $10,000/month

Let's think through the options step by step:

Option 1: Traditional SQL (PostgreSQL)
- Pros:
- Cons:
- Suitable? Why or why not?

Option 2: NoSQL (MongoDB)
- Pros:
- Cons:
- Suitable? Why or why not?

Option 3: Data Warehouse (BigQuery)
- Pros:
- Cons:
- Suitable? Why or why not?

Final recommendation:
Based on the analysis above, recommend the best option for this project
and explain why it meets the requirements.
"""

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=800,
    messages=[{"role": "user", "content": decision_problem}],
)
print(response.content[0].text)
```
When CoT Helps Most vs. When It Doesn’t
| Problem Type | CoT Helps | Why |
|---|---|---|
| Math/arithmetic | YES | Multiple steps, compounding error risk |
| Logic puzzles | YES | Requires methodical reasoning |
| Multi-step instructions | YES | Breaking down complex tasks |
| Code debugging | YES | Finding bugs requires systematic analysis |
| Fact recall | NO | "What is X?" doesn't need step-by-step |
| Simple classification | MAYBE | Depends on difficulty |
| Creative writing | NO | Reasoning steps don’t improve creativity |
| Translation | NO | Doesn’t benefit from explicit steps |
| Complex strategic decisions | YES | Multiple factors to weigh systematically |
Key Takeaway
Chain-of-Thought prompting asks the model to show its work step-by-step before giving a final answer. Zero-shot CoT (“Let’s think step by step”) works surprisingly well for complex reasoning. Few-shot CoT provides examples of good reasoning to improve performance further. Variations like Tree-of-Thought and self-consistency can handle even harder problems. CoT is most effective for math, logic, debugging, and multi-step analysis.
Exercise: Apply CoT to a Multi-Step Business Analysis
Your task is to solve a complex business problem using CoT prompting.
The Problem
You work at a software company. Your VP of Sales comes to you with this question:
“Should we raise our SaaS product pricing from $99/month to $149/month? We currently have 500 customers. Early market research suggests that at $149/month, we’d lose 15% of our customers, but revenue per remaining customer would increase. Help me decide.”
Your Task
- Create a CoT prompt that breaks down this decision into step-by-step reasoning
- Include these elements:
  - Define what we need to calculate (current revenue, new revenue, customer impact)
  - Show calculations for the current state
  - Calculate the impact of the price increase
  - Quantify the trade-off (revenue gained vs. customers lost)
  - Make a recommendation based on the analysis
- Structure your prompt with explicit steps:

  Let's work through this pricing decision systematically:

  Step 1: Calculate current monthly revenue
  [guidance on what to calculate]

  Step 2: Calculate the impact of the price increase
  [guidance on what to calculate]

  Step 3: Calculate revenue after customer loss
  [guidance on what to calculate]

  Step 4: Weigh the trade-offs
  [guidance on analysis]

  Step 5: Recommendation
  [what to recommend]

- Test your prompt on an LLM (optional but recommended)
- Document your reasoning:
  - Why did you structure it this way?
  - What calculations did you emphasize?
  - What additional factors would a real decision consider?
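For reference, the raw numbers under the stated assumptions can be checked directly. This is the arithmetic your prompt should lead the model through, not a substitute for doing the exercise:

```python
customers = 500
old_price, new_price = 99, 149
churn = 0.15  # assumption from the market research: 15% leave at $149

current_revenue = customers * old_price          # $49,500/month
remaining = round(customers * (1 - churn))       # 425 customers remain
new_revenue = remaining * new_price              # $63,325/month

print(current_revenue, new_revenue, new_revenue - current_revenue)
```

On these numbers alone the increase wins, but a good CoT prompt should also surface the qualitative factors a real decision would weigh.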
Bonus Challenge
Create two versions of this prompt:
- Version A (Conservative): Emphasizes customer retention risk
- Version B (Growth): Emphasizes revenue optimization
Run both versions and compare how the CoT reasoning differs based on the framing. This shows how CoT doesn’t eliminate bias—it makes your assumptions visible.