Chain-of-Thought and Step-by-Step Reasoning
Sometimes the most powerful thing you can do is ask the model to show its work. Chain-of-Thought (CoT) prompting is a technique where you explicitly ask the model to reason through a problem step-by-step before giving a final answer. This simple change can dramatically improve performance on complex reasoning tasks. By asking the model to articulate intermediate reasoning steps, you leverage its ability to work through problems methodically, just as you would on paper.
What Is Chain-of-Thought Prompting?
Chain-of-Thought prompting is asking the model to explain its reasoning process before providing a final answer. Instead of jumping directly to a conclusion, the model traces through the logic, breaking the problem into manageable pieces.
Why Chain-of-Thought Works
LLMs are fundamentally sequential token predictors. When forced to generate intermediate reasoning steps, they:
- Allocate more computational capacity to the problem
- Create a “record” of their thinking that helps guide the final answer
- Reduce compounding errors by catching mistakes mid-reasoning
- Self-correct when they notice their logic doesn’t work
Think of it like the difference between a student who writes an answer on the spot versus one who works through it on paper. The paper trail helps catch errors.
Simple Example: With and Without CoT
Without Chain-of-Thought:
Question: If a rectangle has a width of 8 and a perimeter of 40, what's the height?
Direct answer: Height is 12
(Correct, but the model skipped the reasoning. On more complex questions,
direct answers often fail.)
With Chain-of-Thought:
Question: If a rectangle has a width of 8 and a perimeter of 40, what's the height?
Let me think through this step by step:
1. The perimeter of a rectangle is: P = 2(width + height)
2. We know: P = 40 and width = 8
3. So: 40 = 2(8 + height)
4. Simplify: 40 = 16 + 2·height
5. Solve: 24 = 2·height
6. Therefore: height = 12
Answer: 12
Same answer, but the model showed its work. For harder problems, this transparency helps the model reason correctly.
Zero-Shot Chain-of-Thought: “Let’s Think Step by Step”
The simplest form of CoT prompting requires almost no setup. Just ask the model to think step-by-step.
The Magic Phrase
The phrase “Let’s think step by step” (or similar variations) is surprisingly effective:
Question: Roger has 5 tennis balls. He buys 3 more. How many does he have?
Let's think step by step:
The model then generates step-by-step reasoning before answering.
Real-World Example: Zero-Shot CoT
Problem: A store sells apples for $0.50 each and oranges for $0.75 each.
If I buy 4 apples and 3 oranges, and I pay with a $10 bill, how much change do I get?
Let's think step by step:
Step 1: Calculate the cost of apples
- 4 apples × $0.50 = $2.00
Step 2: Calculate the cost of oranges
- 3 oranges × $0.75 = $2.25
Step 3: Calculate total cost
- $2.00 + $2.25 = $4.25
Step 4: Calculate change
- $10.00 - $4.25 = $5.75
Answer: I get $5.75 in change
Variations of Zero-Shot CoT
Different phrasings work similarly well:
"Let me think through this carefully..."
"Let's break this down step by step..."
"I need to work through this logically..."
"Let me reason through this..."
All of these trigger the same behavior: explicit step-by-step reasoning.
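Programmatically, the trigger phrase is just appended to the question. A minimal sketch (the function name and default phrase here are illustrative, not a standard API):

```python
def zero_shot_cot(question, trigger="Let's think step by step:"):
    """Append a reasoning trigger so the model reasons before answering."""
    return f"{question}\n\n{trigger}"

prompt = zero_shot_cot("Roger has 5 tennis balls. He buys 3 more. How many does he have?")
print(prompt)
```

Any of the phrasings above could be passed as `trigger`; the effect is the same.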
When Zero-Shot CoT Helps Most
Zero-shot CoT is most effective for:
- Math and arithmetic (calculations with multiple steps)
- Logic puzzles (require formal reasoning)
- Multi-step processes (instructions that build on each other)
- Cause-and-effect reasoning (understanding why something happens)
For simple factual questions (“What is the capital of France?”), CoT doesn’t add value.
Few-Shot Chain-of-Thought: Providing Reasoning Examples
While zero-shot CoT is powerful, you can boost performance further by providing examples of good reasoning.
Few-Shot CoT Pattern
Show 1-3 examples of problems with step-by-step reasoning, then ask the model to do the same for a new problem:
Example 1:
Question: Mary has twice as many books as John. John has 3 books.
How many books do they have together?
Solution:
- John has 3 books
- Mary has twice as many: 2 × 3 = 6 books
- Together: 3 + 6 = 9 books
Example 2:
Question: A train travels at 60 mph for 2 hours, then 80 mph for 1 hour.
What's the total distance?
Solution:
- First leg: 60 mph × 2 hours = 120 miles
- Second leg: 80 mph × 1 hour = 80 miles
- Total distance: 120 + 80 = 200 miles
Now solve this problem the same way:
Question: A recipe calls for 2 cups of flour and 1.5 cups of sugar.
If I want to make half the recipe, how much of each ingredient do I need?
Key Elements of Effective Few-Shot CoT
- Clear structure: Each example follows the same format
- Intermediate steps: Show calculations or reasoning clearly
- Explicit reasoning: Not just numbers, but logic
- Relevant examples: Problems similar in complexity to your target problem
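These elements are easiest to enforce by assembling the prompt from a fixed template. A small sketch, assuming examples are supplied as (question, solution) pairs (the function and variable names are illustrative):

```python
def few_shot_cot_prompt(examples, new_question):
    """Assemble a few-shot CoT prompt from (question, solution) pairs.

    Each solution string should show explicit step-by-step reasoning,
    so every example follows the same structure.
    """
    parts = []
    for i, (question, solution) in enumerate(examples, start=1):
        parts.append(f"Example {i}:\nQuestion: {question}\nSolution:\n{solution}")
    parts.append(f"Now solve this problem the same way:\nQuestion: {new_question}")
    return "\n\n".join(parts)

examples = [
    ("John has 3 books. Mary has twice as many. How many together?",
     "- John has 3 books\n- Mary has 2 x 3 = 6 books\n- Together: 3 + 6 = 9 books"),
]
prompt = few_shot_cot_prompt(examples, "A train travels 60 mph for 2 hours. How far does it go?")
print(prompt)
```

Because every example runs through the same template, the structure stays consistent no matter how many examples you add.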
Real-World Example: Few-Shot CoT for Code Analysis
You are analyzing code for bugs. For each code snippet, explain the bug
step-by-step, then provide a fixed version.
Example 1:
Buggy Code:
```python
def sum_list(items):
    total = 0
    for i in range(len(items)):
        total = items[i]  # BUG HERE
    return total
```
Analysis:
- The code iterates through the list with a for loop
- For each iteration, it assigns items[i] to total
- BUG: It uses = (assignment) instead of += (addition)
- This means total only holds the last item, not the sum
- Fix: Change total = items[i] to total += items[i]

Fixed Code:
```python
def sum_list(items):
    total = 0
    for i in range(len(items)):
        total += items[i]  # FIXED
    return total
```
Example 2:
Buggy Code:
```python
def find_max(items):
    max_val = items[0]
    for item in items:
        if item > max_val:
            max_val = item
    return max_val
```
Analysis:
- The code starts with the first item as max_val
- It loops through all items and compares to max_val
- When an item is larger, it updates max_val
- This logic is correct!
- Actually, there’s NO BUG in this code.
Verdict: No bugs found. This code correctly finds the maximum value.
Now analyze this code:
Buggy Code:
```python
def remove_duplicates(items):
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result
```
Analysis:
The model would then provide step-by-step analysis following the pattern.
## CoT Variations: Tree-of-Thought and Self-Consistency
While basic CoT is powerful, researchers have developed variations that work even better for complex problems.
### Tree-of-Thought (ToT)
Instead of one linear reasoning path, Tree-of-Thought explores multiple paths and picks the best one.
**Linear CoT:**

```
Question → Step 1 → Step 2 → Step 3 → Answer   (single path)
```

**Tree-of-Thought:**

```
              Question
                 |
       __________|__________
      |          |          |
   Path A     Path B     Path C
    /  \       /  \         |
  A1    A2   B1    B2      C1
   |     |    |     |       |
   ✓     ✗    ✗     ✓       ✗
```

The model explores multiple paths and keeps the valid ones.
**How to use ToT in practice:**
Question: I have a 5-liter bottle and a 3-liter bottle. How do I measure exactly 4 liters of water?
Generate 3 different approaches to solve this problem, evaluate each, then choose the best one.
Approach 1:
- Fill the 5-liter bottle
- Pour from it into the 3-liter bottle
- Now the 5-liter bottle has 2 liters
- … evaluate this path …
- Does this work? Evaluate.
Approach 2:
- Fill the 3-liter bottle
- Pour it into the 5-liter bottle
- … evaluate this path …
- Does this work? Evaluate.
Approach 3:
- … generate a third approach …
Which approach works? Why?
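Approach 1 can actually be carried to completion and checked mechanically. A short sketch that simulates the classic pouring sequence, where the state is the pair of liters in the two bottles:

```python
def pour(src, dst, dst_cap):
    """Pour from src into dst until dst is full or src is empty."""
    amount = min(src, dst_cap - dst)
    return src - amount, dst + amount

# Simulate Approach 1 extended to completion: fill the 5L bottle, pour into
# the 3L, empty the 3L, move the leftover 2L across, refill the 5L, top off.
five, three = 5, 0                  # fill the 5-liter bottle
five, three = pour(five, three, 3)  # (2, 3)
three = 0                           # empty the 3-liter bottle
five, three = pour(five, three, 3)  # (0, 2)
five = 5                            # refill the 5-liter bottle
five, three = pour(five, three, 3)  # (4, 3): exactly 4 liters in the 5L bottle
print(five)  # 4
```

This is the kind of evaluation step the ToT prompt asks the model to perform on each candidate path.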
### Self-Consistency
Generate multiple reasoning paths and take a vote on the answer (majority wins).
Question: What’s the value of (2 + 3) × 4 - 1?
Generate 3 different reasoning paths:
Path 1: [Reasoning] → Answer: 19
Path 2: [Reasoning] → Answer: 19
Path 3: [Reasoning] → Answer: 19

Majority answer: 19
Confidence: High (3/3 paths agree)
**When to use self-consistency:**
- When correct answers are rare in the probability distribution
- For complex reasoning where different paths might lead to different answers
- When you need higher confidence
```python
def self_consistency_prompt(question, num_paths=3):
    """Generate multiple reasoning paths and vote on the answer."""
    prompt = f"""Answer this question {num_paths} different ways.
For each approach, show your step-by-step reasoning.

Question: {question}

Path 1:
[Reasoning]
Answer:

Path 2:
[Reasoning]
Answer:

Path 3:
[Reasoning]
Answer:

Final Answer (most common among the paths):
"""
    return prompt
```
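Once the model responds, the voting step itself can be automated. A minimal sketch, assuming each path ends with a line like `Answer: 19` (the extraction pattern is an assumption about the output format, not a guaranteed one):

```python
import re
from collections import Counter

def majority_answer(response_text):
    """Pick the most common answer across reasoning paths.

    Assumes each path's conclusion appears on a line like 'Answer: 19'.
    """
    answers = [a.strip() for a in re.findall(r"Answer:\s*(.+)", response_text)]
    if not answers:
        return None, 0.0
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / len(answers)

sample = "Path 1: ... Answer: 19\nPath 2: ... Answer: 19\nPath 3: ... Answer: 18"
answer, confidence = majority_answer(sample)
print(answer, confidence)  # majority answer with its agreement ratio
```

In production you would make the answer format stricter (e.g. a required final line) so extraction is reliable.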
Code Examples: Applying CoT to Different Problem Types
Let’s see CoT in action for different kinds of problems:
Example 1: Math Problem with CoT
```python
import anthropic

client = anthropic.Anthropic()

math_problem = """
A company has 500 employees. In the last quarter:
- 10% of employees left the company
- 40 new employees were hired
How many employees does the company have now?

Let's solve this step by step:
"""

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=300,
    messages=[{"role": "user", "content": math_problem}],
)
print(response.content[0].text)
# Output will show the step-by-step calculation
```
Example 2: Logic Problem with Few-Shot CoT
```python
logic_problem_with_examples = """
You will solve logic puzzles step-by-step.

Example 1:
Puzzle: Three friends (Alice, Bob, Carol) have different favorite colors.
- Alice doesn't like red
- Bob likes blue
- Carol doesn't like blue
What colors do they like?

Solution:
1. Bob likes blue (given)
2. Carol doesn't like blue (given)
3. So Carol likes either red or green
4. Alice doesn't like red (given)
5. Alice likes either blue or green
6. But Bob already has blue
7. So Alice must like green
8. That leaves red for Carol
Assignments: Alice=green, Bob=blue, Carol=red

Now solve this:
Puzzle: Four people (Alex, Bailey, Casey, Dana) each own a different pet.
- Alex doesn't own a dog
- Alex doesn't own a bird
- Bailey owns either a cat or a bird
- Casey owns a fish
- Dana doesn't own a cat
If everyone owns a different pet, who owns what?
"""

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=400,
    messages=[{"role": "user", "content": logic_problem_with_examples}],
)
print(response.content[0].text)
```
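When you write few-shot reasoning examples like the one above, it's worth verifying the intended answer by brute force so the demonstration itself is correct. A quick check of the three-friend puzzle:

```python
from itertools import permutations

colors = ["red", "blue", "green"]
# Keep only assignments consistent with all three constraints.
solutions = [
    dict(zip(["Alice", "Bob", "Carol"], perm))
    for perm in permutations(colors)
    if perm[0] != "red"       # Alice doesn't like red
    and perm[1] == "blue"     # Bob likes blue
    and perm[2] != "blue"     # Carol doesn't like blue
]
print(solutions)  # [{'Alice': 'green', 'Bob': 'blue', 'Carol': 'red'}]
```

Exactly one assignment survives, matching the worked solution; a puzzle with zero or multiple solutions would make a poor few-shot example.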
Example 3: Multi-Step Decision with CoT
```python
decision_problem = """
You're recommending a database technology for a new project.
The project has these requirements:
- 1 million users
- Real-time analytics (queries need <1 second response)
- Global distribution needed
- Budget: $10,000/month

Let's think through the options step by step:

Option 1: Traditional SQL (PostgreSQL)
- Pros:
- Cons:
- Suitable? Why or why not?

Option 2: NoSQL (MongoDB)
- Pros:
- Cons:
- Suitable? Why or why not?

Option 3: Data Warehouse (BigQuery)
- Pros:
- Cons:
- Suitable? Why or why not?

Final recommendation:
Based on the analysis above, recommend the best option for this project
and explain why it meets the requirements.
"""

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=800,
    messages=[{"role": "user", "content": decision_problem}],
)
print(response.content[0].text)
```
When CoT Helps Most vs. When It Doesn’t
| Problem Type | CoT Helps | Why |
|---|---|---|
| Math/arithmetic | YES | Multiple steps, compounding error risk |
| Logic puzzles | YES | Requires methodical reasoning |
| Multi-step instructions | YES | Breaking down complex tasks |
| Code debugging | YES | Finding bugs requires systematic analysis |
| Fact recall | NO | "What is X?" doesn't need step-by-step |
| Simple classification | MAYBE | Depends on difficulty |
| Creative writing | NO | Reasoning steps don’t improve creativity |
| Translation | NO | Doesn’t benefit from explicit steps |
| Complex strategic decisions | YES | Multiple factors to weigh systematically |
Key Takeaway
Chain-of-Thought prompting asks the model to show its work step-by-step before giving a final answer. Zero-shot CoT (“Let’s think step by step”) works surprisingly well for complex reasoning. Few-shot CoT provides examples of good reasoning to improve performance further. Variations like Tree-of-Thought and self-consistency can handle even harder problems. CoT is most effective for math, logic, debugging, and multi-step analysis.
Exercise: Apply CoT to a Multi-Step Business Analysis
Your task is to solve a complex business problem using CoT prompting.
The Problem
You work at a software company. Your VP of Sales comes to you with this question:
“Should we raise our SaaS product pricing from $99/month to $149/month? We currently have 500 customers. Early market research suggests that at $149/month, we’d lose 15% of our customers, but revenue per remaining customer would increase. Help me decide.”
Your Task
- Create a CoT prompt that breaks down this decision into step-by-step reasoning
- Include these elements:
  - Define what we need to calculate (current revenue, new revenue, customer impact)
  - Show calculations for the current state
  - Calculate the impact of the price increase
  - Quantify the trade-off (revenue gained vs. customers lost)
  - Make a recommendation based on the analysis
- Structure your prompt with explicit steps:

  Let's work through this pricing decision systematically:

  Step 1: Calculate current monthly revenue
  [guidance on what to calculate]

  Step 2: Calculate the impact of the price increase
  [guidance on what to calculate]

  Step 3: Calculate revenue after customer loss
  [guidance on what to calculate]

  Step 4: Weigh the trade-offs
  [guidance on analysis]

  Step 5: Recommendation
  [what to recommend]

- Test your prompt on an LLM (optional but recommended)
- Document your reasoning:
  - Why did you structure it this way?
  - What calculations did you emphasize?
  - What additional factors would a real decision consider?
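For reference, the raw numbers under the stated assumptions can be checked directly. This is the arithmetic your prompt should lead the model through, not a substitute for doing the exercise:

```python
customers = 500
old_price, new_price = 99, 149
churn = 0.15  # assumption from the market research: 15% leave at $149

current_revenue = customers * old_price          # $49,500/month
remaining = round(customers * (1 - churn))       # 425 customers remain
new_revenue = remaining * new_price              # $63,325/month

print(current_revenue, new_revenue, new_revenue - current_revenue)
```

On these numbers alone the increase wins, but a good CoT prompt should also surface the qualitative factors a real decision would weigh.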
Bonus Challenge
Create two versions of this prompt:
- Version A (Conservative): Emphasizes customer retention risk
- Version B (Growth): Emphasizes revenue optimization
Run both versions and compare how the CoT reasoning differs based on the framing. This shows how CoT doesn’t eliminate bias—it makes your assumptions visible.