Foundations

Understanding LLMs and How They Process Prompts

Lesson 1 of 4 · Estimated time: 45 min

When you send a prompt to a large language model, you’re engaging with one of the most sophisticated pattern-recognition systems ever built. But what’s actually happening under the hood? Understanding the mechanics of how LLMs work is foundational to writing effective prompts. You’ll write better instructions once you know how the model interprets them.

What Is a Large Language Model?

A Large Language Model (LLM) is a neural network trained on billions of examples of text from the internet, books, academic papers, and other sources. Its core capability is deceptively simple: predict the next token (a small unit of text) based on all previous tokens.

Think of it like this: you’ve seen thousands of sentences that start with “The sky is…” and the model has learned that “blue” follows that phrase far more often than “red” or “purple.” But the model doesn’t memorize; it learns statistical patterns about how language works. When you give it a new context it’s never seen before, it generates text by repeatedly asking itself: “Given everything before this point, what should come next?”

Tokens: The Basic Unit of Language

The model doesn’t process words; it processes tokens. A token is typically a small chunk of text: a word, part of a word, or punctuation. A common word like “understand” might be one token, while a rarer word like “understandability” might be split into several. This matters because:

  1. Tokens cost money in API usage
  2. Tokens fill your context window (the amount of previous text the model can “see”)
  3. How your text splits into tokens shapes how the model interprets your input

Here’s an example of tokenization:

Input: "Hello, how are you?"
Tokens: [Hello] [,] [how] [are] [you] [?]

More complex:

Input: "Metamorphosis is fascinating."
Tokens: [Meta] [mor] [phosis] [is] [fascinating] [.]

The exact tokenization depends on which model you’re using. GPT models use their own tokenizer, while other models use different schemes.
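To build intuition for how text splits apart, here is a deliberately simplified tokenizer sketch. It only separates word runs from punctuation; real tokenizers such as byte-pair encoding also split words into sub-word pieces, so actual token counts vary by model.

```python
import re

def toy_tokenize(text):
    # Illustrative only: split on word runs and punctuation.
    # Real tokenizers (e.g. byte-pair encoding) also break words
    # into sub-word pieces, so counts differ per model.
    return re.findall(r"\w+|[^\w\s]", text)

print(toy_tokenize("Hello, how are you?"))
# ['Hello', ',', 'how', 'are', 'you', '?']
```

Running this on the examples above reproduces the word-and-punctuation splits, but unlike a real tokenizer it never produces sub-word pieces like `[Meta] [mor] [phosis]`.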

Context Windows: The Model’s Memory Limit

Every LLM has a context window—a limit to how much previous text it can consider. Think of it as short-term memory. GPT-4 has a context window of 8,192 tokens (or more in extended versions). Claude 3 has 200,000 tokens. This means:

  • The model can’t “remember” conversations longer than the context window
  • If you include a 10,000-word document (roughly 13,000 tokens) in your prompt and the window is 8,000 tokens, the model misses part of it
  • Every prompt consumes tokens from this window

When planning prompts with large documents, consider: “Do I have enough context window space for the document + my question + the answer?”
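That budgeting question can be sketched as a quick back-of-the-envelope check. The words-to-tokens ratio below is a rough assumption (about 1.3 tokens per English word), not an exact figure; real counts depend on the tokenizer.

```python
def fits_in_window(doc_words, prompt_tokens, answer_tokens, window=8192):
    # Rough rule of thumb (an assumption, not exact):
    # 1 English word is about 1.3 tokens.
    doc_tokens = int(doc_words * 1.3)
    total = doc_tokens + prompt_tokens + answer_tokens
    return total <= window, total

ok, total = fits_in_window(10_000, 50, 500)
print(ok, total)  # False 13550 -- a 10,000-word doc overflows an 8,192-token window
```

For a precise count, use the tokenizer that matches your model rather than a heuristic.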

Temperature: Controlling Randomness

When the model predicts the next token, it doesn’t always pick the single most likely option. Instead, it generates a probability distribution:

Given context "The capital of France is":
- Paris: 95%
- France: 3%
- London: 1%
- banana: 0.001%

The temperature parameter controls how strictly the model follows these probabilities:

  • Temperature = 0.0 (Deterministic): Always pick the highest probability. Perfect for tasks where you need consistency.
  • Temperature = 0.5 (Low randomness): Pick from the most likely options with some variation. Good for most tasks.
  • Temperature = 1.0 (Default): Standard probability sampling. Balanced between determinism and creativity.
  • Temperature = 2.0 (High randomness): Consider even low-probability options. Good for creative brainstorming.

For a factual question like “What is 2+2?”, use low temperature. For creative writing, higher temperature encourages variety.
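The mechanics of temperature can be shown with a small sampler over the “capital of France” distribution above. This is a conceptual sketch: dividing log-probabilities by the temperature before re-normalizing, which is the standard scaling idea, not any particular provider’s implementation.

```python
import math
import random

def sample_next(probs, temperature):
    # Temperature 0 -> greedy (always the top token);
    # higher temperature flattens the distribution.
    if temperature == 0:
        return max(probs, key=probs.get)
    weights = {tok: math.exp(math.log(p) / temperature) for tok, p in probs.items()}
    total = sum(weights.values())
    r, cum = random.uniform(0, total), 0.0
    for tok, w in weights.items():
        cum += w
        if r <= cum:
            return tok
    return tok  # numerical edge case: fall back to the last token

probs = {"Paris": 0.95, "France": 0.03, "London": 0.01, "banana": 0.00001}
print(sample_next(probs, 0))  # always 'Paris'
```

At temperature 0 the output never varies; at temperature 2.0 even “banana” gets a (tiny) chance of being sampled.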

How Prompts Become Output: The Generation Pipeline

Here’s the journey your prompt takes through the model:

Stage 1: Tokenization

Your prompt text is split into tokens. “Write a poem about cats” becomes approximately [Write] [a] [poem] [about] [cats].

Stage 2: Embedding and Context

Each token is converted into a numerical representation (embedding) that captures its meaning. The model’s attention mechanism then processes these representations, with each token learning about all other tokens in the context.

This is why prompt order matters. The model weighs the relationship between all tokens, so:

  • “Cats are cute” and “Cute are cats” contain the same tokens but will generate different outputs
  • Information early in your prompt receives similar weight to information late in your prompt
  • A slight recency effect still applies: the most recent tokens carry somewhat more influence
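The attention idea can be sketched with toy numbers. The 2-d “embeddings” below are made up purely to show the mechanics (real models learn vectors with thousands of dimensions): each token is scored against a query token by dot product, and the scores are normalized into weights that sum to 1.

```python
import math

def softmax(xs):
    # Normalize scores into a probability-like distribution.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Made-up 2-d "embeddings" purely for illustration.
embeddings = {"Cats": [1.0, 0.0], "are": [0.2, 0.3], "cute": [0.0, 1.0]}

def attention_weights(query_tok, toks):
    # Score each token against the query (dot product), then normalize.
    q = embeddings[query_tok]
    scores = [sum(a * b for a, b in zip(q, embeddings[t])) for t in toks]
    return dict(zip(toks, softmax(scores)))

weights = attention_weights("cute", ["Cats", "are", "cute"])
print(weights)  # each token gets a share of attention, summing to 1
```

Because every token is scored against every other token, reordering the input changes the scores, which is why word order changes the output.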

Stage 3: Token Prediction Loop

The model repeatedly predicts the next most appropriate token:

Input: "Write a haiku about coffee:"
Model predicts: [Morning] (most likely next word)
Context becomes: "Write a haiku about coffee: Morning"
Model predicts: [light] (most likely next word given the full context)
Context becomes: "Write a haiku about coffee: Morning light"
... continues until it predicts a stop token or reaches a length limit

This is why LLMs can ramble if not constrained—they keep generating because it’s “valid” to do so.
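The prediction loop above can be sketched in a few lines. The `predict_next` function here is a stub standing in for a real model call; the point is the loop structure, in which every generated token is appended to the context and fed back in.

```python
def generate(prompt_tokens, predict_next, max_tokens=50, stop="<eos>"):
    # Autoregressive loop: each new token is appended and fed back in.
    context = list(prompt_tokens)
    for _ in range(max_tokens):
        token = predict_next(context)  # one "model call" per step
        if token == stop:
            break
        context.append(token)
    return context

# Stub "model" that emits a fixed continuation then a stop token (illustrative).
continuation = iter(["Morning", "light", "<eos>"])
result = generate(["Write", "a", "haiku:"], lambda ctx: next(continuation))
print(result)  # ['Write', 'a', 'haiku:', 'Morning', 'light']
```

Note the two exit conditions: a stop token or the `max_tokens` cap. Without a cap, an unconstrained model can keep finding “valid” continuations indefinitely.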

Stage 4: Output Formatting

The raw token stream is decoded back into readable text.

Why Prompt Design Matters: Garbage In, Garbage Out

The quality of your prompt directly correlates with the quality of output. Here’s why:

Vague Prompt → Vague Output

Prompt: "Tell me about dogs"
Output: [Generic information about dogs, breeds, care]

The prompt leaves countless valid interpretations open. Without specificity, the model defaults to generic, median-quality responses.

Specific Prompt → Precise Output

Prompt: "Write a 100-word technical guide for dog owners about nutrition,
focusing on protein requirements for active breeds. Include a warning about
grain allergies."
Output: [Precise, targeted content that matches your exact needs]

This specificity works because:

  1. The model has fewer valid interpretations to choose from
  2. Your instructions act as a filter, narrowing the probability distribution toward what you want
  3. The model can evaluate its own output against your clear criteria

The Anatomy of a Prompt: Four Components

An effective prompt typically contains:

1. Instruction

The action you want the model to take. Clear verbs matter:

  • “Write,” “Describe,” “Generate,” and “Summarize” each steer the output toward a different emphasis
  • “Analyze this code for bugs” is an instruction
  • “What’s this code?” is vague

2. Context

Background information that helps the model understand the domain, constraints, or desired output style:

Context: "You are a technical writer specializing in cloud infrastructure"
Context: "The audience is non-technical business stakeholders"
Context: "The document should be compliant with HIPAA regulations"

Context primes the model’s knowledge distribution toward the relevant parts of its training data.

3. Input

The actual data or question you want the model to process:

Input: [User's code snippet]
Input: [Article to analyze]
Input: [Customer feedback to classify]

4. Output Format

How you want the result structured:

Output format: "JSON with fields: title, summary, action_items"
Output format: "A single paragraph, no more than 150 words"
Output format: "Python code with docstrings"

Here’s a complete example showing all four components:

INSTRUCTION: Analyze the following code for security vulnerabilities.

CONTEXT: You are a senior security engineer. The code is part of a web
application that processes user authentication. Flag only high-severity issues.

INPUT:
```python
def login(username, password):
    query = f"SELECT * FROM users WHERE username='{username}' AND password='{password}'"
    result = db.execute(query)
    return result[0] if result else None
```

OUTPUT FORMAT: Provide your response as:

  1. A list of vulnerabilities (name, severity, description)
  2. A corrected code snippet
  3. One sentence explaining the fix
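If you assemble prompts in code, the four components map naturally onto a template function. The labels and layout below are a convention for illustration, not a format any model requires.

```python
def build_prompt(instruction, context, input_data, output_format):
    # Hypothetical template; the labels and ordering are a convention,
    # not a requirement of any particular model.
    return (
        f"INSTRUCTION: {instruction}\n\n"
        f"CONTEXT: {context}\n\n"
        f"INPUT:\n{input_data}\n\n"
        f"OUTPUT FORMAT: {output_format}"
    )

prompt = build_prompt(
    instruction="Summarize the following article.",
    context="The audience is non-technical business stakeholders.",
    input_data="[article text goes here]",
    output_format="A single paragraph, no more than 150 words",
)
print(prompt)
```

Keeping the components as separate arguments makes it easy to vary one (say, the output format) while holding the others fixed.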

## How Different Prompt Structures Affect Output

The same information structured differently produces different results:

### Approach 1: Instruction First

Prompt: “Explain how trees grow. Context: For a 10-year-old child. Provide exactly 3 paragraphs. Use simple words.”


Result: Simple, age-appropriate explanation.

### Approach 2: Context First

Prompt: “For a 10-year-old child, using simple words and exactly 3 paragraphs. Now explain: How do trees grow?”


Result: Similar, but the model processes the constraints before seeing the task.

### Approach 3: Few-Shot (Examples First)

Prompt: “Here’s how we explain things for 10-year-olds:

Example: How do birds fly? Birds have wings with special feathers. The feathers are shaped to catch air. When they flap their wings fast, the air pushes them up into the sky.

Now use the same style to answer: How do trees grow?”


Result: Matches the style of the example more closely.

Each structure works for different situations, and you'll learn to choose strategically as you advance.
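The three orderings above can be captured as simple string templates. The wording here is illustrative; the point is that the same task and constraints are merely rearranged.

```python
task = "Explain how trees grow."
constraints = "For a 10-year-old child. Use simple words. Exactly 3 paragraphs."
example = ("How do birds fly? Birds have wings with special feathers. "
           "The feathers catch air, and flapping pushes the bird up.")

# Approach 1: instruction first, constraints after.
instruction_first = f"{task} Context: {constraints}"

# Approach 2: constraints first, task after.
context_first = f"{constraints} Now explain: How do trees grow?"

# Approach 3: few-shot, a style example before the new question.
few_shot = (
    "Here's how we explain things for 10-year-olds:\n\n"
    f"Example: {example}\n\n"
    "Now use the same style to answer: How do trees grow?"
)
print(instruction_first)
```

Templating like this makes it cheap to A/B test structures against the same task.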

## Key Takeaway

> Large Language Models work by predicting the next token based on statistical patterns learned during training. Your prompt is a filter that narrows the probability distribution toward your desired output. The clearer and more specific your filter (prompt), the better the results. Understanding tokens, context windows, and temperature gives you the tools to shape outputs precisely.

## Exercise: Observe the Same Question Asked Three Ways

You'll now run the same question through three different prompt formulations and observe the differences.

### Your Task

Ask an LLM (ChatGPT, Claude, or similar): "What makes a good team?" using each of these prompts:

**Version 1 (Vague):**

What makes a good team?


**Version 2 (Specific):**

I run a software engineering team of 6 people. We’re working on a critical product launch in 4 weeks. Based on research in organizational psychology, what are the 5 most important attributes for our team to have right now? Format as a numbered list with a 2-sentence explanation for each.


**Version 3 (Role + Structure):**

You are an executive coach specializing in tech teams. A software startup founder asks you: “What makes a good team?” They want:

  • A concise answer (150 words max)
  • Focused on early-stage tech companies
  • Actionable (not theoretical)
  • Prioritized by impact

Respond as if in a 1-on-1 coaching session.


### What to Observe

Record your outputs and note:

1. **Length difference**: How many words/tokens in each?
2. **Depth**: Which response goes deeper? Why?
3. **Applicability**: Which answer is most useful for YOUR situation?
4. **Tone**: How does the persona/role affect the writing style?
5. **Structure**: How does explicit formatting guidance change the output?

Write a short reflection (100 words) comparing the three outputs. Which prompt would you use for different scenarios: for a blog post, for startup advice, for team building?

This exercise demonstrates the core principle of prompt engineering: **small changes in your instructions create meaningful changes in output**.