Foundations

Making Your First API Calls

Lesson 2 of 4 · Estimated time: 45 min

Now that you understand how LLM APIs work conceptually, it’s time to write actual code. In this lesson, you’ll create working Python functions that call real models, handle responses, work with streaming, and parse structured output. By the end, you’ll have production-ready patterns you can use immediately.

Setting Up Your Environment

Before you call any API, you need the right tools installed:

pip install openai anthropic python-dotenv requests

Create a .env file in your project root:

OPENAI_API_KEY=sk-proj-...
ANTHROPIC_API_KEY=sk-ant-...

And a requirements.txt:

openai>=1.0.0
anthropic>=0.7.0
python-dotenv>=1.0.0
requests>=2.31.0

Load your keys safely:

import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")

if not OPENAI_API_KEY or not ANTHROPIC_API_KEY:
    raise ValueError("API keys not found in environment variables")

Your First OpenAI API Call

The OpenAI SDK makes this straightforward. Here’s the simplest possible working example:

from openai import OpenAI

client = OpenAI(api_key=OPENAI_API_KEY)

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)

That’s it. You get back an object with attributes you can access directly. Let’s expand this to see what’s available:

def simple_openai_call(prompt: str) -> dict:
    """Make a simple call to OpenAI and return detailed info."""
    client = OpenAI(api_key=OPENAI_API_KEY)

    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
        max_tokens=500
    )

    return {
        "content": response.choices[0].message.content,
        "finish_reason": response.choices[0].finish_reason,
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "total_tokens": response.usage.total_tokens,
        "model": response.model,
        "id": response.id
    }

# Usage
result = simple_openai_call("Explain photosynthesis in 100 words")
print(f"Response: {result['content']}")
print(f"Tokens used: {result['total_tokens']}")

Your First Anthropic API Call

Anthropic’s SDK is similarly clean:

from anthropic import Anthropic

client = Anthropic(api_key=ANTHROPIC_API_KEY)

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.content[0].text)

Notice the differences: the method is messages.create rather than chat.completions.create, the model names differ, max_tokens is required rather than optional, and the response text lives at response.content[0].text because Claude returns a list of content blocks.
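One more difference worth internalizing is where the system prompt goes: OpenAI takes it as the first message in the list, while Anthropic's messages.create takes a top-level system parameter and allows only user/assistant turns in messages. Here are the two request payloads sketched side by side (just the keyword arguments; no API call is made):

```python
# OpenAI: the system prompt is a message in the list
openai_kwargs = {
    "model": "gpt-4-turbo",
    "messages": [
        {"role": "system", "content": "You are a helpful science tutor."},
        {"role": "user", "content": "What is photosynthesis?"},
    ],
}

# Anthropic: the system prompt is a top-level parameter,
# and only user/assistant turns go in the messages list
anthropic_kwargs = {
    "model": "claude-3-opus-20240229",
    "max_tokens": 500,  # required by Anthropic
    "system": "You are a helpful science tutor.",
    "messages": [
        {"role": "user", "content": "What is photosynthesis?"},
    ],
}
```

You would pass these as `client.chat.completions.create(**openai_kwargs)` and `client.messages.create(**anthropic_kwargs)` respectively.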

Here’s a comparable wrapper:

def simple_anthropic_call(prompt: str) -> dict:
    """Make a simple call to Anthropic and return detailed info."""
    client = Anthropic(api_key=ANTHROPIC_API_KEY)

    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}]
    )

    return {
        "content": response.content[0].text,
        "stop_reason": response.stop_reason,
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens,
        "model": response.model,
        "id": response.id
    }

result = simple_anthropic_call("Explain photosynthesis in 100 words")
print(f"Response: {result['content']}")
print(f"Total tokens: {result['input_tokens'] + result['output_tokens']}")
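Those token counts are what you are billed for, so a small helper can turn them into a cost estimate. The prices below are placeholders for illustration, not real rates; look up current pricing for whichever model you use:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1k: float,
                  output_price_per_1k: float) -> float:
    """Estimate the dollar cost of one call from its token usage."""
    return ((input_tokens / 1000) * input_price_per_1k
            + (output_tokens / 1000) * output_price_per_1k)

# Placeholder prices: $0.01 per 1K input tokens, $0.03 per 1K output tokens
cost = estimate_cost(120, 350, 0.01, 0.03)
print(f"Estimated cost: ${cost:.4f}")
```

Exercise 6 at the end of this lesson asks you to do exactly this comparison across both providers.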

Building a Conversation

Most applications need conversations, not just one-off requests. Both SDKs work with message arrays that you build up over time:

def conversation_with_openai():
    """Maintain a multi-turn conversation."""
    client = OpenAI(api_key=OPENAI_API_KEY)

    messages = [
        {
            "role": "system",
            "content": "You are a helpful assistant specializing in science."
        }
    ]

    user_input = "What is photosynthesis?"
    messages.append({"role": "user", "content": user_input})

    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=messages,
        temperature=0.7
    )

    assistant_response = response.choices[0].message.content
    messages.append({"role": "assistant", "content": assistant_response})

    print(f"Assistant: {assistant_response}")

    # Second turn
    user_input_2 = "How does it produce oxygen?"
    messages.append({"role": "user", "content": user_input_2})

    response_2 = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=messages,
        temperature=0.7
    )

    assistant_response_2 = response_2.choices[0].message.content
    messages.append({"role": "assistant", "content": assistant_response_2})

    print(f"Assistant: {assistant_response_2}")

    return messages

# Run conversation
conversation_with_openai()

The key insight: you maintain a messages list and keep appending to it. Each call includes the entire history, so the model has context.
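Because each call resends the whole history, long conversations grow the prompt (and the cost) on every turn. A common mitigation is to trim older turns while always keeping the system message. Here is a minimal sketch; the max_turns cutoff is an arbitrary choice for illustration, and real applications often summarize old turns instead of dropping them:

```python
def trim_history(messages: list[dict], max_turns: int = 10) -> list[dict]:
    """Keep the system message plus only the most recent turns.

    A 'turn' here is one user or assistant message; the system
    message (if present) is always preserved at the front.
    """
    if messages and messages[0]["role"] == "system":
        system, rest = messages[:1], messages[1:]
    else:
        system, rest = [], messages
    return system + rest[-max_turns:]
```

Call trim_history(messages) right before each API request, leaving your full local history intact for logging.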

Streaming Responses

For real-time applications, streaming is essential. Here’s how to implement it:

def streaming_openai_call(prompt: str):
    """Stream a response from OpenAI and print in real-time."""
    client = OpenAI(api_key=OPENAI_API_KEY)

    stream = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True  # Enable streaming
    )

    full_response = ""
    for chunk in stream:  # Each chunk carries a small delta of text
        if chunk.choices and chunk.choices[0].delta.content:
            text = chunk.choices[0].delta.content
            print(text, end="", flush=True)  # Print immediately
            full_response += text

    print()  # Newline at end
    return full_response

# Usage
result = streaming_openai_call("Write a haiku about machine learning")
print(f"\nFull response: {result}")

For Anthropic:

def streaming_anthropic_call(prompt: str):
    """Stream a response from Anthropic and print in real-time."""
    client = Anthropic(api_key=ANTHROPIC_API_KEY)

    full_response = ""
    with client.messages.stream(
        model="claude-3-opus-20240229",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
            full_response += text

    print()
    return full_response

# Usage
result = streaming_anthropic_call("Write a haiku about AI")

Note the difference in shape: OpenAI's stream is a plain iterator of chunks you loop over, while Anthropic's messages.stream helper is a context manager (a with statement) that exposes text_stream and closes the connection cleanly when you're done.

Parsing Structured Output

Often you need the response in a specific format—JSON, a specific object structure, etc. Modern models support this with response_format:

import json

def extract_json_response(topic: str) -> dict:
    """Request structured JSON output from the model."""
    client = OpenAI(api_key=OPENAI_API_KEY)

    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {
                "role": "user",
                "content": f"""Extract information about {topic} in JSON format.
                Return: {{"name": "", "description": "", "year_discovered": 0}}"""
            }
        ],
        response_format={"type": "json_object"}  # Forces JSON output
    )

    content = response.choices[0].message.content
    try:
        data = json.loads(content)
        return data
    except json.JSONDecodeError as e:
        print(f"Failed to parse JSON: {e}")
        print(f"Raw content: {content}")
        return {}

# Usage
result = extract_json_response("Penicillin")
print(json.dumps(result, indent=2))

For Anthropic, you can use Claude’s ability to follow format instructions:

def extract_structured_data(topic: str) -> dict:
    """Use Anthropic to extract structured data."""
    client = Anthropic(api_key=ANTHROPIC_API_KEY)

    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=500,
        messages=[
            {
                "role": "user",
                "content": f"""Extract information about {topic}.
                Return ONLY valid JSON in this format:
                {{"name": "", "category": "", "interesting_fact": ""}}"""
            }
        ]
    )

    content = response.content[0].text.strip()
    try:
        # Claude sometimes wraps JSON in a markdown code fence, so strip it
        if content.startswith("```"):
            content = content.removeprefix("```json").removeprefix("```")
            content = content.removesuffix("```").strip()
        data = json.loads(content)
        return data
    except json.JSONDecodeError as e:
        print(f"Failed to parse: {e}")
        return {}

result = extract_structured_data("Photosynthesis")
print(json.dumps(result, indent=2))
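Once the JSON parses, it's worth checking that it actually has the shape you asked for, since the model can omit keys or return the wrong types. Here is a minimal, hypothetical helper (not part of either SDK, and deliberately simpler than a full JSON Schema validator):

```python
def validate_shape(data: dict, expected: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape matches.

    `expected` maps each required key to the type its value should have.
    """
    problems = []
    for key, expected_type in expected.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected_type):
            problems.append(
                f"{key}: expected {expected_type.__name__}, "
                f"got {type(data[key]).__name__}"
            )
    return problems

# Example check against the format requested above
issues = validate_shape(
    {"name": "Photosynthesis", "category": "biology", "interesting_fact": "..."},
    {"name": str, "category": str, "interesting_fact": str},
)
```

If issues is non-empty, a common recovery strategy is to re-prompt the model with the list of problems and ask it to fix the output.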

Error Handling Best Practices

Real applications need robust error handling. Here’s a pattern:

from openai import RateLimitError, APIConnectionError, APIStatusError
import time

def robust_api_call_with_retry(prompt: str, max_retries: int = 3):
    """Make an API call with basic retry logic."""
    client = OpenAI(api_key=OPENAI_API_KEY)

    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4-turbo",
                messages=[{"role": "user", "content": prompt}],
                timeout=30  # 30 second timeout
            )
            return response.choices[0].message.content

        except RateLimitError:
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt  # Exponential backoff
                print(f"Rate limited. Waiting {wait_time}s before retry...")
                time.sleep(wait_time)
            else:
                raise

        except APIConnectionError:
            if attempt < max_retries - 1:
                print("Connection error. Retrying...")
                time.sleep(1)
            else:
                raise

        except APIStatusError as e:
            if e.status_code >= 500:  # Server error: retry
                if attempt < max_retries - 1:
                    print("Server error. Retrying...")
                    time.sleep(1)
                else:
                    raise
            else:  # Client error (4xx): don't retry
                raise

# Usage
try:
    result = robust_api_call_with_retry("What is AI?")
    print(result)
except Exception as e:
    print(f"Failed after all retries: {e}")

Key Takeaway

Start with the official SDKs—they handle authentication, serialization, and error handling for you. Build up conversations by maintaining a messages list. Use streaming for better user experience. Leverage response_format and structured prompts for predictable output. Always implement retry logic for production applications.

Exercises

  1. Make your first calls: Using both OpenAI and Anthropic SDKs, make simple API calls and print the responses, token counts, and IDs.

  2. Build a multi-turn conversation: Create a function that holds a 3-turn conversation with the model, maintaining context between turns.

  3. Implement streaming: Write a function that streams responses and compares latency with non-streaming calls.

  4. Extract structured data: Request JSON output from an API call. Parse it and verify it matches your expected schema.

  5. Add error handling: Wrap an API call with try/except blocks. Test behavior when you intentionally pass invalid parameters.

  6. Compare costs: Make identical requests to OpenAI and Anthropic. Log token counts. Calculate the cost difference (you’ll need to look up current pricing).