Error Handling, Rate Limits, and Retries
API calls fail. Networks time out, servers crash, rate limits kick in, authentication tokens expire. In production, you’ll encounter all of these. This lesson teaches you how to build resilient applications that handle failures gracefully and recover automatically when possible.
Understanding HTTP Status Codes
Every API response includes a status code. You need to understand what they mean:
- 2xx (Success): Request succeeded
  - 200 OK: Success
  - 201 Created: Resource created successfully
- 4xx (Client Error): Problem with your request (don’t retry automatically)
  - 400 Bad Request: Invalid parameters or malformed JSON
  - 401 Unauthorized: Invalid or missing API key
  - 403 Forbidden: Authenticated but not allowed to access this
  - 404 Not Found: Endpoint doesn’t exist
  - 429 Too Many Requests: Rate limit exceeded
- 5xx (Server Error): Problem with the service (usually safe to retry)
  - 500 Internal Server Error: Something broke on their end
  - 502 Bad Gateway: Temporary service issue
  - 503 Service Unavailable: Maintenance or overload
The crucial distinction: 4xx errors are client errors (your fault), so don't retry them automatically, with one exception: 429 means back off and retry later. 5xx errors are server errors (their fault) and are usually safe to retry.
def analyze_error_response(status_code: int) -> dict:
    """Determine retry strategy based on status code."""
    return {
        400: {"retryable": False, "reason": "Invalid request"},
        401: {"retryable": False, "reason": "Authentication failed"},
        429: {"retryable": True, "reason": "Rate limited"},
        500: {"retryable": True, "reason": "Server error"},
        503: {"retryable": True, "reason": "Service unavailable"}
    }.get(status_code, {"retryable": False, "reason": "Unknown"})
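Plugged into a loop, a table like this decides whether to try again. Here is a minimal sketch against a simulated sequence of status codes (the sequence and the `RETRYABLE` set are illustrative, not taken from any particular API):

```python
# Status codes worth retrying: 429 plus the common server-side 5xx codes.
RETRYABLE = {429, 500, 502, 503}

def run_with_retries(statuses: list[int], max_retries: int = 3) -> str:
    """Walk a simulated sequence of response status codes, retrying
    only the retryable ones and giving up on hard client errors."""
    for attempt, status in enumerate(statuses[:max_retries]):
        if status < 400:
            return f"success on attempt {attempt}"
        if status not in RETRYABLE:
            return f"gave up: client error {status}"
        # Retryable: in real code, sleep with backoff here, then loop.
    return "gave up: retries exhausted"

print(run_with_retries([500, 429, 200]))  # success on attempt 2
print(run_with_retries([400, 200]))       # gave up: client error 400
```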
Exponential Backoff
When a request fails with a retryable error, don't retry immediately; an immediate retry is likely to fail again. Instead, wait, and increase the wait after each failure. This is exponential backoff.
The pattern:
- 1st retry: wait 1 second
- 2nd retry: wait 2 seconds
- 3rd retry: wait 4 seconds
- 4th retry: wait 8 seconds
This gives the service time to recover while not waiting forever.
import time
import random

def exponential_backoff(attempt: int, base_wait: float = 1.0) -> float:
    """Calculate wait time for exponential backoff."""
    # 2^attempt grows quickly: 1, 2, 4, 8, 16...
    wait_time = base_wait * (2 ** attempt)
    # Add jitter (randomness) to avoid thundering herd
    jitter = random.uniform(0, wait_time * 0.1)
    return wait_time + jitter

# Test it
for attempt in range(5):
    wait = exponential_backoff(attempt)
    print(f"Attempt {attempt}: wait {wait:.2f}s")
Output (exact values vary between runs because of the jitter):
Attempt 0: wait 1.05s
Attempt 1: wait 2.08s
Attempt 2: wait 4.12s
Attempt 3: wait 8.09s
Attempt 4: wait 16.15s
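In production you usually also cap the wait so a long outage doesn't balloon into minutes of sleeping between attempts. A sketch of that variant (the 30-second cap is an arbitrary choice, not a standard):

```python
import random

def capped_backoff(attempt: int, base_wait: float = 1.0, max_wait: float = 30.0) -> float:
    """Exponential backoff with jitter, capped at max_wait seconds."""
    wait = min(base_wait * (2 ** attempt), max_wait)
    # Jitter stays proportional to the (capped) wait.
    return wait + random.uniform(0, wait * 0.1)

for attempt in [0, 3, 10]:
    print(f"Attempt {attempt}: wait {capped_backoff(attempt):.2f}s")
```

Without the cap, attempt 10 would wait over 17 minutes; with it, the wait stays near 30 seconds.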
Rate Limiting: Headers Matter
APIs don’t just return a 429 status code—they also include headers telling you when you can retry. Exact header names vary by provider (OpenAI, for example, uses x-ratelimit-* variants), but a common convention is:

- RateLimit-Limit: Maximum requests in the window
- RateLimit-Remaining: Requests left before hitting the limit
- RateLimit-Reset: Unix timestamp when the limit resets
import requests
from datetime import datetime

def make_request_with_rate_limit_awareness(url: str, headers: dict) -> dict:
    """Make a request and log rate limit info."""
    response = requests.post(url, headers=headers, json={})

    rate_limit_info = {
        "limit": response.headers.get("RateLimit-Limit"),
        "remaining": response.headers.get("RateLimit-Remaining"),
        "reset": response.headers.get("RateLimit-Reset")
    }

    if rate_limit_info["remaining"]:
        remaining = int(rate_limit_info["remaining"])
        if remaining < 10:
            print(f"⚠️ Low remaining requests: {remaining}")

    if rate_limit_info["reset"]:
        reset_time = int(rate_limit_info["reset"])
        reset_dt = datetime.fromtimestamp(reset_time)
        print(f"Rate limit resets at: {reset_dt}")

    return {
        "status_code": response.status_code,
        "rate_limit": rate_limit_info,
        "data": response.json()
    }
Building a Robust Retry Wrapper
Let’s build a comprehensive retry function that handles all these cases:
from openai import OpenAI, RateLimitError, APIError, APITimeoutError
import time

def call_with_exponential_backoff(
    client: OpenAI,
    max_retries: int = 3,
    base_wait: float = 1.0,
    **kwargs
):
    """Call an LLM with exponential backoff retry logic."""
    last_exception = None

    for attempt in range(max_retries):
        try:
            # Attempt the API call
            response = client.chat.completions.create(**kwargs)
            return response

        except RateLimitError as e:
            # Rate limit: definitely retry
            if attempt < max_retries - 1:
                wait_time = base_wait * (2 ** attempt)
                print(f"Rate limited. Waiting {wait_time}s (attempt {attempt + 1}/{max_retries})")
                time.sleep(wait_time)
                last_exception = e
            else:
                raise

        except APITimeoutError as e:
            # Timeout: safe to retry
            if attempt < max_retries - 1:
                print(f"Timeout. Retrying (attempt {attempt + 1}/{max_retries})")
                time.sleep(base_wait * (2 ** attempt))
                last_exception = e
            else:
                raise

        except APIError as e:
            # Generic API error; only status errors carry a status_code
            status = getattr(e, "status_code", None)
            if status is not None and status >= 500:
                # Server error: retry
                if attempt < max_retries - 1:
                    wait_time = base_wait * (2 ** attempt)
                    print(f"Server error ({status}). Retrying (attempt {attempt + 1}/{max_retries})")
                    time.sleep(wait_time)
                    last_exception = e
                else:
                    raise
            else:
                # Client error: don't retry
                raise

        except Exception as e:
            # Unexpected error: fail immediately
            print(f"Unexpected error: {type(e).__name__}: {e}")
            raise

    # Should never reach here, but just in case
    if last_exception:
        raise last_exception
# Usage
client = OpenAI()

try:
    response = call_with_exponential_backoff(
        client,
        max_retries=3,
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(response.choices[0].message.content)
except Exception as e:
    print(f"Failed after retries: {e}")
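The same structure generalizes into a small decorator you can reuse across call sites. A sketch; `TransientError` is a stand-in for whichever exceptions you decide are retryable (rate limits, timeouts, 5xx):

```python
import functools
import time

class TransientError(Exception):
    """Stand-in for retryable failures."""

def retry_with_backoff(max_retries: int = 3, base_wait: float = 0.01):
    """Decorator: retry on TransientError with exponential backoff."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except TransientError:
                    if attempt == max_retries - 1:
                        raise  # Out of retries: surface the error
                    time.sleep(base_wait * (2 ** attempt))
        return wrapper
    return decorator

calls = {"n": 0}

@retry_with_backoff(max_retries=3)
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("temporary failure")
    return "ok"

print(flaky())  # ok (succeeds on the third attempt)
```

Keeping the retry policy in one decorator means changing the backoff strategy later touches one place, not every call site.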
Circuit Breaker Pattern
For systems that make many requests, the circuit breaker pattern prevents cascading failures. If the service you depend on keeps failing, stop calling it temporarily.
from enum import Enum
import time

class CircuitBreakerState(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Failing, reject requests
    HALF_OPEN = "half_open"  # Testing if service recovered

class CircuitBreaker:
    """Prevent cascading failures by stopping requests to failing services."""

    def __init__(self, failure_threshold: int = 5, timeout: float = 60.0):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failure_count = 0
        self.state = CircuitBreakerState.CLOSED
        self.last_failure_time = None

    def call(self, func, *args, **kwargs):
        """Execute function through circuit breaker."""
        if self.state == CircuitBreakerState.OPEN:
            # Check if timeout has passed
            if time.time() - self.last_failure_time > self.timeout:
                self.state = CircuitBreakerState.HALF_OPEN
                print("Circuit breaker: trying to recover...")
            else:
                raise Exception("Circuit breaker is OPEN. Service is unavailable.")

        try:
            result = func(*args, **kwargs)
            self.on_success()
            return result
        except Exception:
            self.on_failure()
            raise

    def on_success(self):
        """Handle successful call."""
        self.failure_count = 0
        self.state = CircuitBreakerState.CLOSED

    def on_failure(self):
        """Handle failed call."""
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitBreakerState.OPEN
            print(f"Circuit breaker: OPENING after {self.failure_count} failures")
# Usage
breaker = CircuitBreaker(failure_threshold=3, timeout=30)

def unstable_api_call():
    # Simulated unreliable function
    import random
    if random.random() < 0.8:  # 80% chance of failure
        raise Exception("API call failed")
    return "Success!"

for i in range(10):
    try:
        result = breaker.call(unstable_api_call)
        print(f"Call {i}: {result}")
    except Exception as e:
        print(f"Call {i}: {e}")
    time.sleep(1)
Timeout Handling
Always set timeouts. Infinite hangs are dangerous:
from openai import OpenAI

client = OpenAI(timeout=30.0)  # 30 second timeout

try:
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": "Hello"}]
    )
except Exception as e:
    print(f"Request timed out or failed: {e}")
For raw HTTP requests:
import requests

try:
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        json={"model": "gpt-4-turbo", "messages": []},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30  # seconds
    )
except requests.Timeout:
    print("Request timed out after 30 seconds")
except requests.ConnectionError:
    print("Connection error")
Comprehensive Error Handling Example
Here’s a production-ready pattern combining everything:
from openai import OpenAI, APIError, APITimeoutError
import time
from typing import Optional

class RobustLLMClient:
    """LLM client with comprehensive error handling."""

    def __init__(self, api_key: str, max_retries: int = 3):
        self.client = OpenAI(api_key=api_key, timeout=30)
        self.max_retries = max_retries

    def chat_completion(
        self,
        prompt: str,
        model: str = "gpt-4-turbo",
        temperature: float = 0.7
    ) -> Optional[str]:
        """Get chat completion with automatic retries."""
        for attempt in range(self.max_retries):
            try:
                response = self.client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": prompt}],
                    temperature=temperature
                )
                return response.choices[0].message.content

            except APITimeoutError:
                if attempt < self.max_retries - 1:
                    wait = 2 ** attempt
                    print(f"Timeout. Waiting {wait}s before retry...")
                    time.sleep(wait)
                else:
                    print(f"Timeout after {self.max_retries} retries")
                    return None

            except APIError as e:
                status = getattr(e, "status_code", None)
                if status == 429:  # Rate limit
                    wait = 10 * (2 ** attempt)  # Longer wait for rate limits
                    print(f"Rate limited. Waiting {wait}s...")
                    time.sleep(wait)
                elif status is not None and status >= 500:  # Server error
                    if attempt < self.max_retries - 1:
                        wait = 2 ** attempt
                        print(f"Server error ({status}). Retrying...")
                        time.sleep(wait)
                    else:
                        print(f"Server error persists after {self.max_retries} retries")
                        return None
                else:  # Client error (4xx except 429)
                    print(f"Client error: {e}")
                    return None

        return None
# Usage
client = RobustLLMClient(api_key="sk-proj-...")
result = client.chat_completion("What is machine learning?")

if result:
    print(result)
else:
    print("Failed to get response")
Key Takeaway
Production systems need intelligent error handling. Distinguish between retryable errors (5xx, timeouts, rate limits) and non-retryable errors (4xx except 429). Use exponential backoff to space out retries. Monitor rate limit headers to avoid hitting limits. Consider circuit breakers for systems making many requests. Always set timeouts to prevent infinite hangs.
Exercises
1. Test status codes: Create a mock API endpoint that returns different status codes. Write error handling code that properly identifies which are retryable.
2. Implement exponential backoff: Write a function that calls an API with exponential backoff. Verify wait times increase correctly.
3. Rate limit simulation: Simulate a rate-limited API. Implement logic that respects the rate limit headers and backs off appropriately.
4. Circuit breaker pattern: Implement a circuit breaker. Verify it opens after threshold failures and half-opens to test recovery.
5. Timeout handling: Make API calls with various timeout settings. Observe behavior when requests exceed timeouts.
6. Production scenario: Write a function that handles rate limits, timeouts, and server errors simultaneously. Test with intentional failures.