Advanced
MLOps and Model Management
Production ML systems require experiment tracking, model versioning, continuous integration, and monitoring. This lesson covers Weights & Biases, MLflow, model registries, and CI/CD pipelines for ML.
Core Concepts
Experiment Tracking
Log hyperparameters, metrics, and artifacts so every run is reproducible:
Experiment 1: lr=2e-5, batch_size=32 → val_loss=2.15, accuracy=0.78
Experiment 2: lr=1e-5, batch_size=32 → val_loss=2.08, accuracy=0.79
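The two runs above can be captured as structured records, which is all an experiment tracker really stores. A minimal, framework-free sketch (the W&B and MLflow sections below add persistence and a UI on top of the same idea):

```python
# Each experiment is a record of hyperparameters plus resulting metrics,
# so the best run can be recovered programmatically.
experiments = [
    {"lr": 2e-5, "batch_size": 32, "val_loss": 2.15, "accuracy": 0.78},
    {"lr": 1e-5, "batch_size": 32, "val_loss": 2.08, "accuracy": 0.79},
]

# Select the run with the lowest validation loss
best = min(experiments, key=lambda run: run["val_loss"])
```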
Model Versioning
Track model versions with metadata:
Model v1.0: gpt2-base, val_loss=2.15
Model v1.1: gpt2-base + LoRA, val_loss=2.08
Model v2.0: gpt2-medium, val_loss=1.95
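The version history above amounts to a manifest mapping each version to its base model, any adapter, and the validation loss that justified the bump. A plain-data sketch (the structure itself is illustrative):

```python
# Model-version manifest mirroring the listing above
model_versions = {
    "v1.0": {"base": "gpt2-base", "adapter": None, "val_loss": 2.15},
    "v1.1": {"base": "gpt2-base", "adapter": "LoRA", "val_loss": 2.08},
    "v2.0": {"base": "gpt2-medium", "adapter": None, "val_loss": 1.95},
}

# Pick the version with the lowest validation loss
best_version = min(model_versions, key=lambda v: model_versions[v]["val_loss"])
```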
Model Registry
A central repository that records each model version and its deployment stage:
Registry:
- Model A v1.0 (dev)
- Model A v1.1 (staging)
- Model A v1.0 (production)
- Model B v2.0 (canary, 10% traffic)
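Conceptually, the registry is a table of (model, version, stage) entries that deployment code queries. A plain-data sketch of the listing above (MLflow's registry, shown later, persists exactly this kind of state):

```python
# Registry entries as plain records; names and stages mirror the listing above
registry = [
    {"model": "A", "version": "1.0", "stage": "dev"},
    {"model": "A", "version": "1.1", "stage": "staging"},
    {"model": "A", "version": "1.0", "stage": "production"},
    {"model": "B", "version": "2.0", "stage": "canary", "traffic": 0.10},
]

def production_version(registry, model):
    """Return the version currently serving production traffic for a model."""
    return next(e["version"] for e in registry
                if e["model"] == model and e["stage"] == "production")
```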
CI/CD for ML
Automate testing, evaluation, and deployment:
Push code → Run tests → Train model → Evaluate → Deploy if passing
Practical Implementation
Weights & Biases Tracking
import wandb
from transformers import TrainingArguments, Trainer

# Initialize the project (entity is your W&B team or username)
wandb.init(project="llm-training", entity="my-org")

# Log the run configuration
wandb.config.update({
    "model": "gpt2",
    "learning_rate": 2e-5,
    "batch_size": 32,
})

# Training arguments with W&B integration
training_args = TrainingArguments(
    output_dir="./results",
    report_to=["wandb"],
    logging_dir="./logs",
    num_train_epochs=3,
)

# The Trainer logs automatically via the W&B callback
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()

# Log final metrics and close the run
wandb.log({
    "final_accuracy": 0.95,
    "final_loss": 0.15,
})
wandb.finish()
MLflow Experiment Management
import mlflow
from mlflow.models import infer_signature

# Start a tracked run
with mlflow.start_run(run_name="gpt2-training"):
    # Log hyperparameters
    mlflow.log_param("learning_rate", 2e-5)
    mlflow.log_param("batch_size", 32)

    # Train model...

    # Log metrics
    mlflow.log_metric("train_loss", 2.1)
    mlflow.log_metric("val_accuracy", 0.95)

    # Infer the input/output signature and log the model once
    signature = infer_signature(X_train, model.predict(X_train))
    mlflow.pytorch.log_model(model, "model", signature=signature)
Model Registry
# Register the model from a completed run
result = mlflow.register_model("runs:/abc123/model", "GPT2-classifier")

# Transition the new version to production
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="GPT2-classifier",
    version=result.version,
    stage="Production",
)

# Load the current production model
prod_model = mlflow.pyfunc.load_model("models:/GPT2-classifier/Production")
CI/CD Pipeline
# .github/workflows/ml-pipeline.yml
name: ML Pipeline
on: [push]

jobs:
  train-and-evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run tests
        run: pytest tests/
      - name: Train model
        run: python train.py
      - name: Evaluate model
        run: python evaluate.py
      - name: Deploy if tests pass
        if: success()
        run: python deploy.py
      - name: Log to MLflow
        run: python log_metrics.py
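The pipeline above only blocks deployment if the "Evaluate model" step exits non-zero, so `evaluate.py` needs to act as a quality gate. A minimal sketch, assuming a `val_accuracy` metric and a 0.90 threshold (both illustrative):

```python
# Evaluation gate: returns a process exit code for CI.
# 0 lets the deploy step run; 1 fails the step and blocks deployment.
THRESHOLD = 0.90  # minimum acceptable validation accuracy (assumed)

def gate(metrics: dict, threshold: float = THRESHOLD) -> int:
    """Return 0 if the candidate model meets the quality bar, else 1."""
    return 0 if metrics.get("val_accuracy", 0.0) >= threshold else 1
```

In the workflow, `evaluate.py` would end with `sys.exit(gate(metrics))` after loading the metrics produced by the training step.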
Advanced Techniques
A/B Testing Infrastructure
import numpy as np

class ABTestManager:
    def __init__(self, registry):
        self.registry = registry
        self.traffic_split = {"model_a": 0.9, "model_b": 0.1}

    def route_request(self, request):
        # Sample a model according to the traffic split
        model_choice = np.random.choice(
            list(self.traffic_split.keys()),
            p=list(self.traffic_split.values()),
        )
        model = self.registry.load(model_choice)
        output = model.predict(request)
        return output, {"model": model_choice}
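Per-request random sampling means the same user can bounce between variants. A common alternative is deterministic ("sticky") routing: hash a stable user ID into [0, 1) so each user always lands on the same variant. A sketch, with illustrative split values:

```python
import hashlib

def route_sticky(user_id: str, split: dict) -> str:
    """Map a user deterministically to a variant according to the split."""
    # Hash the user ID into a bucket in [0, 1)
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000 / 10_000
    cumulative = 0.0
    for model, share in split.items():
        cumulative += share
        if bucket < cumulative:
            return model
    return model  # fall through to the last variant on float rounding

split = {"model_a": 0.9, "model_b": 0.1}
```

Sticky routing also makes per-variant metrics cleaner, since each user's whole session is attributed to one model.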
Monitoring and Alerts
import prometheus_client

class ModelMonitor:
    def __init__(self, model):
        self.model = model
        self.request_count = prometheus_client.Counter("requests", "Total requests")
        self.latency = prometheus_client.Histogram("latency_seconds", "Request latency")
        self.accuracy = prometheus_client.Gauge("accuracy", "Model accuracy")

    def predict(self, input_data):
        # Time each prediction and count the request
        with self.latency.time():
            output = self.model.predict(input_data)
        self.request_count.inc()
        return output

    def update_accuracy(self, metrics):
        self.accuracy.set(metrics["accuracy"])
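Beyond latency and accuracy, production monitors typically watch for input drift. A simple sketch that flags when a live feature window's mean shifts more than k standard deviations from the training baseline (the threshold and the mean-shift test are assumptions; production systems often use PSI or KS tests instead):

```python
from statistics import mean, stdev

def drifted(baseline, window, k: float = 3.0) -> bool:
    """Flag drift when the window mean is > k baseline stdevs from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(mean(window) - mu) > k * sigma
```

A drift alert like this would feed the same alerting path as the Prometheus metrics above.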
Production Considerations
Model Serving Architecture
Client requests → Load Balancer → Model Serving Container
                                   ├─ Model A (80% traffic)
                                   └─ Model B (20% canary)

All metrics → Monitoring System (Prometheus)
           → Alerting System (PagerDuty)
           → Logging System (ELK Stack)
Key Takeaway
MLOps infrastructure tracks experiments, versions models, and automates deployment. Implement early with W&B or MLflow to enable reproducibility, collaboration, and safe production rollouts.
Practical Exercise
Task: Set up complete MLOps pipeline with experiment tracking and CI/CD.
Requirements:
- Initialize W&B or MLflow project
- Log experiments with metrics
- Register best model to registry
- Create GitHub Actions CI/CD
- Set up monitoring and alerting
Evaluation:
- Experiment reproducibility
- Successful CI/CD execution
- Model versioning and tracking
- Alert system validation
- Team collaboration workflow