Advanced

From Pilot to Platform

Lesson 1 of 4 · Estimated time: 55 min

The Pilot Purgatory Problem (Redux)

Most organizations build AI platforms wrong. They run successful pilots (one feature, one team), then struggle to scale because they didn’t build for scale.

A pilot for one team is different from a platform serving many teams.

Pilot characteristics:

  • Single use case
  • Small team
  • Custom solutions OK
  • Quick and dirty acceptable
  • Unique data

Platform characteristics:

  • Multiple use cases
  • Many teams
  • Standardized approach needed
  • Scalable required
  • Shared infrastructure

You can’t turn a pilot into a platform without a major redesign. Better to build for a platform from the start.

Platform Thinking

Shift from “let’s solve this specific problem” to “let’s build infrastructure others can use.”

Platform Components

1. Shared Infrastructure

What teams would use repeatedly:

  • Model serving (host models, manage versions)
  • Data access (query company data securely)
  • Feature store (pre-computed features)
  • Monitoring (track accuracy, latency, cost)
  • Evaluation tools (assess model performance)

2. Standardized Interfaces

How teams interact with platform:

  • API for predictions (teams call API, get result)
  • Training pipeline (teams submit data, get trained model)
  • Monitoring dashboard (see how model is performing)
  • Deployment workflow (test → stage → prod)
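To make the "API for predictions" interface concrete, here is a minimal sketch of what a standardized request/response contract might look like. The field names and model names are illustrative assumptions, not a real platform API; the point is that every team uses the same shape regardless of model.

```python
# Hypothetical sketch of a standardized prediction API contract.
# Field names ("model", "version", "features") are illustrative.
import json

def build_predict_request(model_name: str, version: str, features: dict) -> dict:
    """Every team sends the same request shape, regardless of model."""
    return {
        "model": model_name,
        "version": version,   # pin a specific version, or "latest"
        "features": features,
    }

def parse_predict_response(raw: str) -> dict:
    """Every model returns the same envelope: prediction plus metadata."""
    body = json.loads(raw)
    return {
        "prediction": body["prediction"],
        "model_version": body["model_version"],
        "latency_ms": body["latency_ms"],
    }

# A fraud-detection team and a recommendations team would build the
# exact same request, differing only in model name and features.
req = build_predict_request("fraud-detector", "1.2.0", {"amount": 420.0})
```

Because the envelope is identical across models, monitoring and cost tracking can be built once against the shared contract rather than per team.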

3. Governance and Controls

How organization maintains standards:

  • Approval workflow (before deploying to production)
  • Monitoring requirements (what must be tracked)
  • Fairness testing (bias checks required)
  • Documentation standards (what must be documented)

4. Services and Support

How teams get help:

  • Training (teach teams to use platform)
  • Consulting (help with problems)
  • Office hours (questions answered)
  • Documentation (guides and examples)

Platform vs. Point Solution: When to Invest

Build a Platform If:

  • You have 3+ similar AI use cases
  • You’ll likely have 5-10+ uses over time
  • Different teams want to use AI
  • Standardization provides value
  • Cost saving from shared infrastructure is significant

Example: Financial services company building AI for risk assessment, fraud detection, customer churn, and recommendations. Common needs:

  • Data access to customer info
  • Model evaluation and testing
  • Production monitoring
  • Fairness evaluation

A single platform serves all four use cases and avoids duplicating that infrastructure.

Optimize Point Solutions If:

  • Only 1-2 specific use cases
  • Each use case is unique
  • Different teams have different needs
  • Time-to-value is critical
  • Can optimize later

Example: Healthcare startup building diagnostic AI. Each diagnosis is unique and needs its own custom model. Building a platform would be premature; build this one really well, then expand.

Building a Platform: Phased Approach

Phase 1: Foundation (Months 1-4)

Goal: Core infrastructure others can build on

Components:

  • Model serving infrastructure (host models)
  • API for predictions
  • Basic monitoring
  • Version management

Scope: “Any team can deploy a model and get predictions through our API”

Team: 2-3 engineers building infrastructure
Cost: $150-300K
Output: Working platform for the v1 use case
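The Phase 1 core (serving plus version management behind one predict entry point) can be sketched in-process. This is a toy illustration under assumed names (`ModelRegistry`, `register`, `predict`), not a production serving stack; in practice the callables would be deployed models behind the API.

```python
# Minimal sketch of Phase 1: a model registry with version management
# and a single predict entry point. All names are illustrative.
from typing import Callable, Dict, Tuple

class ModelRegistry:
    def __init__(self) -> None:
        # (name, version) -> callable mapping features to a prediction
        self._models: Dict[Tuple[str, str], Callable[[dict], float]] = {}
        self._latest: Dict[str, str] = {}

    def register(self, name: str, version: str,
                 fn: Callable[[dict], float]) -> None:
        self._models[(name, version)] = fn
        self._latest[name] = version  # last registered wins as "latest"

    def predict(self, name: str, features: dict,
                version: str = "latest") -> float:
        if version == "latest":
            version = self._latest[name]
        return self._models[(name, version)](features)

registry = ModelRegistry()
registry.register("churn", "1.0.0", lambda f: 0.2)
registry.register("churn", "1.1.0", lambda f: 0.35)

registry.predict("churn", {})                   # latest -> 0.35
registry.predict("churn", {}, version="1.0.0")  # pinned  -> 0.2
```

Keeping version pinning in the interface from day one is what lets teams roll back safely once the platform has many models in production.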

Phase 2: Standardization (Months 4-8)

Goal: Enforce quality standards

Components:

  • Fairness evaluation framework
  • Monitoring dashboards (accuracy, fairness, cost)
  • Approval workflow
  • Documentation templates

Scope: “Every model goes through evaluation and approval before production”

Team: Add 1 data scientist for evaluation and 1 engineer for governance
Cost: $300-500K
Output: Governance infrastructure

Phase 3: Developer Experience (Months 8-12)

Goal: Make platform easy to use

Components:

  • Training and documentation
  • Starter templates (copy-paste models to start)
  • CLI tools (command line for common tasks)
  • Auto-scaling (handle variable load)

Scope: “New team can get started in 1 day”

Team: Add 1 PM + 1 engineer for DX
Cost: $400-600K
Output: Usable platform

Phase 4: Optimization (Months 12+)

Goal: Improve over time

Components:

  • Cost optimization
  • Performance tuning
  • New features based on user feedback
  • Expand to new model types

Team: Ongoing (2-3 people)
Cost: $200K+/year
Output: Mature platform

Common Platform Mistakes

Mistake 1: Over-Engineering

Build for scale you don’t have.

What happens: You spend 6 months building the perfect platform; it's never used because it's too complex.
Fix: Start small (just what you need) and iterate.

Mistake 2: No User Focus

Build what engineers think users need, not what they actually need.

What happens: Teams don't use the platform because it doesn't fit their workflow.
Fix: Talk to teams first; build what they'll actually use.

Mistake 3: Setting It and Forgetting It

Build platform, then move on; don’t maintain it.

What happens: The platform breaks, and teams go back to manual approaches.
Fix: Allocate ongoing ownership and support.

Mistake 4: Not Enough Standardization

Let each team do their own thing.

What happens: No economies of scale; infrastructure costs the same as point solutions.
Fix: Require standards; enforce them through governance.

Mistake 5: Insufficient Documentation

Engineers understand how to use it; nobody else does.

What happens: Teams try but can't figure it out and give up.
Fix: Prioritize documentation and training early.

Example Platform Architecture

Team A (Recommendations)

  • Data: User behavior
  • Model: Collaborative filtering
  • Output: Top 5 recommendations

Team B (Fraud Detection)

  • Data: Transaction data
  • Model: Classification (fraud vs. legitimate)
  • Output: Fraud score

Shared Platform

┌─────────────────────────────┐
│  Data Access Layer          │
│  (Query company data)       │
└─────────────────────────────┘

┌─────────────────────────────┐
│  Model Training Pipeline    │
│  (Train models)             │
└─────────────────────────────┘

┌─────────────────────────────┐
│  Model Evaluation           │
│  (Test accuracy, fairness)  │
└─────────────────────────────┘

┌─────────────────────────────┐
│  Model Registry             │
│  (Version management)       │
└─────────────────────────────┘

┌─────────────────────────────┐
│  Model Serving              │
│  (Serve predictions via API)│
└─────────────────────────────┘

┌─────────────────────────────┐
│  Monitoring & Observability │
│  (Track performance)        │
└─────────────────────────────┘

Teams use these components rather than building their own.

Governance for Platform Users

Approval Process

For new models:

  1. Team develops model
  2. Submits for approval (via CLI/web form)
  3. Automated checks (code quality, documentation)
  4. Human review (fairness, business case)
  5. Approval → Deploy to staging
  6. Production deployment (monitored)

  • Self-serve for low-risk changes (retrain on same data, same model type)
  • Human review for new models (different approach, new data)
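The automated-checks step and the low-risk routing rule above can be sketched as two small functions. The submission fields and check names here are assumptions for illustration; a real workflow would pull these from CI results and the model's metadata.

```python
# Hedged sketch of step 3 (automated checks) and the self-serve vs
# human-review routing rule. Field names are illustrative.

def automated_checks(submission: dict) -> list:
    """Return a list of failures; an empty list means checks pass."""
    failures = []
    if not submission.get("documentation"):
        failures.append("missing documentation")
    if not submission.get("tests_passed"):
        failures.append("code quality checks failed")
    return failures

def needs_human_review(submission: dict) -> bool:
    """Self-serve only for retrains on the same data and model type."""
    return not (submission.get("same_data")
                and submission.get("same_model_type"))

sub = {"documentation": "README.md", "tests_passed": True,
       "same_data": True, "same_model_type": True}
automated_checks(sub)    # [] -> checks pass
needs_human_review(sub)  # False -> self-serve retrain, no human gate
```

Encoding the routing rule in code keeps the approval queue small: reviewers only see genuinely new models, not routine retrains.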

Service Level Agreement (SLA)

For platform:

  • API availability: 99.9%
  • Prediction latency: <500ms
  • Data freshness: Hourly

For monitoring:

  • Alert on accuracy drop >2%
  • Alert on latency >1 second
  • Alert on errors >5%
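The three alert rules above can be expressed as a single evaluation function. The thresholds come straight from the SLA; the metric names and function shape are assumptions for illustration (a real setup would live in a monitoring system, not application code).

```python
# Sketch of the monitoring SLA as code. Thresholds match the rules
# above; metric field names are illustrative.

THRESHOLDS = {
    "accuracy_drop": 0.02,  # alert on accuracy drop > 2%
    "latency_s": 1.0,       # alert on latency > 1 second
    "error_rate": 0.05,     # alert on errors > 5%
}

def check_alerts(metrics: dict) -> list:
    """Return the list of fired alerts for one monitoring window."""
    alerts = []
    if metrics["baseline_accuracy"] - metrics["accuracy"] > THRESHOLDS["accuracy_drop"]:
        alerts.append("accuracy drop")
    if metrics["latency_s"] > THRESHOLDS["latency_s"]:
        alerts.append("latency")
    if metrics["error_rate"] > THRESHOLDS["error_rate"]:
        alerts.append("error rate")
    return alerts

check_alerts({"baseline_accuracy": 0.90, "accuracy": 0.87,
              "latency_s": 0.4, "error_rate": 0.01})
# -> ["accuracy drop"]  (0.90 - 0.87 = 0.03 exceeds the 2% threshold)
```

Centralizing thresholds like this means every team's model is held to the same SLA without each team re-implementing alerting.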

Cost Allocation

Pricing model:

  • Free tier: First 100K predictions/month
  • Standard: $10 per 1M predictions
  • Enterprise: Volume discount

This encourages efficient use; teams optimize prompts/models.
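The tiered pricing above reduces to a small calculation. This sketch uses the numbers from the pricing model; the enterprise volume discount is omitted because its rate would be negotiated per contract.

```python
# Sketch of the tiered cost-allocation model: first 100K predictions
# free, then $10 per 1M. Enterprise discounts omitted (contract-specific).

FREE_TIER = 100_000      # predictions/month included at no charge
RATE_PER_MILLION = 10.0  # dollars per 1M billable predictions

def monthly_cost(predictions: int) -> float:
    billable = max(0, predictions - FREE_TIER)
    return billable / 1_000_000 * RATE_PER_MILLION

monthly_cost(50_000)     # 0.0  -> entirely inside the free tier
monthly_cost(1_100_000)  # 10.0 -> 1M billable predictions at $10/1M
```

A team serving 500M predictions/month (as in the dashboard below) would be billed for 499.9M predictions, about $4,999 — small enough not to discourage use, large enough to motivate prompt and model efficiency.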

Supporting Team Success

Training and Onboarding

New team joining platform:

  1. 4-hour onboarding workshop (platform overview, hands-on)
  2. 1-on-1 with platform team (help getting started)
  3. Office hours (questions answered)
  4. Self-serve docs (reference material)

Outcome: Team ready to build in 2-3 days

Platform Documentation

Required:

  • “Getting Started” guide (5 pages, very concrete)
  • API reference (what endpoints exist, parameters)
  • Examples (copy-paste code you can run)
  • Troubleshooting (common issues, fixes)
  • Best practices (how to do things well)

Not required:

  • Deep dives into how platform works (nice-to-have)
  • Theory of machine learning (use external resources)

Community

To prevent isolation and share learning:

  • Monthly user meetup (30 min, share what teams are doing)
  • Shared Slack channel (ask questions, share tips)
  • Rotating lunch-and-learns (team shares their use case)
  • Regular office hours (platform team available)

Measuring Platform Success

Adoption Metrics

  • Number of teams using platform
  • Number of models deployed
  • Monthly predictions served
  • Growth rate (month-over-month)

Quality Metrics

  • Average model accuracy
  • Deployment frequency
  • Incident response time
  • User satisfaction

Business Metrics

  • Cost per prediction (trending down over time)
  • ROI per use case
  • Revenue enabled by platform
  • Time to launch new model

Example Dashboard

Platform Health (Month 12)
├─ Adoption
│  ├─ Teams using: 12
│  ├─ Models deployed: 24
│  ├─ Predictions/month: 500M
│  └─ Growth: +50% YoY
├─ Quality
│  ├─ Average accuracy: 87%
│  ├─ Availability: 99.92%
│  ├─ Incident response: 15 min avg
│  └─ User satisfaction: 4.3/5
└─ Business
   ├─ Cost/prediction: $0.000001
   ├─ Estimated ROI: 4.2x
   └─ New use cases in pipeline: 8

Strategic Questions

  1. Should you build a platform or point solutions? What’s your timeline?
  2. What should be shared infrastructure? (Data access? Model serving? Both?)
  3. What governance will you impose? (Too much kills adoption)
  4. Who will maintain the platform long-term? (Ongoing investment required)
  5. How will you measure success? (Adoption? Cost reduction? ROI?)

Key Takeaway: Platform thinking scales AI across the organization. Build shared infrastructure, enforce standards, support users. Start small (core components), expand over time. Avoid over-engineering and under-maintaining. Measure adoption, quality, and business impact.

Discussion Prompt

Should your organization build a platform or optimize point solutions? What would the core components be?