Advanced

Self-Service AI Capabilities

Lesson 3 of 4 | Estimated time: 50 min

The Self-Service Vision

Imagine that every team in your company could use AI without waiting for specialists. Product teams could try new models. Marketing could generate variations of copy. Support could fine-tune a model to better understand its customer domain.

Self-service AI is the ultimate scaling model. It democratizes AI capability while maintaining safety and standards through guardrails.

Self-Service vs. DIY

Self-service: Company provides tools and infrastructure; teams do the work

  • Teams have agency and control
  • Faster than waiting for specialists
  • Still guided by standards and governance
  • Examples: Prompt library, training tools, evaluation frameworks

DIY (avoid): Teams building AI with no guidance

  • Quality and consistency suffer
  • Risk of errors and bias
  • Knowledge not shared
  • Duplication of effort

Managed service: CoE builds everything for teams

  • Highest quality and consistency
  • Slowest (waiting for specialists)
  • Doesn’t scale
  • Teams have limited control

Best approach: Self-service + managed service hybrid

  • Self-service for straightforward problems
  • Managed service for complex or high-risk
  • Teams have choice

Self-Service Infrastructure

What must be in place for self-service to work?

1. Model and API Availability

Make models easy to access:

  • List of approved models and APIs (what can we use?)
  • Clear comparison (which is best for this use case?)
  • Cost for each (what will it cost?)
  • Examples (how do I use it?)

Example:

Approved Models for Text Processing:

Claude 3 Opus
- Cost: $15/M input, $75/M output tokens
- Accuracy: 90%+ on complex reasoning
- Good for: Analysis, deep understanding
- Example: Legal document analysis

GPT-4 Turbo
- Cost: $10/M input, $30/M output tokens
- Accuracy: 88% on structured tasks
- Good for: Classification, extraction
- Example: Email categorization

Claude 3 Haiku
- Cost: $0.25/M input, $1.25/M output tokens
- Accuracy: 85% on straightforward tasks
- Good for: Simple classification, summaries
- Example: Topic tagging
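The comparison above lends itself to a small lookup table, so a team can programmatically pick the cheapest approved model that clears its accuracy bar. A minimal sketch (registry contents are copied from the table above; the helper name is hypothetical):

```python
# Approved-model registry. Prices are per million tokens; accuracy is the
# rough benchmark figure from the comparison above.
APPROVED_MODELS = [
    {"name": "Claude 3 Opus",  "input_cost": 15.00, "output_cost": 75.00, "accuracy": 0.90},
    {"name": "GPT-4 Turbo",    "input_cost": 10.00, "output_cost": 30.00, "accuracy": 0.88},
    {"name": "Claude 3 Haiku", "input_cost": 0.25,  "output_cost": 1.25,  "accuracy": 0.85},
]

def cheapest_model(min_accuracy: float) -> str:
    """Return the cheapest approved model that meets the accuracy bar."""
    candidates = [m for m in APPROVED_MODELS if m["accuracy"] >= min_accuracy]
    if not candidates:
        raise ValueError("No approved model meets the accuracy requirement")
    best = min(candidates, key=lambda m: m["input_cost"] + m["output_cost"])
    return best["name"]
```

With these numbers, `cheapest_model(0.85)` selects Claude 3 Haiku, while requiring 90% accuracy forces the choice up to Claude 3 Opus.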

2. Prompt Library and Templates

Make it easy to start:

  • Library of prompts for common tasks
  • Copy-paste starting point
  • Documentation of what works

Example prompt library:

Prompt: Email classification
Purpose: Categorize emails into support categories
Model: Claude 3 Haiku
Category: Support operations

[System prompt]:
You are a helpful email classifier. Analyze the customer email and
categorize it as one of: Billing, Technical, Account, Refund, Other

[User prompt]:
Categorize this email:
{email_text}

Expected output: Category name + confidence (high/medium/low)

Example:
Input: "My invoice is wrong, charged $500 instead of $50"
Output: Billing, high confidence

Tips:
- Subject line often indicates category
- First few sentences usually most relevant
- Can handle common typos and variations
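A library entry like the one above is easiest to reuse if it is stored as data and rendered on demand, rather than copy-pasted by hand. A minimal sketch, with the entry transcribed from the example and a hypothetical `render_prompt` helper:

```python
# One entry from a hypothetical prompt library, mirroring the example above.
EMAIL_CLASSIFIER = {
    "model": "claude-3-haiku",
    "system": (
        "You are a helpful email classifier. Analyze the customer email and "
        "categorize it as one of: Billing, Technical, Account, Refund, Other"
    ),
    "user_template": "Categorize this email:\n{email_text}",
}

def render_prompt(entry: dict, **fields) -> dict:
    """Fill the template placeholders and return the messages to send."""
    return {
        "model": entry["model"],
        "system": entry["system"],
        "user": entry["user_template"].format(**fields),
    }
```

A team member then calls `render_prompt(EMAIL_CLASSIFIER, email_text=...)` and passes the result to whatever client wraps the approved API.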

3. Evaluation and Testing Tools

Make quality assessment easy:

  • Accuracy evaluation framework (does it work?)
  • Fairness evaluation (is it biased?)
  • Cost tracking (how much did it cost?)
  • Comparison tool (which approach is better?)

Example evaluation tool:

Model Evaluation Framework

Step 1: Create test set
- Collect 50-100 examples where you know the right answer
- Save as CSV (input, expected_output)
- Upload to platform

Step 2: Run evaluation
- Select model to test
- Platform runs model on all examples
- Calculates accuracy, errors

Step 3: Review results
- Accuracy score (percentage correct)
- Error analysis (common mistakes?)
- Fairness check (same accuracy across groups?)

Step 4: Compare approaches
- Run evaluation on different models
- See accuracy vs. cost tradeoff
- Pick best approach

Output: Report showing accuracy, cost, time to result
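Steps 1 through 3 above reduce to a short scoring loop. A sketch, assuming `predict` is whatever callable wraps the model under test and the CSV has the two columns from step 1:

```python
import csv

def evaluate(predict, test_csv_path: str) -> float:
    """Run `predict` on each (input, expected_output) row; return accuracy."""
    correct = total = 0
    with open(test_csv_path, newline="") as f:
        for row in csv.DictReader(f):
            total += 1
            if predict(row["input"]).strip() == row["expected_output"].strip():
                correct += 1
    return correct / total if total else 0.0
```

Running `evaluate` once per candidate model gives the accuracy side of the accuracy-vs.-cost comparison in step 4.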

4. Documentation and Training

Make it self-discoverable:

  • “Getting started” guide (15 minutes to first result)
  • API documentation (how to call it)
  • Common patterns (copy-paste examples)
  • Troubleshooting (common problems and fixes)
  • Office hours (when to ask for help)

5. Monitoring and Alerts

Make it safe:

  • Monitor accuracy (alert if drops >5%)
  • Monitor cost (alert if high spending)
  • Monitor failures (alert on errors)
  • Dashboard (see how you’re doing)

Example alert:

Alert: Model accuracy dropped
Current: 78% (vs. target 85%)
What happened: A new data type (different email formats) is showing lower accuracy
Recommended action: Collect examples of this type, retrain
Help: Click to see new error patterns, or contact AI team
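The "alert if accuracy drops >5%" rule above is simple to automate. A sketch of the check (the function name and message format are illustrative):

```python
from typing import Optional

def accuracy_alert(current: float, target: float,
                   max_drop: float = 0.05) -> Optional[str]:
    """Return an alert message if accuracy is more than `max_drop`
    below target (the >5%-drop rule above), else None."""
    if target - current > max_drop:
        return f"Alert: model accuracy dropped to {current:.0%} (target {target:.0%})"
    return None
```

With the numbers from the example, a 78% measurement against an 85% target fires the alert; 83% does not.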

Governance for Self-Service

Self-service needs guardrails. Too strict and teams bypass it. Too loose and bad things happen.

Risk-Based Controls

Low-risk (minimal controls):

  • Using existing, proven models on current data
  • Internal use only (no external impact)
  • Standard tasks (classification, summarization)
  • Requires: Just compliance checklist

Medium-risk (moderate controls):

  • Customer-facing features
  • New models or approaches
  • Decision-making (more than just a suggestion)
  • Requires: Fairness evaluation, human review, approval

High-risk (strict controls):

  • High-stakes decisions (hiring, lending, medical)
  • New data source
  • Autonomous decisions
  • Requires: Full governance review, external audit, legal

Self-Service Approval Process

For low-risk:

  1. Team fills out 1-page form
  2. Automated validation checks
  3. Auto-approved if passes
  4. Can ship immediately

For medium-risk:

  1. Team fills out 3-page form (problem, data, risks)
  2. Automated validation checks
  3. Fairness evaluation (automated if possible)
  4. Review by CoE (2-3 days)
  5. Approval or feedback

For high-risk:

  1. Full governance board review
  2. Risk assessment
  3. Regulatory/legal review
  4. 1-2 week process

Guardrails in Technology

What the platform prevents:

  • Using unevaluated models (must be evaluated first)
  • Using without monitoring (must track accuracy)
  • Deploying low-confidence models (accuracy too low)
  • Using without approval (governance process enforced)

Example guardrail:

Check before deploying:
1. Model accuracy ≥ target? YES
2. Fairness evaluation passed? YES
3. Cost per prediction < budget? YES
4. Approval obtained? YES (Low-risk auto-approved)
5. Monitoring configured? YES (Alerts set)

Result: ✓ READY TO DEPLOY

If any checks fail:
Result: ✗ BLOCKED - Fix these issues before deploying
- Accuracy is 78%; need ≥85%
- Contact AI team for help improving
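The checklist above is exactly the kind of guardrail a platform can enforce mechanically at deploy time: run every check, and block deployment with a list of concrete fixes if any fail. A sketch (parameter names and thresholds are illustrative):

```python
def predeploy_check(accuracy, target_accuracy, fairness_passed,
                    cost_per_prediction, budget_per_prediction,
                    approved, monitoring_configured):
    """Mirror the pre-deploy checklist above: return (ok, list of blockers)."""
    blockers = []
    if accuracy < target_accuracy:
        blockers.append(f"Accuracy is {accuracy:.0%}; need >={target_accuracy:.0%}")
    if not fairness_passed:
        blockers.append("Fairness evaluation not passed")
    if cost_per_prediction > budget_per_prediction:
        blockers.append("Cost per prediction exceeds budget")
    if not approved:
        blockers.append("Approval not obtained")
    if not monitoring_configured:
        blockers.append("Monitoring not configured")
    return (len(blockers) == 0, blockers)
```

A passing run returns `(True, [])`, i.e. READY TO DEPLOY; the 78%-accuracy case from the example returns BLOCKED with the accuracy gap spelled out.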

Self-Service Models and APIs

What should teams have access to?

Always Available (No Approval)

  • Major LLMs with strong track records (OpenAI, Anthropic, Google)
  • Embedding models
  • Standard classification/summarization approaches
  • Internal tools and frameworks

Rationale: Low risk; proven and already used successfully

Available with Approval

  • Fine-tuned models on company data
  • Custom models for specific use case
  • Real-time decision systems
  • Customer-facing automated decisions

Rationale: Higher risk, need to verify before use

Case-by-Case

  • Experimental models
  • Models with fairness concerns
  • Medical/safety-critical models
  • Government decision-making

Rationale: High risk, need careful review

Self-Service Examples

Example 1: Prompt Library for Support

Support AI Prompt Library

✓ Email Classification
- Who can use: Support team (self-service)
- Model: Claude 3 Haiku
- Purpose: Route incoming emails
- Approval: Auto-approved

✓ Response Templates
- Who can use: Support team
- Model: Claude 3 Opus
- Purpose: Draft responses to common questions
- Approval: Auto-approved

✓ Sentiment Analysis
- Who can use: Support team with approval
- Model: Claude 3 Haiku
- Purpose: Detect customer sentiment
- Approval: CoE review (1 day)

✗ Automated Response
- Who can use: Not self-service
- Model: Claude 3 Opus
- Purpose: Send responses automatically to customers
- Approval: Full governance board review

Example 2: Data Science Workbench

Self-Service Platform: Data Science Workbench

Available without approval:
- Query company data (with privacy controls)
- Run analyses and build models
- Create visualizations
- Share results with team

Available with approval:
- Share externally (requires approval for confidentiality)
- Use in production decision-making
- Deploy as automated system

Process:
1. Data scientist works in sandbox (no approval needed)
2. Achieves results they want
3. Submits for production deployment (approval required)
4. CoE evaluates for fairness, documentation, monitoring
5. Approved or feedback for improvement

Preventing Misuse

Self-service has risks. Prevent misuse through design.

Common Misuses

Prohibited use 1: Discrimination

  • Using protected attributes directly (race, gender, etc.)
  • Using proxies that correlate with protected attributes
  • Prevention: Audit for this, flag if found

Prohibited use 2: Privacy violation

  • Using sensitive data beyond intended scope
  • Not anonymizing personal information
  • Prevention: Data governance controls, audit logs

Prohibited use 3: Security breach

  • Exposing model outputs containing sensitive data
  • Logging confidential information
  • Prevention: Data classification, log filtering

Prohibited use 4: Careless inaccuracy

  • Deploying model without testing
  • Ignoring accuracy below threshold
  • Prevention: Platform prevents deployment without evaluation

Detection and Response

If misuse detected:

  1. Immediate action: Disable system if high risk
  2. Investigation: What happened? How long?
  3. Remediation: Fix the issue
  4. Communication: Affected parties notified
  5. Prevention: Policy or control change

Measuring Self-Service Success

Adoption Metrics

  • Percentage of technical teams using platform
  • Monthly users
  • Monthly models deployed
  • Growth rate

Quality Metrics

  • Average model accuracy
  • Fairness audit pass rate
  • SLA compliance (uptime, performance)
  • User satisfaction

Business Metrics

  • Cost per prediction
  • Time-to-launch new feature
  • Revenue impact
  • Cost savings

Example Success Targets (Year 1)

Adoption:
- 50% of product teams using platform
- 30+ models deployed
- 100M predictions/month
- User satisfaction: 4/5

Quality:
- Average accuracy: 85%+
- 95% pass fairness audit
- 99% uptime
- <1% of deployments need rollback

Business:
- Cost per prediction: $0.000002
- Average time-to-launch: 3 weeks
- Estimated ROI: 2.5x
- 5 new AI-powered features shipped
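As a sanity check, the adoption and business targets above are mutually consistent: 100M predictions per month at $0.000002 each implies a modest model bill.

```python
# Figures taken from the Year 1 targets above.
predictions_per_month = 100_000_000
cost_per_prediction = 0.000002  # dollars

monthly_spend = predictions_per_month * cost_per_prediction
print(f"${monthly_spend:,.0f}/month")  # prints $200/month
```

In other words, at these prices the dominant costs of self-service are infrastructure, support, and governance, not inference itself.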

Strategic Questions

  1. What level of self-service makes sense for you? (Constraints?)
  2. What guardrails are necessary? (Too many kill adoption)
  3. What would good support look like? (Office hours? Docs?)
  4. How will you prevent misuse? (Technology or culture?)
  5. When will self-service be ready? (Phase plan)

Key Takeaway: Self-service AI democratizes capability but requires infrastructure, tools, documentation, and guardrails. Risk-based controls let teams move fast while maintaining standards. Platform prevents bad outcomes while enabling good ones. Success requires excellent documentation, responsive support, and clear governance.

Discussion Prompt

What would self-service AI look like in your organization? What would teams be able to do? What would still require specialists?