Advanced

Self-Service AI Capabilities

Lesson 3 of 4 | Estimated time: 50 min

The Self-Service Vision

Imagine that every team in your company could use AI without waiting for specialists. Product teams could try new models. Marketing could generate variations of copy. Support could fine-tune a model to better understand its customer domain.

Self-service AI is the ultimate scaling model. It democratizes AI capability while maintaining safety and standards through guardrails.

Self-Service vs. DIY

Self-service: Company provides tools and infrastructure; teams do the work

  • Teams have agency and control
  • Faster than waiting for specialists
  • Still guided by standards and governance
  • Examples: Prompt library, training tools, evaluation frameworks

DIY (avoid): Teams building AI with no guidance

  • Quality and consistency suffer
  • Risk of errors and bias
  • Knowledge not shared
  • Duplication of effort

Managed service: CoE builds everything for teams

  • Highest quality and consistency
  • Slowest (waiting for specialists)
  • Doesn’t scale
  • Teams have limited control

Best approach: Self-service + managed service hybrid

  • Self-service for straightforward problems
  • Managed service for complex or high-risk
  • Teams have choice

Self-Service Infrastructure

What must be in place for self-service to work?

1. Model and API Availability

Make models easy to access:

  • List of approved models and APIs (what can we use?)
  • Clear comparison (which is best for this use case?)
  • Cost for each (what will it cost?)
  • Examples (how do I use it?)

Example:

Approved Models for Text Processing:

Claude 3 Opus
- Cost: $15/M input, $75/M output tokens
- Accuracy: 90%+ on complex reasoning
- Good for: Analysis, deep understanding
- Example: Legal document analysis

GPT-4 Turbo
- Cost: $10/M input, $30/M output tokens
- Accuracy: 88% on structured tasks
- Good for: Classification, extraction
- Example: Email categorization

Claude 3 Haiku
- Cost: $0.25/M input, $1.25/M output tokens
- Accuracy: 85% on straightforward tasks
- Good for: Simple classification, summaries
- Example: Topic tagging
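The comparison above lends itself to a small lookup table, so a team can programmatically pick the cheapest approved model that clears its accuracy bar. A minimal sketch (registry contents are copied from the table above; the helper name is hypothetical):

```python
# Approved-model registry. Prices are per million tokens; accuracy is the
# rough benchmark figure from the comparison above.
APPROVED_MODELS = [
    {"name": "Claude 3 Opus",  "input_cost": 15.00, "output_cost": 75.00, "accuracy": 0.90},
    {"name": "GPT-4 Turbo",    "input_cost": 10.00, "output_cost": 30.00, "accuracy": 0.88},
    {"name": "Claude 3 Haiku", "input_cost": 0.25,  "output_cost": 1.25,  "accuracy": 0.85},
]

def cheapest_model(min_accuracy: float) -> str:
    """Return the cheapest approved model that meets the accuracy bar."""
    candidates = [m for m in APPROVED_MODELS if m["accuracy"] >= min_accuracy]
    if not candidates:
        raise ValueError("No approved model meets the accuracy requirement")
    best = min(candidates, key=lambda m: m["input_cost"] + m["output_cost"])
    return best["name"]
```

With these numbers, `cheapest_model(0.85)` selects Claude 3 Haiku, while requiring 90% accuracy forces the choice up to Claude 3 Opus.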

2. Prompt Library and Templates

Make it easy to start:

  • Library of prompts for common tasks
  • Copy-paste starting point
  • Documentation of what works

Example prompt library:

Prompt: Email classification
Purpose: Categorize emails into support categories
Model: Claude 3 Haiku
Category: Support operations

[System prompt]:
You are a helpful email classifier. Analyze the customer email and
categorize it as one of: Billing, Technical, Account, Refund, Other

[User prompt]:
Categorize this email:
{email_text}

Expected output: Category name + confidence (high/medium/low)

Example:
Input: "My invoice is wrong, charged $500 instead of $50"
Output: Billing, high confidence

Tips:
- Subject line often indicates category
- First few sentences usually most relevant
- Can handle common typos and variations
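A library entry like the one above is easiest to reuse if it is stored as data and rendered on demand, rather than copy-pasted by hand. A minimal sketch, with the entry transcribed from the example and a hypothetical `render_prompt` helper:

```python
# One entry from a hypothetical prompt library, mirroring the example above.
EMAIL_CLASSIFIER = {
    "model": "claude-3-haiku",
    "system": (
        "You are a helpful email classifier. Analyze the customer email and "
        "categorize it as one of: Billing, Technical, Account, Refund, Other"
    ),
    "user_template": "Categorize this email:\n{email_text}",
}

def render_prompt(entry: dict, **fields) -> dict:
    """Fill the template placeholders and return the messages to send."""
    return {
        "model": entry["model"],
        "system": entry["system"],
        "user": entry["user_template"].format(**fields),
    }
```

A team member then calls `render_prompt(EMAIL_CLASSIFIER, email_text=...)` and passes the result to whatever client wraps the approved API.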

3. Evaluation and Testing Tools

Make quality assessment easy:

  • Accuracy evaluation framework (does it work?)
  • Fairness evaluation (is it biased?)
  • Cost tracking (how much did it cost?)
  • Comparison tool (which approach is better?)

Example evaluation tool:

Model Evaluation Framework

Step 1: Create test set
- Collect 50-100 examples where you know the right answer
- Save as CSV (input, expected_output)
- Upload to platform

Step 2: Run evaluation
- Select model to test
- Platform runs model on all examples
- Calculates accuracy, errors

Step 3: Review results
- Accuracy score (percentage correct)
- Error analysis (common mistakes?)
- Fairness check (same accuracy across groups?)

Step 4: Compare approaches
- Run evaluation on different models
- See accuracy vs. cost tradeoff
- Pick best approach

Output: Report showing accuracy, cost, time to result
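Steps 1 through 3 above reduce to a short scoring loop. A sketch, assuming `predict` is whatever callable wraps the model under test and the CSV has the two columns from step 1:

```python
import csv

def evaluate(predict, test_csv_path: str) -> float:
    """Run `predict` on each (input, expected_output) row; return accuracy."""
    correct = total = 0
    with open(test_csv_path, newline="") as f:
        for row in csv.DictReader(f):
            total += 1
            if predict(row["input"]).strip() == row["expected_output"].strip():
                correct += 1
    return correct / total if total else 0.0
```

Running `evaluate` once per candidate model gives the accuracy side of the accuracy-vs.-cost comparison in step 4.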

4. Documentation and Training

Make it self-discoverable:

  • “Getting started” guide (15 minutes to first result)
  • API documentation (how to call it)
  • Common patterns (copy-paste examples)
  • Troubleshooting (common problems and fixes)
  • Office hours (when to ask for help)

5. Monitoring and Alerts

Make it safe:

  • Monitor accuracy (alert if drops >5%)
  • Monitor cost (alert if high spending)
  • Monitor failures (alert on errors)
  • Dashboard (see how you’re doing)

Example alert:

Alert: Model accuracy dropped
Current: 78% (vs. target 85%)
What happened: A new data type (different email formats) is showing lower accuracy
Recommended action: Collect examples of this type, retrain
Help: Click to see new error patterns, or contact AI team
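The "alert if accuracy drops >5%" rule above is simple to automate. A sketch of the check (the function name and message format are illustrative):

```python
from typing import Optional

def accuracy_alert(current: float, target: float,
                   max_drop: float = 0.05) -> Optional[str]:
    """Return an alert message if accuracy is more than `max_drop`
    below target (the >5%-drop rule above), else None."""
    if target - current > max_drop:
        return f"Alert: model accuracy dropped to {current:.0%} (target {target:.0%})"
    return None
```

With the numbers from the example, a 78% measurement against an 85% target fires the alert; 83% does not.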

Governance for Self-Service

Self-service needs guardrails. Too strict and teams bypass it. Too loose and bad things happen.

Risk-Based Controls

Low-risk (minimal controls):

  • Using existing, proven models on current data
  • Internal use only (no external impact)
  • Standard tasks (classification, summarization)
  • Requires: Just compliance checklist

Medium-risk (moderate controls):

  • Customer-facing features
  • New models or approaches
  • Decision-making (more than just a suggestion)
  • Requires: Fairness evaluation, human review, approval

High-risk (strict controls):

  • High-stakes decisions (hiring, lending, medical)
  • New data source
  • Autonomous decisions
  • Requires: Full governance review, external audit, legal

Self-Service Approval Process

For low-risk:

  1. Team fills out 1-page form
  2. Automated validation checks
  3. Auto-approved if passes
  4. Can ship immediately

For medium-risk:

  1. Team fills out 3-page form (problem, data, risks)
  2. Automated validation checks
  3. Fairness evaluation (automated if possible)
  4. Review by CoE (2-3 days)
  5. Approval or feedback

For high-risk:

  1. Full governance board review
  2. Risk assessment
  3. Regulatory/legal review
  4. 1-2 week process

Guardrails in Technology

What the platform prevents:

  • Using unevaluated models (must be evaluated first)
  • Using without monitoring (must track accuracy)
  • Deploying low-confidence models (accuracy too low)
  • Using without approval (governance process enforced)

Example guardrail:

Check before deploying:
1. Model accuracy ≥ target? YES
2. Fairness evaluation passed? YES
3. Cost per prediction < budget? YES
4. Approval obtained? YES (Low-risk auto-approved)
5. Monitoring configured? YES (Alerts set)

Result: ✓ READY TO DEPLOY

If any checks fail:
Result: ✗ BLOCKED - Fix these issues before deploying
- Accuracy is 78%; need ≥85%
- Contact AI team for help improving
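The checklist above is exactly the kind of guardrail a platform can enforce mechanically at deploy time: run every check, and block deployment with a list of concrete fixes if any fail. A sketch (parameter names and thresholds are illustrative):

```python
def predeploy_check(accuracy, target_accuracy, fairness_passed,
                    cost_per_prediction, budget_per_prediction,
                    approved, monitoring_configured):
    """Mirror the pre-deploy checklist above: return (ok, list of blockers)."""
    blockers = []
    if accuracy < target_accuracy:
        blockers.append(f"Accuracy is {accuracy:.0%}; need >={target_accuracy:.0%}")
    if not fairness_passed:
        blockers.append("Fairness evaluation not passed")
    if cost_per_prediction > budget_per_prediction:
        blockers.append("Cost per prediction exceeds budget")
    if not approved:
        blockers.append("Approval not obtained")
    if not monitoring_configured:
        blockers.append("Monitoring not configured")
    return (len(blockers) == 0, blockers)
```

A passing run returns `(True, [])`, i.e. READY TO DEPLOY; the 78%-accuracy case from the example returns BLOCKED with the accuracy gap spelled out.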

Self-Service Models and APIs

What should teams have access to?

Always Available (No Approval)

  • Major LLMs with strong track records (OpenAI, Anthropic, Google)
  • Embedding models
  • Standard classification/summarization approaches
  • Internal tools and frameworks

Rationale: Low risk; proven and already used successfully

Available with Approval

  • Fine-tuned models on company data
  • Custom models for specific use case
  • Real-time decision systems
  • Customer-facing automated decisions

Rationale: Higher risk, need to verify before use

Case-by-Case

  • Experimental models
  • Models with fairness concerns
  • Medical/safety-critical models
  • Government decision-making

Rationale: High risk, need careful review

Self-Service Examples

Example 1: Prompt Library for Support

Support AI Prompt Library

✓ Email Classification
- Who can use: Support team (self-service)
- Model: Claude 3 Haiku
- Purpose: Route incoming emails
- Approval: Auto-approved

✓ Response Templates
- Who can use: Support team
- Model: Claude 3 Opus
- Purpose: Draft responses to common questions
- Approval: Auto-approved

✓ Sentiment Analysis
- Who can use: Support team with approval
- Model: Claude 3 Haiku
- Purpose: Detect customer sentiment
- Approval: CoE review (1 day)

✗ Automated Response
- Who can use: Not self-service
- Model: Claude 3 Opus
- Purpose: Send responses automatically to customers
- Approval: Full governance board review

Example 2: Data Science Workbench

Self-Service Platform: Data Science Workbench

Available without approval:
- Query company data (with privacy controls)
- Run analyses and build models
- Create visualizations
- Share results with team

Available with approval:
- Share externally (requires approval for confidentiality)
- Use in production decision-making
- Deploy as automated system

Process:
1. Data scientist works in sandbox (no approval needed)
2. Achieves results they want
3. Submits for production deployment (approval required)
4. CoE evaluates for fairness, documentation, monitoring
5. Approved or feedback for improvement

Preventing Misuse

Self-service has risks. Prevent misuse through design.

Common Misuses

Prohibited use 1: Discrimination

  • Using protected attributes directly (race, gender, etc.)
  • Using proxies that correlate with protected attributes
  • Prevention: Audit for this, flag if found

Prohibited use 2: Privacy violation

  • Using sensitive data beyond intended scope
  • Not anonymizing personal information
  • Prevention: Data governance controls, audit logs

Prohibited use 3: Security breach

  • Exposing model outputs containing sensitive data
  • Logging confidential information
  • Prevention: Data classification, log filtering

Prohibited use 4: Careless inaccuracy

  • Deploying model without testing
  • Ignoring accuracy below threshold
  • Prevention: Platform prevents deployment without evaluation

Detection and Response

If misuse detected:

  1. Immediate action: Disable system if high risk
  2. Investigation: What happened? How long?
  3. Remediation: Fix the issue
  4. Communication: Affected parties notified
  5. Prevention: Policy or control change

Measuring Self-Service Success

Adoption Metrics

  • Percentage of technical teams using platform
  • Monthly users
  • Monthly models deployed
  • Growth rate

Quality Metrics

  • Average model accuracy
  • Fairness audit pass rate
  • SLA compliance (uptime, performance)
  • User satisfaction

Business Metrics

  • Cost per prediction
  • Time-to-launch new feature
  • Revenue impact
  • Cost savings

Example Success Targets (Year 1)

Adoption:
- 50% of product teams using platform
- 30+ models deployed
- 100M predictions/month
- User satisfaction: 4/5

Quality:
- Average accuracy: 85%+
- 95% pass fairness audit
- 99% uptime
- <1% of deployments need rollback

Business:
- Cost per prediction: $0.000002
- Average time-to-launch: 3 weeks
- Estimated ROI: 2.5x
- 5 new AI-powered features shipped
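As a sanity check, the adoption and business targets above are mutually consistent: 100M predictions per month at $0.000002 each implies a modest model bill.

```python
# Figures taken from the Year 1 targets above.
predictions_per_month = 100_000_000
cost_per_prediction = 0.000002  # dollars

monthly_spend = predictions_per_month * cost_per_prediction
print(f"${monthly_spend:,.0f}/month")  # prints $200/month
```

In other words, at these prices the dominant costs of self-service are infrastructure, support, and governance, not inference itself.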

Strategic Questions

  1. What level of self-service makes sense for you? (Constraints?)
  2. What guardrails are necessary? (Too many kill adoption)
  3. What would good support look like? (Office hours? Docs?)
  4. How will you prevent misuse? (Technology or culture?)
  5. When will self-service be ready? (Phase plan)

Key Takeaway: Self-service AI democratizes capability but requires infrastructure, tools, documentation, and guardrails. Risk-based controls let teams move fast while maintaining standards. Platform prevents bad outcomes while enabling good ones. Success requires excellent documentation, responsive support, and clear governance.

Discussion Prompt

What would self-service AI look like in your organization? What would teams be able to do? What would still require specialists?