Self-Service AI Capabilities
The Self-Service Vision
Imagine that every team in your company could use AI without waiting for specialists. Product teams could try new models. Marketing could generate variations of copy. Support could fine-tune a model to better understand the customer domain.
Self-service AI is the most scalable operating model. It democratizes AI capability while guardrails maintain safety and standards.
Self-Service vs. DIY
Self-service: Company provides tools and infrastructure; teams do the work
- Teams have agency and control
- Faster than waiting for specialists
- Still guided by standards and governance
- Examples: Prompt library, training tools, evaluation frameworks
DIY (avoid): Teams building AI with no guidance
- Quality and consistency suffer
- Risk of errors and bias
- Knowledge not shared
- Duplication of effort
Managed service: CoE builds everything for teams
- Highest quality and consistency
- Slowest (waiting for specialists)
- Doesn’t scale
- Teams have limited control
Best approach: Self-service + managed service hybrid
- Self-service for straightforward problems
- Managed service for complex or high-risk
- Teams have choice
Self-Service Infrastructure
What must be in place for self-service to work?
1. Model and API Availability
Make models easy to access:
- List of approved models and APIs (what can we use?)
- Clear comparison (which is best for this use case?)
- Cost for each (what will it cost?)
- Examples (how do I use it?)
Example:
Approved Models for Text Processing:
Claude 3 Opus
- Cost: $15/M input, $75/M output tokens
- Accuracy: 90%+ on complex reasoning
- Good for: Analysis, deep understanding
- Example: Legal document analysis
GPT-4 Turbo
- Cost: $10/M input, $30/M output tokens
- Accuracy: 88% on structured tasks
- Good for: Classification, extraction
- Example: Email categorization
Claude 3 Haiku
- Cost: $0.25/M input, $1.25/M output tokens
- Accuracy: 85% on straightforward tasks
- Good for: Simple classification, summaries
- Example: Topic tagging
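The catalog above can double as a simple cost calculator. A minimal sketch in Python using the per-million-token prices listed (the model keys and function name are illustrative, not a real SDK):

```python
# Sketch of a cost estimator built from the approved-model catalog above.
# Prices are dollars per million tokens, as listed in the catalog.
APPROVED_MODELS = {
    "claude-3-opus":  {"input": 15.00, "output": 75.00},
    "gpt-4-turbo":    {"input": 10.00, "output": 30.00},
    "claude-3-haiku": {"input": 0.25,  "output": 1.25},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in dollars for one call to an approved model."""
    price = APPROVED_MODELS[model]  # KeyError means the model is not approved
    return (input_tokens * price["input"]
            + output_tokens * price["output"]) / 1_000_000

# Compare the cheapest and most capable options for a 1,000-in / 200-out task:
haiku = estimate_cost("claude-3-haiku", 1000, 200)
opus = estimate_cost("claude-3-opus", 1000, 200)
print(f"Haiku: ${haiku:.6f}  Opus: ${opus:.6f}")  # prints "Haiku: $0.000500  Opus: $0.030000"
```

A 60x cost difference per call is exactly the tradeoff the comparison table is meant to surface.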
2. Prompt Library and Templates
Make it easy to start:
- Library of prompts for common tasks
- Copy-paste starting point
- Documentation of what works
Example prompt library:
Prompt: Email classification
Purpose: Categorize emails into support categories
Model: Claude 3 Haiku
Category: Support operations
[System prompt]:
You are a helpful email classifier. Analyze the customer email and
categorize it as one of: Billing, Technical, Account, Refund, Other
[User prompt]:
Categorize this email:
{email_text}
Expected output: Category name + confidence (high/medium/low)
Example:
Input: "My invoice is wrong, charged $500 instead of $50"
Output: Billing, high confidence
Tips:
- Subject line often indicates category
- First few sentences usually most relevant
- Can handle common typos and variations
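A library entry like this can be rendered into an API request with a few lines of code. A hedged sketch that builds a generic payload dict rather than calling any particular SDK (the field names are illustrative):

```python
# Sketch: render the email-classification entry from the prompt library
# into an API-ready payload. Field names are illustrative, not a real SDK.
SYSTEM_PROMPT = (
    "You are a helpful email classifier. Analyze the customer email and "
    "categorize it as one of: Billing, Technical, Account, Refund, Other"
)
USER_TEMPLATE = "Categorize this email:\n{email_text}"

def build_request(email_text: str) -> dict:
    """Fill the library template with the actual email text."""
    return {
        "model": "claude-3-haiku",  # model recommended by the library entry
        "system": SYSTEM_PROMPT,
        "messages": [
            {"role": "user",
             "content": USER_TEMPLATE.format(email_text=email_text)}
        ],
    }

req = build_request("My invoice is wrong, charged $500 instead of $50")
print(req["messages"][0]["content"])
```

Because the template and system prompt are versioned in the library, every team sends the same tested prompt instead of reinventing it.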
3. Evaluation and Testing Tools
Make quality assessment easy:
- Accuracy evaluation framework (does it work?)
- Fairness evaluation (is it biased?)
- Cost tracking (how much did it cost?)
- Comparison tool (which approach is better?)
Example evaluation tool:
Model Evaluation Framework
Step 1: Create test set
- Collect 50-100 examples where you know the right answer
- Save as CSV (input, expected_output)
- Upload to platform
Step 2: Run evaluation
- Select model to test
- Platform runs model on all examples
- Calculates accuracy, errors
Step 3: Review results
- Accuracy score (percentage correct)
- Error analysis (common mistakes?)
- Fairness check (same accuracy across groups?)
Step 4: Compare approaches
- Run evaluation on different models
- See accuracy vs. cost tradeoff
- Pick best approach
Output: Report showing accuracy, cost, time to result
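Steps 1 through 3 can be sketched in a few lines of Python. The predictor here is a toy keyword matcher standing in for the model under test, and the CSV matches the format from Step 1:

```python
import csv
import io

def evaluate(rows, predict):
    """Run `predict` on each (input, expected_output) pair; return accuracy
    plus the most common mistakes, mirroring Steps 2-3 above."""
    errors = {}
    correct = 0
    for inp, expected in rows:
        got = predict(inp)
        if got == expected:
            correct += 1
        else:
            errors[(expected, got)] = errors.get((expected, got), 0) + 1
    accuracy = correct / len(rows)
    return accuracy, sorted(errors.items(), key=lambda kv: -kv[1])

# Toy test set in the CSV format from Step 1:
test_csv = ("input,expected_output\n"
            "wrong invoice,Billing\n"
            "password reset,Account\n"
            "app crashes,Technical\n")
rows = [(r["input"], r["expected_output"])
        for r in csv.DictReader(io.StringIO(test_csv))]

# Stand-in predictor; in practice this would call the model under test.
def keyword_predict(text):
    if "invoice" in text:
        return "Billing"
    if "password" in text:
        return "Account"
    return "Other"

accuracy, top_errors = evaluate(rows, keyword_predict)
print(f"Accuracy: {accuracy:.0%}")  # prints "Accuracy: 67%"
```

Running the same `evaluate` call against two different predictors gives the side-by-side comparison described in Step 4.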
4. Documentation and Training
Make it self-discoverable:
- “Getting started” guide (15 minutes to first result)
- API documentation (how to call it)
- Common patterns (copy-paste examples)
- Troubleshooting (common problems and fixes)
- Office hours (when to ask for help)
5. Monitoring and Alerts
Make it safe:
- Monitor accuracy (alert if drops >5%)
- Monitor cost (alert if high spending)
- Monitor failures (alert on errors)
- Dashboard (see how you’re doing)
Example alert:
Alert: Model accuracy dropped
Current: 78% (vs. target 85%)
What happened: A new data type (a different kind of email) is showing lower accuracy
Recommended action: Collect examples of this type, retrain
Help: Click to see new error patterns, or contact AI team
Governance for Self-Service
Self-service needs guardrails. Too strict and teams bypass it. Too loose and bad things happen.
Risk-Based Controls
Low-risk (minimal controls):
- Using existing, proven models on current data
- Internal use only (no external impact)
- Standard tasks (classification, summarization)
- Requires: Just compliance checklist
Medium-risk (moderate controls):
- Customer-facing features
- New models or approaches
- Decision-making (more than just suggestion)
- Requires: Fairness evaluation, human review, approval
High-risk (strict controls):
- High-stakes decisions (hiring, lending, medical)
- New data source
- Autonomous decisions
- Requires: Full governance review, external audit, legal
Self-Service Approval Process
For low-risk:
- Team fills out 1-page form
- Automated validation checks
- Auto-approved if passes
- Can ship immediately
For medium-risk:
- Team fills out 3-page form (problem, data, risks)
- Automated validation checks
- Fairness evaluation (automated if possible)
- Review by CoE (2-3 days)
- Approval or feedback
For high-risk:
- Full governance board review
- Risk assessment
- Regulatory/legal review
- 1-2 week process
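The triage behind these three tiers can be sketched as a small routing function. The attribute set here is deliberately simplified for illustration; a real intake form would ask more questions:

```python
def risk_tier(customer_facing: bool, autonomous: bool,
              high_stakes: bool, new_data_source: bool = False) -> str:
    """Map use-case attributes to the risk tiers above.
    Simplified illustration of the triage logic."""
    if high_stakes or autonomous or new_data_source:
        return "high"    # full governance board review, 1-2 weeks
    if customer_facing:
        return "medium"  # fairness evaluation + CoE review, 2-3 days
    return "low"         # 1-page form, auto-approved if checks pass

# Internal classification task on existing data:
print(risk_tier(customer_facing=False, autonomous=False,
                high_stakes=False))  # prints "low"
```

Encoding the triage in code means the approval path is decided consistently at intake, rather than negotiated case by case.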
Guardrails in Technology
What the platform prevents:
- Using untrained models (must be evaluated first)
- Using without monitoring (must track accuracy)
- Deploying low-confidence models (accuracy too low)
- Using without approval (governance process enforced)
Example guardrail:
Check before deploying:
1. Model accuracy ≥ target? YES
2. Fairness evaluation passed? YES
3. Cost per prediction < budget? YES
4. Approval obtained? YES (Low-risk auto-approved)
5. Monitoring configured? YES (Alerts set)
Result: ✓ READY TO DEPLOY
If any checks fail:
Result: ✗ BLOCKED - Fix these issues before deploying
- Accuracy is 78%; need ≥85%
- Contact AI team for help improving
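The five-point pre-deployment check could be enforced in code along these lines (a sketch with illustrative parameter names):

```python
def deployment_gate(accuracy, target, fairness_passed, cost_per_prediction,
                    budget, approved, monitoring_configured):
    """Run the five pre-deployment checks above.
    Returns (ready, failures): ready is True only if every check passes."""
    failures = []
    if accuracy < target:
        failures.append(f"Accuracy is {accuracy:.0%}; need >={target:.0%}")
    if not fairness_passed:
        failures.append("Fairness evaluation not passed")
    if cost_per_prediction >= budget:
        failures.append("Cost per prediction exceeds budget")
    if not approved:
        failures.append("Approval not obtained")
    if not monitoring_configured:
        failures.append("Monitoring not configured")
    return (len(failures) == 0, failures)

ready, failures = deployment_gate(0.78, 0.85, True, 0.0005, 0.001, True, True)
print("READY TO DEPLOY" if ready else "BLOCKED - " + "; ".join(failures))
```

Because the gate returns every failed check at once, teams get the full fix list in one pass instead of resubmitting repeatedly.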
Self-Service Models and APIs
What should teams have access to?
Always Available (No Approval)
- Major LLMs with strong track records (OpenAI, Anthropic, Google)
- Embedding models
- Standard classification/summarization approaches
- Internal tools and frameworks
Rationale: Low risk, proven, and already used successfully
Available with Approval
- Fine-tuned models on company data
- Custom models for specific use case
- Real-time decision systems
- Customer-facing automated decisions
Rationale: Higher risk, need to verify before use
Case-by-Case
- Experimental models
- Models with fairness concerns
- Medical/safety-critical models
- Government decision-making
Rationale: High risk, need careful review
Self-Service Examples
Example 1: Prompt Library for Support
Support AI Prompt Library
✓ Email Classification
- Who can use: Support team (self-service)
- Model: Claude 3 Haiku
- Purpose: Route incoming emails
- Approval: Auto-approved
✓ Response Templates
- Who can use: Support team
- Model: Claude 3 Opus
- Purpose: Draft responses to common questions
- Approval: Auto-approved
✓ Sentiment Analysis
- Who can use: Support team with approval
- Model: Claude 3 Haiku
- Purpose: Detect customer sentiment
- Approval: CoE review (1 day)
✗ Automated Response
- Who can use: Not self-service
- Model: Claude 3 Opus
- Purpose: Send responses automatically to customers
- Approval: Full governance board review
Example 2: Data Science Workbench
Self-Service Platform: Data Science Workbench
Available without approval:
- Query company data (with privacy controls)
- Run analyses and build models
- Create visualizations
- Share results with team
Available with approval:
- Share externally (requires approval for confidentiality)
- Use in production decision-making
- Deploy as automated system
Process:
1. Data scientist works in sandbox (no approval needed)
2. Achieves results they want
3. Submits for production deployment (approval required)
4. CoE evaluates for fairness, documentation, monitoring
5. Approved or feedback for improvement
Preventing Misuse
Self-service has risks. Prevent misuse through design.
Common Misuses
Prohibited use 1: Discrimination
- Using protected attributes directly (race, gender, etc.)
- Using proxies that correlate with protected attributes
- Prevention: Audit for this, flag if found
Prohibited use 2: Privacy violation
- Using sensitive data beyond intended scope
- Not anonymizing personal information
- Prevention: Data governance controls, audit logs
Prohibited use 3: Security breach
- Exposing model outputs containing sensitive data
- Logging confidential information
- Prevention: Data classification, log filtering
Prohibited use 4: Negligent deployment
- Deploying model without testing
- Ignoring accuracy below threshold
- Prevention: Platform prevents deployment without evaluation
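The discrimination audit (prohibited use 1) might start with a simple feature-name screen. This sketch flags explicit protected attributes and a few commonly cited proxies; the proxy list is illustrative, and real proxy detection also requires statistical correlation tests:

```python
# Sketch of a first-pass discrimination audit: flag features that are
# protected attributes or well-known proxies. Illustrative lists only.
PROTECTED = {"race", "gender", "religion", "age", "disability"}
KNOWN_PROXIES = {"zip_code", "first_name", "maiden_name"}

def audit_features(feature_names):
    """Return the subset of features that should be flagged for review."""
    flagged = set()
    for name in feature_names:
        key = name.lower()
        if key in PROTECTED or key in KNOWN_PROXIES:
            flagged.add(name)
    return flagged

flags = audit_features(["income", "zip_code", "gender", "tenure_months"])
print(sorted(flags))  # prints "['gender', 'zip_code']"
```

A name-based screen catches the obvious cases cheaply; anything it flags goes to a human reviewer, and a deeper correlation audit covers the proxies it cannot see.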
Detection and Response
If misuse detected:
- Immediate action: Disable system if high risk
- Investigation: What happened? How long?
- Remediation: Fix the issue
- Communication: Affected parties notified
- Prevention: Policy or control change
Measuring Self-Service Success
Adoption Metrics
- Percentage of technical teams using platform
- Monthly users
- Monthly models deployed
- Growth rate
Quality Metrics
- Average model accuracy
- Fairness audit pass rate
- SLA compliance (uptime, performance)
- User satisfaction
Business Metrics
- Cost per prediction
- Time-to-launch new feature
- Revenue impact
- Cost savings
Example Success Targets (Year 1)
Adoption:
- 50% of product teams using platform
- 30+ models deployed
- 100M predictions/month
- User satisfaction: 4/5
Quality:
- Average accuracy: 85%+
- 95% pass fairness audit
- 99% uptime
- <1% of deployments need rollback
Business:
- Cost per prediction: $0.000002
- Average time-to-launch: 3 weeks
- Estimated ROI: 2.5x
- 5 new AI-powered features shipped
Strategic Questions
- What level of self-service makes sense for you? (Constraints?)
- What guardrails are necessary? (Too many kill adoption)
- What would good support look like? (Office hours? Docs?)
- How will you prevent misuse? (Technology or culture?)
- When will self-service be ready? (Phase plan)
Key Takeaway: Self-service AI democratizes capability but requires infrastructure, tools, documentation, and guardrails. Risk-based controls let teams move fast while maintaining standards. Platform prevents bad outcomes while enabling good ones. Success requires excellent documentation, responsive support, and clear governance.
Discussion Prompt
What would self-service AI look like in your organization? What would teams be able to do? What would still require specialists?