Scoping AI-Powered Products
The Core Challenge
Traditional product scoping is straightforward: you define requirements, design the feature, build it. AI scoping is messier. You don’t know upfront whether your approach will work. The cost-quality-speed tradeoff is non-linear (a better model might cost 10x for a 5% quality gain). And users react unpredictably to AI features.
Your job as PM is managing this uncertainty while delivering business value.
Starting with User Research
Before building, understand what users actually need and whether AI is the answer.
1. Problem Validation
Start here, not with “let’s build an AI feature.”
Research questions:
- What problem are users trying to solve?
- How do they currently solve it?
- What pain points exist?
- What’s the cost of the current solution?
- Would they use an automated solution?
Methods:
- Interviews (5-10 users, 30-45 min each)
- Survey (target 50-100 respondents)
- Observation (watch people do the task)
- Usage data analysis (how much time in this task today?)
Example user interview findings:
“Categorizing customer emails takes 10 minutes per 50 emails. Accuracy matters; miscategorized emails annoy our team. Would we use automated categorization? Only if it’s accurate and I can fix mistakes easily.”
2. Solution Viability
Once you know the problem, does AI solve it?
Evaluation questions:
- Is this a task where pattern recognition helps?
- Are there many examples to learn from?
- Is accuracy 80-90% good enough, or do you need 99%?
- Is speed a major constraint?
- Do users need to understand why the AI decided something?
Red flags for AI (use something else):
- Task requires understanding of real-world facts (e.g., “is this person creditworthy”)
- Accuracy needs to be 99.9%
- Explainability is essential and complex
- You have minimal relevant data
- Humans already do the task cheaply and well
Green flags for AI:
- Many similar examples of correct answers
- Good-enough accuracy is acceptable
- Speed would unlock value
- Cost of errors is manageable
Example evaluation:
“Email categorization: pattern recognition (yes), lots of training data (yes), 85% accuracy acceptable (yes), speed would help (maybe), explainability needed (somewhat). Verdict: AI is appropriate.”
3. MVP Definition
Once you know AI is the right approach, define the smallest valuable product.
MVP = smallest feature that delivers core value
Example: Support Email Categorization
Full product vision:
- Categorize all incoming emails (100+ categories)
- Auto-route to appropriate team
- Suggest response templates
- Learn from corrections
- Integrated with CRM
MVP:
- Categorize 5 most common types (80% of volume)
- Suggest category; humans confirm
- Store categorization for evaluation
- No CRM integration
Scope reduction:
- From 100+ categories → 5 categories
- From auto-routing → human review
- From smart templates → suggested category
- From CRM integration → separate system
Benefits of MVP:
- Launches faster (weeks vs. months)
- Learns from real usage
- Proves value before bigger investment
- Mistakes are cheaper at small scale
- Feedback shapes v2
4. Success Criteria
Define what success looks like for your MVP.
Metrics to track:
Product metrics:
- Accuracy: How often does the AI categorize correctly?
- Coverage: What percentage of emails does it handle?
- User acceptance: Do people use the feature?
- Satisfaction: Do users trust it?
Business metrics:
- Time saved: How much faster is categorization?
- Adoption: What % of team uses it?
- Quality: Does it reduce errors from current process?
Example success criteria:
| Metric | Target | Acceptable | Failure |
|---|---|---|---|
| Accuracy | 90% | 85% | <80% |
| Coverage | 75% | 60% | <50% |
| User satisfaction | 4.5/5 | 4/5 | <3.5/5 |
| Time savings | 40% | 25% | <15% |
| Adoption | 80% | 60% | <40% |
Decision rule:
- Meet all targets → Scale the feature
- Miss one → Debug and iterate
- Miss multiple → Reassess approach
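The decision rule above can be sketched as a small helper. The metric names and target values mirror the example table and are illustrative, not prescriptive:

```python
# Decision-rule sketch: compare observed MVP metrics against targets.
# Metric names and thresholds come from the example table above; adjust
# both for your own product.
TARGETS = {
    "accuracy": 0.90,
    "coverage": 0.75,
    "satisfaction": 4.5,   # out of 5
    "time_savings": 0.40,
    "adoption": 0.80,
}

def mvp_decision(observed: dict) -> str:
    """Map observed metrics to scale / iterate / reassess."""
    misses = [m for m, target in TARGETS.items() if observed[m] < target]
    if not misses:
        return "scale"      # met all targets -> scale the feature
    if len(misses) == 1:
        return "iterate"    # missed one -> debug and iterate
    return "reassess"       # missed multiple -> reassess approach

print(mvp_decision({"accuracy": 0.91, "coverage": 0.78, "satisfaction": 4.6,
                    "time_savings": 0.45, "adoption": 0.82}))  # scale
```

The point of encoding the rule is that the team commits to the thresholds before launch, rather than rationalizing results afterward.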
Know When AI Isn’t the Answer
The “AI is not the answer” cases:
1. You don’t have relevant data
- Problem: Can’t train if you have no examples
- Solution: Collect data first, try AI later
- Example: Predicting fraud in new product with no history
2. Accuracy doesn’t need to be perfect but does need to be high
- Problem: AI reaches 85% but you need 95%+
- Solution: Use AI as assistant (human verifies), not replacement
- Example: Loan approval decisions
3. Explainability is critical and complex
- Problem: Users need to understand why AI made decision
- Solution: Use explainable rules-based systems instead
- Example: Medical diagnosis where patient must understand
4. The task is genuinely subjective or context-dependent
- Problem: Correct answer depends on context AI can’t see
- Solution: Make human judgment tool better, not replace it
- Example: Creative writing feedback
5. Doing it imperfectly creates bigger problems
- Problem: Wrong answer is worse than no answer
- Solution: Only use AI where errors are acceptable
- Example: Safety-critical systems
The “not yet” cases
Maybe later:
- You have 70% accuracy but need 85% (try research, data improvement)
- Your data is small but growing (wait until you have 10K examples)
- Rules-based system works but doesn’t scale (AI might help later at scale)
Designing the AI/Human Collaboration
Few AI features work alone. Most need human collaboration.
Collaboration Models
1. AI suggests, human decides
- Email categorization: AI suggests category; human confirms/corrects
- Lead scoring: AI scores; human decides to contact
- Content moderation: AI flags; human reviews and approves
2. AI filters, human refines
- AI narrows the pool to the 100 best matches; human chooses the best one
- AI generates 5 variations; human picks favorite
- AI identifies candidates; human interviews top 10
3. AI automates routine, human handles exceptions
- AI handles 80% of cases automatically
- Complex/unusual cases go to human
- Example: Support tickets where AI handles FAQ, humans handle unique issues
4. AI amplifies human capability
- AI summarizes 50-page document; human reads summary and asks questions
- AI spots anomalies in data; human investigates why
- AI generates first draft; human edits and refines
Designing for Human-AI Collaboration
Key UX patterns:
Confidence indicators:
- “I’m 92% sure this is urgent” vs. “I’m 60% sure”
- High confidence → surface to user, maybe auto-act
- Low confidence → require human review
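The confidence-indicator pattern reduces to a routing function. The thresholds below are illustrative assumptions; tune them against your own error costs:

```python
# Confidence-based routing sketch. The two thresholds are assumptions
# for illustration; set them based on the cost of acting on a wrong answer.
AUTO_ACT_THRESHOLD = 0.90   # high confidence: safe to act automatically
SURFACE_THRESHOLD = 0.70    # medium confidence: show suggestion to the user

def route_by_confidence(confidence: float) -> str:
    if confidence >= AUTO_ACT_THRESHOLD:
        return "auto_act"       # e.g., route the email without asking
    if confidence >= SURFACE_THRESHOLD:
        return "suggest"        # surface to user for one-click confirm
    return "human_review"       # low confidence: require human review

print(route_by_confidence(0.92))  # auto_act
print(route_by_confidence(0.60))  # human_review
```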
Explainability (when needed):
- “Why did you suggest this category?” → “You used the words: urgent, broken, doesn’t work”
- Helps user understand and correct
Easy correction:
- User sees AI suggestion
- One click to correct/feedback
- System learns from corrections
Override capability:
- User can always override AI
- System records overrides to improve
Transparency:
- “This was suggested by AI”
- User knows to scrutinize more carefully
Example: Email Categorization UX
Email: "My printer isn't working"
AI Suggestion: "Technical Support" (89% confidence)
[Accept] [Change to...]
If Accept:
→ Email routed to Tech Support team
→ System notes this categorization for learning
If Change:
→ User picks correct category
→ System learns from correction
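The accept/correct loop above can be sketched as follows. The function name and record fields are hypothetical, not a real API; the key idea is that corrections are stored, not discarded:

```python
# Sketch of the accept/correct feedback loop from the flow above.
# Function and field names are hypothetical.
corrections = []  # stored corrections feed later evaluation and retraining

def handle_suggestion(email_id: str, suggested: str, user_choice: str) -> str:
    """Record the user's decision so the system can learn from corrections."""
    if user_choice == suggested:
        return "accepted"       # route to the suggested team as-is
    corrections.append({"email": email_id, "from": suggested, "to": user_choice})
    return "corrected"          # route to the user's category and learn

print(handle_suggestion("e-1", "Technical Support", "Technical Support"))  # accepted
print(handle_suggestion("e-2", "Billing", "Refunds"))                      # corrected
```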
Data Requirements for AI Features
Before committing to AI, ensure you have data.
Data Assessment
Questions to answer:
1. Do you have labeled examples?
- Email categorization: Do you have 1,000+ categorized emails?
- Sentiment analysis: Do you have labeled positive/negative examples?
- Fraud detection: Do you have fraud labels in historical data?
If no → You might need to label data (expensive, 4-8 weeks) before building
2. Is the data representative?
- Does it cover all cases you want to handle?
- Are there biases in the data?
- If only young users are represented, model might fail for older users
3. Is the data quality good?
- Are labels accurate? (Have 2+ people label 10% and compare)
- Are values missing? (How much will the model suffer?)
- Is the data up-to-date? (Old data might not predict the future)
4. Is the data accessible?
- Can you actually query it from your systems?
- Do you have privacy/compliance approval?
- Is it in usable format (structured, not buried in free text)?
Data Readiness Checklist
- You have ≥ 1,000 labeled examples (more is better)
- Labels are ≥ 90% consistent (re-label sample to verify)
- Data represents your actual use cases
- No major privacy/compliance blockers
- You can access data from production systems
- Data is reasonably current
If you can’t check all boxes: Plan data work before building model.
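The label-consistency item on the checklist can be checked with a simple agreement measure: have two people label the same sample and compare. This is a minimal sketch; for a bias-corrected measure, use Cohen's kappa instead of raw agreement:

```python
# Label-consistency sketch for the "labels >= 90% consistent" checklist item.
# Raw agreement between two labelers over the same sample of items.
def label_agreement(labels_a: list, labels_b: list) -> float:
    """Fraction of items on which two labelers assigned the same label."""
    assert len(labels_a) == len(labels_b), "labelers must see the same items"
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

a = ["billing", "tech", "tech", "refund", "billing"]
b = ["billing", "tech", "refund", "refund", "billing"]
print(label_agreement(a, b))  # 0.8 -> below the 0.9 bar, re-review labels
```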
Feasibility Assessment
Before committing, do a quick feasibility assessment.
Feasibility Scoring
Score each dimension 1-5:
Data (1-5):
- 5: 10K+ labeled examples, high quality, accessible
- 3: 2K labeled examples, decent quality
- 1: <500 examples or very noisy
Technical (1-5):
- 5: Straightforward task, existing approaches work
- 3: Some integration complexity, some technical challenges
- 1: Novel task, uncertain approach
Business (1-5):
- 5: Clear value, executive support, budget approved
- 3: Solid ROI but lower priority
- 1: Uncertain value, competing priorities
Total score:
- 13-15: Ready to build
- 10-12: Ready with risk mitigation
- <10: More exploration needed
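The scoring rubric above is mechanical enough to encode, which keeps the verdict consistent across candidate features:

```python
# Feasibility-scoring sketch, following the 1-5 scale and totals above.
def feasibility_verdict(data: int, technical: int, business: int) -> str:
    """Sum the three 1-5 dimension scores and map the total to a verdict."""
    total = data + technical + business
    if total >= 13:
        return "ready to build"
    if total >= 10:
        return "ready with risk mitigation"
    return "more exploration needed"

print(feasibility_verdict(4, 4, 5))  # ready to build
print(feasibility_verdict(2, 2, 3))  # more exploration needed
```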
Example Scorecard
Email categorization:
- Data: 4 (we have 5K labeled emails, good quality)
- Technical: 4 (straightforward NLP, existing approaches proven)
- Business: 5 (clear ROI, executive support)
- Total: 13 → Ready to build
Email response templates:
- Data: 2 (only 100 human-written templates, need more)
- Technical: 2 (generative, harder to ensure quality)
- Business: 3 (nice feature but not critical)
- Total: 7 → More exploration needed
Kickoff Readiness
Before starting build, ensure you have:
Clarity:
- Problem statement everyone agrees on
- Success metrics clearly defined
- MVP scope clearly defined
- Data is ready (or plan for data work)
Team:
- PM (you) owns product decision
- Engineer assigned and understands approach
- Data scientist available if needed
- Design involved (for UX/flow)
Support:
- Executive sponsor aware and supportive
- Budget allocated
- Stakeholders informed of timeline and risks
Risk Management:
- Key risks identified and mitigation planned
- Decision criteria defined (what makes us pivot?)
- Contingency plan if initial approach doesn’t work
Only proceed if you’ve checked these boxes.
Key Takeaway: Start with user research to validate the problem. Determine if AI is the right solution (it often isn’t). Define an MVP that’s small enough to learn from quickly. Assess data readiness. Score feasibility. Clarify success metrics. Only then start building. Good scoping saves months of wasted effort.
Discussion Prompt
For your next AI feature idea: Have you validated the user problem? Would AI actually solve it better than alternatives? What’s your honest MVP scope?