Intermediate

AI Product Development Lifecycle

Lesson 3 of 4 · Estimated time: 45 min

The AI Product Lifecycle is Different

Traditional products follow: Concept → Build → Launch → Iterate

AI products follow: Research → Prototype → Pilot → Production → Monitor & Retrain

The key differences:

  • You don’t know whether your approach will work until you try it
  • Data quality and volume directly affect results
  • Model performance degrades over time
  • Iteration continues post-launch (unlike traditional products)

Phase 1: Research (Weeks 1-4)

Goal: Understand the problem and validate that AI can solve it.

Activities:

  • User research: What’s the actual problem?
  • Data exploration: What training data exists?
  • Approach research: What techniques might work?
  • Quick proof-of-concept: Can we build a prototype in days?

Deliverable:

  • Research summary: Problem validated, AI approach sensible, rough timeline and resource estimate

Decision gate:

  • Go: Problem is real, AI seems viable, team is aligned
  • No-go: Problem isn’t AI-suitable or ROI is unclear
  • Pivot: Different approach seems better

Timeline: 2-4 weeks
Team: PM + Engineer + Data Scientist
Cost: $10-30K

Phase 2: Prototype (Weeks 5-12)

Goal: Build a basic working version to understand feasibility and quality.

Activities:

  • Data preparation: Get training data ready
  • Model exploration: Try 3-5 approaches, measure accuracy
  • Integration planning: How will this connect to your systems?
  • UX wireframes: Rough mockups of how users interact
  • Cost modeling: What will this actually cost to run?
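
Cost modeling can be as simple as multiplying expected volume by a per-request unit cost. A minimal sketch in Python; the token counts and per-1K-token prices below are placeholder assumptions, not real vendor rates:

```python
# Illustrative cost model for an AI-backed feature billed per token.
# All prices and volumes are placeholder assumptions -- substitute
# your own vendor pricing and traffic estimates.

def monthly_cost(requests_per_month: int,
                 tokens_in: int, tokens_out: int,
                 price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Estimated monthly model cost in dollars."""
    per_request = (tokens_in / 1000) * price_in_per_1k \
                + (tokens_out / 1000) * price_out_per_1k
    return requests_per_month * per_request

# Assumed: 100K requests/month, 1,500 input + 500 output tokens each,
# $0.01 / $0.03 per 1K tokens (hypothetical rates).
cost = monthly_cost(100_000, 1_500, 500, 0.01, 0.03)
print(f"~${cost:,.0f}/month, ${cost / 100_000:.4f}/request")
```

Even a rough version of this model at the prototype stage surfaces whether the unit economics can possibly work before you invest in a pilot.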

Deliverable:

  • Working prototype with measured performance
  • Architecture diagram
  • Cost estimate
  • 2-3 UX wireframes
  • List of unknowns and risks

Decision gate:

  • Go to pilot: Accuracy is acceptable, integration seems feasible, ROI makes sense
  • Pivot: Different approach needed, or problem reframed
  • No-go: Accuracy too low, integration too hard, or ROI insufficient

Timeline: 4-8 weeks
Team: Engineer + Data Scientist + PM + Designer
Cost: $50-150K

Phase 3: Pilot (Weeks 13-28)

Goal: Test with real users and validate business value at small scale.

Activities:

  • Develop production MVP: Build for real usage, not just demo
  • Real data: Run on actual user data, not samples
  • User testing: 100-500 users in a closed pilot
  • Monitoring setup: Track accuracy, latency, cost, user behavior
  • Feedback loops: Gather user feedback, make quick improvements
  • A/B testing (optional): Compare AI approach to current approach

Deliverable:

  • Pilot system in production (limited scope)
  • Usage data showing accuracy, performance, user adoption
  • Cost per user/transaction
  • User feedback summary
  • List of improvements for scale

Decision gate:

  • Scale: Metrics hit targets, users see value, economics work
  • Iterate: Modify approach based on feedback, run another 4-week pilot
  • Kill: Metrics show this won’t work

Timeline: 6-12 weeks
Team: Engineer + Data Scientist + PM + Designer + Operations
Cost: $150-400K

Key pilot metrics:

  • Accuracy on real data: Is it as good as on test data?
  • User adoption: What % of eligible users are using this?
  • User satisfaction: Do they find it valuable? Trustworthy?
  • Cost per unit: How much does each transaction cost?
  • Error patterns: Where does it fail? Can we fix those cases?
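
These metrics usually fall out of simple aggregation over logged pilot events. A sketch, assuming a hypothetical event schema (`used_ai`, `correct`, `cost`):

```python
# Sketch: computing pilot metrics from a list of logged events.
# The event schema and values below are hypothetical.

events = [
    {"user": "u1", "used_ai": True,  "correct": True,  "cost": 0.04},
    {"user": "u2", "used_ai": True,  "correct": False, "cost": 0.05},
    {"user": "u3", "used_ai": False, "correct": None,  "cost": 0.00},
    {"user": "u1", "used_ai": True,  "correct": True,  "cost": 0.04},
]

ai_events = [e for e in events if e["used_ai"]]

# Accuracy on real data: fraction of AI-handled transactions judged correct.
accuracy = sum(e["correct"] for e in ai_events) / len(ai_events)

# Adoption: distinct users who used the AI / distinct eligible users.
adoption = len({e["user"] for e in ai_events}) / len({e["user"] for e in events})

# Cost per unit: average cost of an AI-handled transaction.
cost_per_txn = sum(e["cost"] for e in ai_events) / len(ai_events)

print(f"accuracy={accuracy:.0%} adoption={adoption:.0%} cost/txn=${cost_per_txn:.3f}")
```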

Phase 4: Production Scale (Months 6-12+)

Goal: Scale to full user base, optimize, and maintain quality.

Activities:

  • Staged rollout: Expand to 25% → 50% → 100% of users
  • Performance optimization: Reduce latency, cost, improve accuracy
  • Infrastructure hardening: Ensure reliability, monitoring, alerts
  • Operations setup: Incident response, user support, retraining
  • Governance: Ensure compliance, fairness, proper oversight
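
The staged rollout above is typically implemented with deterministic user bucketing, so each user's cohort stays stable as the percentage grows. A stdlib-only sketch (the `user-N` IDs are illustrative):

```python
import hashlib

# Sketch of a deterministic percentage rollout: hash each user ID into
# a stable bucket 0-99, then compare against the current rollout level.
# Because a user's bucket never changes, the 25% cohort is a strict
# subset of the 50% cohort, which is a subset of the 100% cohort.

def bucket(user_id: str) -> int:
    """Stable bucket in [0, 100) derived from the user ID."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % 100

def in_rollout(user_id: str, percent: int) -> bool:
    return bucket(user_id) < percent

# Expanding rollout: a user enabled at 25% stays enabled at 50% and 100%.
for pct in (25, 50, 100):
    enabled = sum(in_rollout(f"user-{i}", pct) for i in range(10_000))
    print(f"{pct}% rollout -> {enabled} of 10,000 users enabled")
```

Hashing (rather than random assignment) means no per-user state to store, and re-deploys don't reshuffle who has the feature.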

Deliverable:

  • Production system serving all users
  • SLA dashboard (uptime, latency, accuracy)
  • Cost per unit optimized
  • Operations playbook
  • Monitoring and alerting

Timeline: 4-12 months
Team: Engineer + Data Scientist + PM + Operations (+ MLOps at large scale)
Cost: $400K-1M

Key production metrics:

  • Uptime: 99%+ availability
  • Latency: Response time meets user expectations
  • Accuracy: Maintains accuracy on live data
  • Cost: Per-transaction cost is acceptable
  • Incidents: Time to detect and resolve problems

Phase 5: Monitor and Iterate (Ongoing)

Goal: Maintain quality, catch issues early, improve continuously.

Activities:

  • Monitor accuracy: Does model stay accurate over time?
  • Detect drift: Has data distribution changed?
  • Collect feedback: Gather user corrections and feedback
  • Retrain: Periodically retrain on new data
  • A/B test improvements: Test new approaches
  • Expand scope: Add new use cases, new models

Deliverable:

  • Monthly monitoring reports
  • Quarterly improvements
  • Annual strategy updates

Timeline: Continuous
Team: Data Scientist + Engineer (10-20% of their time) + Operations (ongoing)

The Data Journey

Data quality drives everything. Plan for data maturity.

Pre-Launch

Phase 1-2 (Research/Prototype):

  • Source training data
  • Label/verify quality (aim for 2+ people labeling, 90%+ agreement)
  • Size: Need ≥1,000 good examples per category
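
To check labeler agreement, raw percent agreement is the quick metric; Cohen's kappa corrects it for chance agreement. A sketch with made-up labels from two hypothetical raters:

```python
from collections import Counter

# Sketch: label-quality checks when two people label the same items.
# Raw agreement is easy to read; Cohen's kappa corrects for the
# agreement you'd expect by chance. The labels below are made up.

def raw_agreement(a, b):
    """Fraction of items where both raters gave the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """(observed - chance) / (1 - chance) agreement."""
    n = len(a)
    observed = raw_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

rater_1 = ["spam", "spam", "ham", "ham", "spam", "ham", "ham", "spam", "ham", "ham"]
rater_2 = ["spam", "spam", "ham", "spam", "spam", "ham", "ham", "spam", "ham", "ham"]

print(f"agreement={raw_agreement(rater_1, rater_2):.0%}")   # 90% raw agreement
print(f"kappa={cohens_kappa(rater_1, rater_2):.2f}")
```

Raw agreement can look high purely by chance when one label dominates, which is why kappa is worth reporting alongside the 90%+ agreement target.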

Phase 3 (Pilot):

  • Real-world data validation: Does training data match real usage?
  • Rebalance if needed: Are some categories over/under-represented?
  • Continuous labeling: Collect human labels on pilot data for retraining

Phase 4 (Scale):

  • Data pipeline: Automated data collection and quality checks
  • Retraining schedule: Periodic retraining on newest data
  • Drift detection: Monitor for data distribution shifts
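
One common drift check is the Population Stability Index (PSI) between a feature's training-time histogram and its live histogram. A sketch with made-up bin counts; the 0.1/0.25 thresholds are a widely used rule of thumb, not a universal standard:

```python
import math

# Sketch: Population Stability Index (PSI) between the training-data
# distribution of one feature and its live distribution, over shared
# bins. Rule of thumb (varies by team): PSI < 0.1 stable,
# 0.1-0.25 worth watching, > 0.25 significant drift.

def psi(expected_counts, actual_counts, eps=1e-6):
    """PSI between two histograms with identical binning."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)   # eps guards empty bins
        a_pct = max(a / a_total, eps)
        total += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return total

train_bins = [500, 300, 150, 50]   # feature histogram at training time
live_bins  = [480, 310, 160, 50]   # same bins on recent live traffic

print(f"PSI = {psi(train_bins, live_bins):.4f}")
```

Run a check like this per feature on a schedule, and alert when any feature crosses your chosen threshold.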

Post-Launch Maintenance

Ongoing:

  • Monthly: Sample and label new data
  • Quarterly: Check for data drift
  • Quarterly: Retrain on new data
  • Annually: Comprehensive audit of data quality

Signs of data problems:

  • Accuracy dropping over time
  • New data looks different from training data
  • User corrections show systematic patterns
  • High error rate on new data types

Iteration Cadence

AI products iterate faster than you might expect, even in production.

Week-to-Week (During Active Development)

  • Daily standup: What’s working, what’s not
  • Friday demo: Show progress, get feedback
  • Weekly retro: What did we learn?

Month-to-Month (Early Production)

  • Weekly accuracy checks: Is model performing well?
  • Weekly user feedback review: What issues are users hitting?
  • Monthly improvements: Ship one meaningful improvement
  • Monthly retraining: Incorporate user feedback into new model

Quarter-to-Quarter (Mature Production)

  • Quarterly accuracy review: Is accuracy trending up/down?
  • Quarterly A/B tests: Run 2-3 experiments to improve
  • Quarterly retraining: Retrain on latest data
  • Quarterly expansion: Add new capability or scale to new use case

The Role of A/B Testing

A/B testing validates improvements and catches regressions.

When to A/B Test

Always test:

  • Model changes: New model vs. old model
  • UX changes: Does new UX actually improve adoption?
  • Feature changes: Does explanation actually help?
  • Scope expansion: Does this work for new problem?

Don’t test (obvious improvement):

  • Bug fixes
  • Performance optimization that improves both accuracy and speed

Structuring an A/B Test

Setup:

  • Control (current experience)
  • Treatment (new approach)
  • Traffic split: Usually 50/50
  • Duration: 2-4 weeks (enough for statistical significance)
  • Sample size: Thousands of users/transactions (not hundreds)
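
The "thousands, not hundreds" guidance comes from standard power analysis. A sketch using the two-proportion approximation at 5% significance and 80% power; the baseline and lift figures are made-up assumptions:

```python
import math

# Rough per-arm sample size for detecting a lift in a proportion metric
# (e.g., share of users who rate the feature trustworthy).
# z_alpha=1.96 and z_beta=0.84 correspond to 5% two-sided significance
# and 80% power; the baseline and target rates below are made up.

def sample_size(p1: float, p2: float,
                z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Approximate users needed per arm to detect p1 -> p2."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Baseline 40% -> target 44% (a 10% relative lift).
print(sample_size(0.40, 0.44), "users per arm")
```

Smaller lifts need dramatically more users, which is why underpowered tests on a few hundred users so often produce inconclusive results.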

Metrics:

  • Primary metric: What we’re trying to improve
  • Secondary metrics: Don’t break anything else
  • Guardrail metrics: Stop if something regresses too much

Example:

Test: Does showing confidence score increase trust?

Control: AI suggestion (no confidence)
Treatment: AI suggestion + confidence score (e.g., "92% confident")

Primary metric: User trust rating (target +10%)
Secondary: User satisfaction (shouldn't decrease)
Guardrail: Latency (shouldn't increase >10%)
Duration: 3 weeks
Sample: 10K users

Decision rule:

  • Primary metric improves significantly (>5%, p<0.05): Ship it
  • Primary metric flat or negative: Don’t ship
  • Guardrail metric breaches: Kill test immediately
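
The decision rule can be applied mechanically with a two-proportion z-test. A stdlib-only sketch with hypothetical counts, treating the >5% threshold as relative lift (an assumption, since the rule doesn't specify absolute vs. relative):

```python
import math

# Sketch: two-proportion z-test for a conversion-style primary metric,
# using only the standard library. The counts below are hypothetical.

def z_test(success_a: int, n_a: int, success_b: int, n_b: int):
    """Return (relative lift, two-sided p-value) for treatment vs. control."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Normal CDF via erf: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return (p_b - p_a) / p_a, p_value

# Control: 2000/5000 users trust the suggestion; treatment: 2200/5000.
lift, p = z_test(2000, 5000, 2200, 5000)
ship = lift > 0.05 and p < 0.05   # the ">5%, p<0.05" rule above
print(f"lift={lift:.1%} p={p:.4f} ship={ship}")
```

Guardrail metrics get the same treatment in the opposite direction: test whether the regression exceeds your tolerance, and stop the experiment if it does.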

Timeline Summary

From concept to steady-state:

Phase       Duration      Cost          Output
Research    2-4 weeks     $10-30K       Go/no-go decision
Prototype   4-8 weeks     $50-150K      Working prototype
Pilot       6-12 weeks    $150-400K     Validated model
Scale       4-12 months   $400K-1M      Production system
Total       6-18 months   $610K-1.6M    Running product

Variables that affect timeline:

  • Data readiness: +4-8 weeks if data prep needed
  • Technical complexity: +2-4 weeks if integration hard
  • Organizational readiness: +4-8 weeks if change management slow
  • Model performance: +4-8 weeks if accuracy challenging

Common Timeline Mistakes

Mistake 1: Skipping prototype, going straight to pilot

  • Reality: It takes longer, because you’re still learning the approach mid-pilot
  • Fix: Invest 4-8 weeks in a prototype to de-risk

Mistake 2: Pilot too long (6+ months)

  • Reality: Market changes, team loses momentum, costs balloon
  • Fix: Pilots should be 6-12 weeks max

Mistake 3: Not planning for post-launch iteration

  • Reality: Launch feels like the finish line, but it’s really the beginning
  • Fix: Plan for ongoing monitoring and retraining

Mistake 4: Too few users in pilot

  • Reality: You don’t discover the real issues until you scale (embarrassing)
  • Fix: Pilot with at least 100 real users, ideally 500+

Strategic Questions

  1. Are you in exploration, validation, or scaling phase? What’s the focus?
  2. What are your key decision gates? When do you go/no-go?
  3. How will you know if the pilot succeeded? Define metrics now.
  4. Who owns post-launch iteration? Who maintains this once shipped?
  5. What’s your retraining strategy? How often, what triggers it?

Key Takeaway: AI product development has distinct phases: research, prototype, pilot, scale, maintain. Each phase has different goals and decision criteria. Don’t skip phases or you’ll waste time and money. Plan for post-launch iteration—the work doesn’t end at launch; it begins there.

Discussion Prompt

For your product: What phase are you in? What’s your next decision gate? What would success look like at your current phase?