Managing AI Project Uncertainty
Why AI Projects Are Different
Traditional projects have known unknowns: you know what you’re building, roughly how long it takes, and what it will cost. You might be wrong, but you have a baseline.
AI projects have unknown unknowns. You don’t know if your approach will work until you try. The model might not be accurate enough. Integration might be harder than expected. The business might change its mind about what constitutes “good enough.”
This requires fundamentally different project management practices.
The Spike-Based Planning Approach
Most successful AI projects use spike-based planning: short exploration phases that answer key questions.
What’s a Spike?
A spike is a time-boxed exploration to answer a specific question. Typically 1-2 weeks.
Examples:
- “Can we achieve 85% accuracy on email classification?” (1 week spike)
- “How hard is integrating with our legacy system?” (1 week spike)
- “Will customers trust AI-generated summaries?” (2 week spike with user testing)
Key characteristics:
- Time-boxed: Never longer than 2 weeks
- Specific question: “What we want to know” is clear
- Prototype, not product: Not trying to build final system
- Learnings, not code: Output is knowledge, not necessarily reusable code
- Decision-gated: Spike ends with go/no-go decision
Structuring Spikes
Pre-spike checklist:
- What are we trying to learn?
- What would success look like? (concrete definition)
- What would failure look like?
- Who needs to approve continuation based on results?
- What’s our contingency if this fails?
During spike:
- Daily standup (15 min) — What did we learn? Any blockers?
- Mid-spike checkpoint (day 4-5) — Are we on track? Do we need to pivot?
- Rapid iteration — Don’t wait for perfection
End of spike:
- 30-minute readout to stakeholders
- Clear findings and recommendation
- Go/no-go decision
Example Spike Sequence
Spike 1: Is the data good enough? (Week 1)
- Question: Do we have 5,000+ labeled examples for email classification?
- Success: Historical data available, labeled with 90%+ confidence
- Outcome: We have training data; move forward
- Cost: 1 person-week
Spike 2: Does basic approach work? (Week 2)
- Question: Using existing models and basic prompting, can we reach 80% accuracy?
- Success: Test model achieves 80% on holdout set
- Outcome: Accuracy is only 72%; need to pivot
- Cost: 1.5 person-weeks
Pivot decision: Is 72% acceptable? No. Try a different approach.
Spike 3: Can we improve accuracy? (Week 3)
- Question: With few-shot prompting and custom fine-tuning, can we reach 85%?
- Success: Achieve 86% accuracy
- Outcome: Yes! Now feasible
- Cost: 2 person-weeks
Spike 4: Can we integrate? (Week 4)
- Question: How hard is API integration with email system?
- Success: Proof-of-concept API call working from email system
- Outcome: Straightforward; 3-week integration estimate
- Cost: 1.5 person-weeks
Total spike investment: ~6 person-weeks → Answers 4 critical questions before committing to full build
Experiment-Driven Development
Once you’ve passed initial spikes, use experiment-driven development.
Structuring Experiments
Goal: Answer specific questions about AI quality, user adoption, or business impact.
Example experiment:
- Hypothesis: If we show confidence scores with AI-generated summaries, users will trust them more
- Test: A/B test: Control (summary only), Test (summary + confidence score)
- Duration: 2 weeks with 1,000 users
- Success metric: Trust rating increases 10%+
- Outcome: Determines whether feature ships with confidence scores
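The readout for an experiment like this can be reduced to a single uplift check. A minimal sketch; the ratings and the `trust_uplift` helper are illustrative, not from a real test:

```python
from statistics import mean

def trust_uplift(control_ratings, test_ratings):
    """Relative change in mean trust rating, test vs. control, in percent."""
    baseline = mean(control_ratings)
    return (mean(test_ratings) - baseline) / baseline * 100

# Hypothetical 1-5 trust ratings from the two arms of the A/B test.
control = [3.0, 3.5, 4.0, 3.5, 3.0]   # summary only
test = [4.0, 4.0, 4.5, 3.5, 4.0]      # summary + confidence score

uplift = trust_uplift(control, test)
ships_with_scores = uplift >= 10.0     # success metric: trust up 10%+
```

In practice you would also check statistical significance before shipping; the point here is that the success metric is a number agreed upfront, not a judgment call after the fact.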
Running Experiments Safely
- Start with small audience: 10% of users, then scale
- Monitor continuously: If error rate > threshold, stop
- Have kill switch: Can immediately disable if problems occur
- Clear acceptance criteria: If X metric < Y, we stop
Example acceptance criteria:
- Error rate < 5% (if > 5%, stop and debug)
- User satisfaction stays ≥ 4.0 (if drops, stop)
- System latency < 3 seconds (if slower, debug)
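Acceptance criteria like these can be wired directly into monitoring so the kill switch is mechanical, not a debate. A minimal sketch, assuming a metrics snapshot dict; the `should_kill` helper and field names are illustrative:

```python
def should_kill(metrics):
    """Return (stop?, reasons) for the latest monitoring snapshot."""
    reasons = []
    if metrics["error_rate"] > 0.05:       # error rate must stay < 5%
        reasons.append("error rate above 5%")
    if metrics["satisfaction"] < 4.0:      # satisfaction must stay >= 4.0
        reasons.append("satisfaction below 4.0")
    if metrics["latency_s"] > 3.0:         # latency must stay < 3 seconds
        reasons.append("latency above 3 seconds")
    return (len(reasons) > 0, reasons)

stop, why = should_kill(
    {"error_rate": 0.07, "satisfaction": 4.2, "latency_s": 1.8}
)
# stop is True here: the 7% error rate trips the kill switch.
```

Keeping the thresholds in code means the decision to stop was made before the experiment started, when nobody was emotionally invested in the results.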
Stakeholder Management with Uncertainty
The challenge: executives want timelines; you’re uncertain.
How to Present Uncertainty
Bad approach:
“We don’t know how long this will take.”
Good approach:
“We’ve done a 2-week spike on feasibility. Here’s what we learned:
- We can achieve 85% accuracy (good enough for launch)
- Integration will take 4 weeks
- Main risk is user adoption; we’ll A/B test before scaling
Our timeline: 2-week spike (done) → 4-week build → 2-week A/B test → launch.
Total: 8 weeks if all goes well. If any spike fails, we’ll reassess.”
Three Timelines
Present three versions:
Optimistic: Everything works as expected
- Spike results are good
- Integration is straightforward
- Users adopt quickly
- Example: 8 weeks
Expected: Some issues, normal rework
- One technical blocker requiring rework
- User testing reveals UX issue requiring fix
- One week of debugging
- Example: 12 weeks
Pessimistic: Significant challenges
- Accuracy harder than expected; need more data or fine-tuning
- Integration reveals unexpected dependency
- Users don’t trust feature; requires redesign
- Example: 20 weeks (or decision to kill)
Recommendation:
“We’re planning for the 12-week timeline. If results beat expectations, we ship earlier. If challenges emerge, we’ll have an honest conversation by week 6.”
Setting Expectations Upfront
Before starting:
- “This is uncertain; we’ll learn as we go”
- “We’ll make go/no-go decisions at week 2, 4, 6”
- “If initial results aren’t promising, we’ll pivot or stop”
- “Success means learning, not necessarily shipping”
This prevents surprise disappointment.
Managing Scope with Uncertainty
Problem: Scope creep kills AI projects because the baseline work is already uncertain; every added feature multiplies that uncertainty.
MVP Thinking for AI
Define minimum viable version:
- What’s the smallest thing that creates value?
- What can we do in the baseline timeline?
- What’s nice-to-have we can cut?
Example: Support Automation MVP
Full vision: AI handles 80% of all support tickets
MVP: AI drafts responses to 5 common question types; humans review and send
- Full vision: 16 weeks, risky
- MVP: 6 weeks, valuable, proves approach
Launch with MVP, expand after validation.
Cutting Scope as You Learn
As you work, you’ll learn that some things are harder than expected.
When to cut scope:
- You’re hitting complexity that pushes delivery more than 20% past the original estimate
- Benefit of feature doesn’t justify the cost
- User testing shows feature isn’t valuable
- Engineering says “this will take 3x longer than estimated”
Don’t say: “We’re behind schedule.” Say: “We’re learning that X is harder than expected. We can either:
- Extend timeline
- Cut scope (remove feature Y)
- Pivot approach
Here’s my recommendation…”
Risk Management in AI
Risk Classification
High risk (deal breakers):
- Model accuracy doesn’t meet minimum threshold
- Data isn’t available (can’t proceed)
- Technical integration impossible
- Legal/compliance review shows it can’t ship
Medium risk (requires mitigation):
- Accuracy is 80% but we need 90% (might be fixable)
- Integration is complex (but possible)
- User adoption is slow (might improve with design)
Low risk (manageable):
- Minor UX tweaks needed
- API performance not optimal (fixable)
- Team learns slowly (expected)
Risk Mitigation Strategies
For each high/medium risk:
- Identify the risk clearly
- Assess probability (likely? unlikely?)
- Assess impact (fatal? annoying?)
- Mitigation plan (what will we do about it?)
- Decision trigger (what would make us stop?)
Example Risk Register:
| Risk | Probability | Impact | Mitigation | Trigger |
|---|---|---|---|---|
| Model accuracy <80% | Medium | Fatal | Spike on data quality; try different model | If accuracy <75% after spike, pivot |
| Integration takes 6+ weeks | Medium | Medium | Proof-of-concept first; allocate senior eng | If PoC takes >2 weeks, reassess |
| Users don’t trust AI | Medium | High | A/B test with confidence scores; human oversight | If trust score <3/5, redesign |
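A register like this stays useful only if the decision triggers are actually checked. One hedged sketch of keeping it machine-checkable; the field names are illustrative, and the thresholds are taken from the table:

```python
# Each entry carries its decision trigger as a predicate over current metrics.
risk_register = [
    {"risk": "Model accuracy <80%", "probability": "medium", "impact": "fatal",
     "mitigation": "Spike on data quality; try different model",
     "trigger": lambda m: m.get("accuracy", 1.0) < 0.75},
    {"risk": "Users don't trust AI", "probability": "medium", "impact": "high",
     "mitigation": "A/B test with confidence scores; human oversight",
     "trigger": lambda m: m.get("trust_score", 5.0) < 3.0},
]

def tripped(register, metrics):
    """Return the risks whose decision triggers fire for current metrics."""
    return [r["risk"] for r in register if r["trigger"](metrics)]

fired = tripped(risk_register, {"accuracy": 0.72, "trust_score": 4.1})
# fired == ["Model accuracy <80%"]: the accuracy trigger fires, trust does not.
```

Reviewing `tripped(...)` at each spike readout turns the register from a document nobody reads into the agenda for the go/no-go discussion.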
Velocity and Capacity Planning
Estimating with Uncertainty
Don’t estimate in “points.” Estimate in weeks of exploratory work:
Bad: “This story is 8 points”
Good: “This will take 2 weeks to understand if it’s possible, then 4 weeks to build if it is”
Buffer for Learning
Add 30-50% buffer for learning and unexpected challenges:
Engineering estimate: 4 weeks
With learning buffer: 5-6 weeks
Communicated to stakeholders: “6-8 weeks”
This gives you room when things take longer and you look good if you finish early.
Capacity Planning
If team capacity is 20 weeks/quarter:
- Allocate 14 weeks to committed work (70%)
- Reserve 6 weeks for learning, spikes, and unexpected issues (30%)
This prevents burnout and gives space for real exploration.
Celebrating Valuable Failures
A spike that teaches us something is a win, even if the answer is “this won’t work.”
Culture practice:
- Publicize learnings from failed spikes
- Celebrate the time saved by failing fast
- Frame as “avoided 16 weeks of wasted effort”
- Never shame teams for negative results
Example celebration:
“Great news: The spam detection approach wouldn’t work at our scale. Sarah and team figured this out in 2 weeks instead of 8. We’re now exploring approach B which looks more promising. Thanks for the rigorous work.”
Metrics for Uncertain Projects
Track different metrics than traditional projects:
Traditional metrics:
- Tasks completed
- On-time delivery
- Budget vs. actual
AI project metrics:
- Spikes completed and key questions answered
- Experiments run and learnings captured
- Technical debt incurred vs. managed
- Team morale and learning
Strategic Questions
- What are our key uncertainties? Make them explicit.
- How will we resolve them? Spikes? Experiments? User research?
- What would make us pivot or stop? Know decision criteria upfront.
- How will we communicate with stakeholders? Set realistic expectations.
- How do we celebrate learning from failures? Create safe culture for exploration.
Key Takeaway: AI projects require different management because uncertainty is fundamental. Use spike-based planning to explore quickly. Run experiments to validate assumptions. Present three timelines and decision criteria upfront. Cut scope as you learn. Build culture that celebrates valuable failures. Reserve capacity for learning, not just delivery.
Discussion Prompt
For your next AI project: What are the 3 biggest uncertainties? How would you structure spikes to answer them? What would make you kill the project?