Intermediate

Managing AI Project Uncertainty

Lesson 3 of 4 · Estimated time: 45 min

Why AI Projects Are Different

Traditional projects have known unknowns: you know what you’re building, roughly how long it takes, and what it will cost. You might be wrong, but you have a baseline.

AI projects have unknown unknowns. You don’t know if your approach will work until you try. The model might not be accurate enough. Integration might be harder than expected. The business might change its mind about what constitutes “good enough.”

This requires fundamentally different project management practices.

The Spike-Based Planning Approach

Most successful AI projects use spike-based planning: short exploration phases that answer key questions.

What’s a Spike?

A spike is a time-boxed exploration to answer a specific question. Typically 1-2 weeks.

Examples:

  • “Can we achieve 85% accuracy on email classification?” (1 week spike)
  • “How hard is integrating with our legacy system?” (1 week spike)
  • “Will customers trust AI-generated summaries?” (2 week spike with user testing)

Key characteristics:

  • Time-boxed: Never longer than 2 weeks
  • Specific question: what you want to learn is stated explicitly
  • Prototype, not product: Not trying to build final system
  • Learnings, not code: Output is knowledge, not necessarily reusable code
  • Decision-gated: Spike ends with go/no-go decision

Structuring Spikes

Pre-spike checklist:

  1. What are we trying to learn?
  2. What would success look like? (concrete definition)
  3. What would failure look like?
  4. Who needs to approve continuation based on results?
  5. What’s our contingency if this fails?
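The pre-spike checklist can be captured as a lightweight, reviewable template. Here is an illustrative sketch; the field names are our own, not part of any standard tool:

```python
from dataclasses import dataclass


@dataclass
class SpikePlan:
    """Pre-spike checklist captured as a reviewable artifact (illustrative field names)."""
    question: str             # 1. What are we trying to learn?
    success_criteria: str     # 2. What would success look like? (concrete)
    failure_criteria: str     # 3. What would failure look like?
    decision_maker: str       # 4. Who approves continuation based on results?
    contingency: str          # 5. What's our fallback if this fails?
    timebox_weeks: float = 1  # Spikes are never longer than 2 weeks

    def is_ready(self) -> bool:
        """A spike isn't ready to start until every field is filled and time-boxed."""
        filled = all([self.question, self.success_criteria, self.failure_criteria,
                      self.decision_maker, self.contingency])
        return filled and 0 < self.timebox_weeks <= 2


plan = SpikePlan(
    question="Can we achieve 85% accuracy on email classification?",
    success_criteria="Test model reaches 85% on a held-out set",
    failure_criteria="Accuracy stuck below 75% after one week",
    decision_maker="Product lead",
    contingency="Fall back to human-drafted responses with AI assist",
)
print(plan.is_ready())  # True
```

Making the template fail validation when any field is blank forces the team to answer all five questions before the clock starts.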

During spike:

  • Daily standup (15 min) — What did we learn? Any blockers?
  • Mid-spike checkpoint (day 4-5) — Are we on track? Do we need to pivot?
  • Rapid iteration — Don’t wait for perfection

End of spike:

  • 30-minute readout to stakeholders
  • Clear findings and recommendation
  • Go/no-go decision

Example Spike Sequence

Spike 1: Is the data good enough? (Week 1)

  • Question: Do we have 5,000+ labeled examples for email classification?
  • Success: Historical data available, labeled with 90%+ confidence
  • Outcome: We have training data; move forward
  • Cost: 1 person-week

Spike 2: Does basic approach work? (Week 2)

  • Question: Using existing models and basic prompting, can we reach 80% accuracy?
  • Success: Test model achieves 80% on holdout set
  • Outcome: Accuracy is only 72%; need to pivot
  • Cost: 1.5 person-weeks

Pivot decision: Is 72% acceptable? No. Try different approach.

Spike 3: Can we improve accuracy? (Week 3)

  • Question: With few-shot prompting and custom fine-tuning, can we reach 85%?
  • Success: Achieve 86% accuracy
  • Outcome: Yes! Now feasible
  • Cost: 2 person-weeks

Spike 4: Can we integrate? (Week 4)

  • Question: How hard is API integration with email system?
  • Success: Proof-of-concept API call working from email system
  • Outcome: Straightforward; 3-week integration estimate
  • Cost: 1.5 person-weeks

Total spike investment: ~6 person-weeks to answer 4 critical questions before committing to the full build.

Experiment-Driven Development

Once you’ve passed initial spikes, use experiment-driven development.

Structuring Experiments

Goal: Answer specific questions about AI quality, user adoption, or business impact.

Example experiment:

  • Hypothesis: If we show confidence scores with AI-generated summaries, users will trust them more
  • Test: A/B test with a control group (summary only) and a test group (summary + confidence score)
  • Duration: 2 weeks with 1,000 users
  • Success metric: Trust rating increases 10%+
  • Outcome: Determines whether feature ships with confidence scores
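The experiment above boils down to comparing trust ratings across the two groups against the 10% uplift bar. A minimal sketch, with hypothetical ratings (a real analysis would also test statistical significance, not just the point uplift):

```python
def relative_uplift(control_mean: float, test_mean: float) -> float:
    """Relative improvement of the test group over control."""
    return (test_mean - control_mean) / control_mean


def ship_decision(control_ratings, test_ratings, min_uplift=0.10) -> bool:
    """Ship the feature only if the trust uplift clears the success metric."""
    control_mean = sum(control_ratings) / len(control_ratings)
    test_mean = sum(test_ratings) / len(test_ratings)
    return relative_uplift(control_mean, test_mean) >= min_uplift


# Hypothetical 1-5 trust ratings from the two-week test
control = [3.0, 3.5, 3.2, 3.1, 3.4]   # summary only
test = [3.6, 3.8, 3.5, 3.7, 3.9]      # summary + confidence score

print(ship_decision(control, test))  # True: uplift is well above 10%
```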

Running Experiments Safely

  1. Start with small audience: 10% of users, then scale
  2. Monitor continuously: If error rate > threshold, stop
  3. Have kill switch: Can immediately disable if problems occur
  4. Clear acceptance criteria: If X metric < Y, we stop

Example acceptance criteria:

  • Error rate < 5% (if > 5%, stop and debug)
  • User satisfaction stays ≥ 4.0 (if drops, stop)
  • System latency < 3 seconds (if slower, debug)
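The acceptance criteria above can be wired directly into the kill switch: check each metric against its threshold continuously, and disable the experiment on any violation. A sketch, with thresholds taken from the criteria (metric names are our own):

```python
# Guardrails matching the acceptance criteria: error rate, satisfaction, latency.
GUARDRAILS = {
    "error_rate":   lambda v: v < 0.05,  # if >= 5%, stop and debug
    "satisfaction": lambda v: v >= 4.0,  # if it drops below 4.0, stop
    "latency_sec":  lambda v: v < 3.0,   # if 3 seconds or slower, debug
}


def violated_guardrails(metrics: dict) -> list:
    """Return the names of violated guardrails; any violation trips the kill switch."""
    return [name for name, ok in GUARDRAILS.items()
            if name in metrics and not ok(metrics[name])]


violations = violated_guardrails(
    {"error_rate": 0.07, "satisfaction": 4.2, "latency_sec": 1.8}
)
print(violations)  # ['error_rate'] -> disable the experiment and debug
```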

Stakeholder Management with Uncertainty

The challenge: executives want timelines; you’re uncertain.

How to Present Uncertainty

Bad approach:

“We don’t know how long this will take.”

Good approach:

“We’ve done a 2-week spike on feasibility. Here’s what we learned:

  • We can achieve 85% accuracy (good enough for launch)
  • Integration will take 4 weeks
  • Main risk is user adoption; we’ll A/B test before scaling

Our timeline: 2-week spike (done) → 4-week build → 2-week A/B test → launch.

Total: 8 weeks if all goes well. If any step fails, we’ll reassess.”

Three Timelines

Present three versions:

Optimistic: Everything works as expected

  • Spike results are good
  • Integration is straightforward
  • Users adopt quickly
  • Example: 8 weeks

Expected: Some issues, normal rework

  • One technical blocker requiring rework
  • User testing reveals UX issue requiring fix
  • One week of debugging
  • Example: 12 weeks

Pessimistic: Significant challenges

  • Accuracy harder than expected; need more data or fine-tuning
  • Integration reveals unexpected dependency
  • Users don’t trust feature; requires redesign
  • Example: 20 weeks (or decision to kill)
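One common way to condense the three timelines into a single planning number is the PERT three-point estimate, E = (O + 4M + P) / 6. This technique isn't prescribed by the lesson, but it is a standard project-management tool, and it lands close to the "expected" case:

```python
def pert_estimate(optimistic: float, most_likely: float, pessimistic: float) -> float:
    """PERT three-point estimate: weighted average favoring the most-likely case."""
    return (optimistic + 4 * most_likely + pessimistic) / 6


weeks = pert_estimate(optimistic=8, most_likely=12, pessimistic=20)
print(round(weeks, 1))  # 12.7 -> close to the 12-week "expected" timeline
```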

Recommendation:

“We’re planning for the 12-week timeline. If results beat expectations, we ship earlier. If challenges emerge, we’ll have an honest conversation by week 6.”

Setting Expectations Upfront

Before starting:

  • “This is uncertain; we’ll learn as we go”
  • “We’ll make go/no-go decisions at weeks 2, 4, and 6”
  • “If initial results aren’t promising, we’ll pivot or stop”
  • “Success means learning, not necessarily shipping”

This prevents surprise disappointment.

Managing Scope with Uncertainty

Problem: Scope creep kills AI projects because the actual work is hard.

MVP Thinking for AI

Define minimum viable version:

  • What’s the smallest thing that creates value?
  • What can we do in the baseline timeline?
  • What’s nice-to-have we can cut?

Example: Support Automation MVP

Full vision: AI handles 80% of all support tickets

MVP: AI drafts responses to 5 common question types; humans review and send

  • Full vision: 16 weeks, risky
  • MVP: 6 weeks, valuable, proves approach

Launch with MVP, expand after validation.

Cutting Scope as You Learn

As you work, you’ll learn that some things are harder than expected.

When to cut scope:

  • You’re hitting complexity that pushes the schedule more than 20% past the original estimate
  • Benefit of feature doesn’t justify the cost
  • User testing shows feature isn’t valuable
  • Engineering says “this will take 3x longer than estimated”

Don’t say: “We’re behind schedule.”

Say: “We’re learning that X is harder than expected. We can either:

  1. Extend timeline
  2. Cut scope (remove feature Y)
  3. Pivot approach

Here’s my recommendation…”

Risk Management in AI

Risk Classification

High risk (deal breakers):

  • Model accuracy doesn’t meet minimum threshold
  • Data isn’t available (can’t proceed)
  • Technical integration impossible
  • Legal/compliance shows it can’t ship

Medium risk (requires mitigation):

  • Accuracy is 80% but we need 90% (might be fixable)
  • Integration is complex (but possible)
  • User adoption slow (might improve with design)

Low risk (manageable):

  • Minor UX tweaks needed
  • API performance not optimal (fixable)
  • Team learns slowly (expected)

Risk Mitigation Strategies

For each high/medium risk:

  1. Identify the risk clearly
  2. Assess probability (likely? unlikely?)
  3. Assess impact (fatal? annoying?)
  4. Mitigation plan (what will we do about it?)
  5. Decision trigger (what would make us stop?)

Example Risk Register:

Risk | Probability | Impact | Mitigation | Trigger
Model accuracy <80% | Medium | Fatal | Spike on data quality; try different model | If accuracy <75% after spike, pivot
Integration takes 6+ weeks | Medium | Medium | Proof-of-concept first; allocate senior eng | If PoC takes >2 weeks, reassess
Users don’t trust AI | Medium | High | A/B test with confidence scores; human oversight | If trust score <3/5, redesign
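A risk register kept as data, rather than a static table, can be sorted so the highest-exposure risks get reviewed first. An illustrative sketch; the 1-3 probability/impact scale is our own, not part of the lesson:

```python
# Map qualitative ratings to a simple 1-3 scale (illustrative, not prescribed).
SCORE = {"Low": 1, "Medium": 2, "High": 3, "Fatal": 3}

risks = [
    {"risk": "Model accuracy <80%", "probability": "Medium", "impact": "Fatal",
     "trigger": "If accuracy <75% after spike, pivot"},
    {"risk": "Integration takes 6+ weeks", "probability": "Medium", "impact": "Medium",
     "trigger": "If PoC takes >2 weeks, reassess"},
    {"risk": "Users don't trust AI", "probability": "Medium", "impact": "High",
     "trigger": "If trust score <3/5, redesign"},
]


def priority(risk: dict) -> int:
    """Probability x impact score; higher means review first."""
    return SCORE[risk["probability"]] * SCORE[risk["impact"]]


for r in sorted(risks, key=priority, reverse=True):
    print(f"{priority(r)}  {r['risk']}  -> {r['trigger']}")
```

Sorting by probability times impact is the same mental model as the register's Probability and Impact columns, just made explicit.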

Velocity and Capacity Planning

Estimating with Uncertainty

Don’t estimate in “points.” Estimate in weeks of exploratory work:

Bad: “This story is 8 points”

Good: “This will take 2 weeks to understand if it’s possible, then 4 weeks to build if it is”

Buffer for Learning

Add 30-50% buffer for learning and unexpected challenges:

Engineering estimate: 4 weeks

With learning buffer: 5-6 weeks

Communicated to stakeholders: “6-8 weeks”

This gives you room when things take longer and you look good if you finish early.
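The buffering rule above is simple arithmetic: apply a 30-50% learning buffer to the engineering estimate, then round the communicated range up. A minimal sketch:

```python
def buffered_range(estimate_weeks: float, low: float = 0.3, high: float = 0.5):
    """Apply the 30-50% learning buffer to an engineering estimate."""
    return estimate_weeks * (1 + low), estimate_weeks * (1 + high)


lo, hi = buffered_range(4)  # engineering estimate: 4 weeks
print(lo, hi)  # 5.2 6.0 -> plan for 5-6 weeks, communicate "6-8 weeks"
```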

Capacity Planning

If team capacity is 20 weeks/quarter:

  • Allocate 15 weeks to committed work (75%)
  • Reserve 5 weeks for learning, spikes, and unexpected issues (25%)

This prevents burnout and gives space for real exploration.

Celebrating Valuable Failures

A spike that teaches us something is a win, even if the answer is “this won’t work.”

Culture practice:

  • Publicize learnings from failed spikes
  • Celebrate the time saved by failing fast
  • Frame as “avoided 16 weeks of wasted effort”
  • Never shame teams for negative results

Example celebration:

“Great news: The spam detection approach wouldn’t work at our scale. Sarah and team figured this out in 2 weeks instead of 8. We’re now exploring approach B which looks more promising. Thanks for the rigorous work.”

Metrics for Uncertain Projects

Track different metrics than traditional projects:

Traditional metrics:

  • Tasks completed
  • On-time delivery
  • Budget vs. actual

AI project metrics:

  • Spikes completed and key questions answered
  • Experiments run and learnings captured
  • Technical debt incurred vs. managed
  • Team morale and learning

Strategic Questions

  1. What are our key uncertainties? Make them explicit.
  2. How will we resolve them? Spikes? Experiments? User research?
  3. What would make us pivot or stop? Know decision criteria upfront.
  4. How will we communicate with stakeholders? Set realistic expectations.
  5. How do we celebrate learning from failures? Create safe culture for exploration.

Key Takeaway: AI projects require different management because uncertainty is fundamental. Use spike-based planning to explore quickly. Run experiments to validate assumptions. Present three timelines and decision criteria upfront. Cut scope as you learn. Build culture that celebrates valuable failures. Reserve capacity for learning, not just delivery.

Discussion Prompt

For your next AI project: What are the 3 biggest uncertainties? How would you structure spikes to answer them? What would make you kill the project?