Managing AI Project Uncertainty
Why AI Projects Are Different
Traditional projects have known unknowns: you know what you’re building, roughly how long it takes, and what it will cost. You might be wrong, but you have a baseline.
AI projects have unknown unknowns. You don’t know if your approach will work until you try. The model might not be accurate enough. Integration might be harder than expected. The business might change its mind about what constitutes “good enough.”
This requires fundamentally different project management practices.
The Spike-Based Planning Approach
Most successful AI projects use spike-based planning: short exploration phases that answer key questions.
What’s a Spike?
A spike is a time-boxed exploration to answer a specific question. Typically 1-2 weeks.
Examples:
- “Can we achieve 85% accuracy on email classification?” (1 week spike)
- “How hard is integrating with our legacy system?” (1 week spike)
- “Will customers trust AI-generated summaries?” (2 week spike with user testing)
Key characteristics:
- Time-boxed: Never longer than 2 weeks
- Specific question: “What we want to know” is clear
- Prototype, not product: Not trying to build final system
- Learnings, not code: Output is knowledge, not necessarily reusable code
- Decision-gated: Spike ends with go/no-go decision
Structuring Spikes
Pre-spike checklist:
- What are we trying to learn?
- What would success look like? (concrete definition)
- What would failure look like?
- Who needs to approve continuation based on results?
- What’s our contingency if this fails?
During spike:
- Daily standup (15 min) — What did we learn? Any blockers?
- Mid-spike checkpoint (day 4-5) — Are we on track? Do we need to pivot?
- Rapid iteration — Don’t wait for perfection
End of spike:
- 30-minute readout to stakeholders
- Clear findings and recommendation
- Go/no-go decision
Example Spike Sequence
Spike 1: Is the data good enough? (Week 1)
- Question: Do we have 5,000+ labeled examples for email classification?
- Success: Historical data available, labeled with 90%+ confidence
- Outcome: We have training data; move forward
- Cost: 1 person-week
Spike 2: Does basic approach work? (Week 2)
- Question: Using existing models and basic prompting, can we reach 80% accuracy?
- Success: Test model achieves 80% on holdout set
- Outcome: Accuracy is only 72%; need to pivot
- Cost: 1.5 person-weeks
Pivot decision: Is 72% acceptable? No. Try a different approach.
Spike 3: Can we improve accuracy? (Week 3)
- Question: With few-shot prompting and custom fine-tuning, can we reach 85%?
- Success: Achieve 86% accuracy
- Outcome: Yes! Now feasible
- Cost: 2 person-weeks
Spike 4: Can we integrate? (Week 4)
- Question: How hard is API integration with email system?
- Success: Proof-of-concept API call working from email system
- Outcome: Straightforward; 3-week integration estimate
- Cost: 1.5 person-weeks
Total spike investment: ~6 person-weeks → Answers 4 critical questions before committing to full build
Experiment-Driven Development
Once you’ve passed initial spikes, use experiment-driven development.
Structuring Experiments
Goal: Answer specific questions about AI quality, user adoption, or business impact.
Example experiment:
- Hypothesis: If we show confidence scores with AI-generated summaries, users will trust them more
- Test: A/B test: Control (summary only), Test (summary + confidence score)
- Duration: 2 weeks with 1,000 users
- Success metric: Trust rating increases 10%+
- Outcome: Determines whether feature ships with confidence scores
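The readout for an experiment like this can be reduced to a single uplift check. A minimal sketch; the ratings and the `trust_uplift` helper are illustrative, not from a real test:

```python
from statistics import mean

def trust_uplift(control_ratings, test_ratings):
    """Relative change in mean trust rating, test vs. control, in percent."""
    baseline = mean(control_ratings)
    return (mean(test_ratings) - baseline) / baseline * 100

# Hypothetical 1-5 trust ratings from the two arms of the A/B test.
control = [3.0, 3.5, 4.0, 3.5, 3.0]   # summary only
test = [4.0, 4.0, 4.5, 3.5, 4.0]      # summary + confidence score

uplift = trust_uplift(control, test)
ships_with_scores = uplift >= 10.0     # success metric: trust up 10%+
```

In practice you would also check statistical significance before shipping; the point here is that the success metric is a number agreed upfront, not a judgment call after the fact.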
Running Experiments Safely
- Start with small audience: 10% of users, then scale
- Monitor continuously: If error rate > threshold, stop
- Have kill switch: Can immediately disable if problems occur
- Clear acceptance criteria: If X metric < Y, we stop
Example acceptance criteria:
- Error rate < 5% (if > 5%, stop and debug)
- User satisfaction stays ≥ 4.0 (if drops, stop)
- System latency < 3 seconds (if slower, debug)
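Acceptance criteria like these can be wired directly into monitoring so the kill switch is mechanical, not a debate. A minimal sketch, assuming a metrics snapshot dict; the `should_kill` helper and field names are illustrative:

```python
def should_kill(metrics):
    """Return (stop?, reasons) for the latest monitoring snapshot."""
    reasons = []
    if metrics["error_rate"] > 0.05:       # error rate must stay < 5%
        reasons.append("error rate above 5%")
    if metrics["satisfaction"] < 4.0:      # satisfaction must stay >= 4.0
        reasons.append("satisfaction below 4.0")
    if metrics["latency_s"] > 3.0:         # latency must stay < 3 seconds
        reasons.append("latency above 3 seconds")
    return (len(reasons) > 0, reasons)

stop, why = should_kill(
    {"error_rate": 0.07, "satisfaction": 4.2, "latency_s": 1.8}
)
# stop is True here: the 7% error rate trips the kill switch.
```

Keeping the thresholds in code means the decision to stop was made before the experiment started, when nobody was emotionally invested in the results.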
Stakeholder Management with Uncertainty
The challenge: executives want timelines; you’re uncertain.
How to Present Uncertainty
Bad approach:
“We don’t know how long this will take.”
Good approach:
“We’ve done a 2-week spike on feasibility. Here’s what we learned:
- We can achieve 85% accuracy (good enough for launch)
- Integration will take 4 weeks
- Main risk is user adoption; we’ll A/B test before scaling
Our timeline: 2-week spike (done) → 4-week build → 2-week A/B test → launch.
Total: 8 weeks if all goes well. If any spike fails, we’ll reassess.”
Three Timelines
Present three versions:
Optimistic: Everything works as expected
- Spike results are good
- Integration is straightforward
- Users adopt quickly
- Example: 8 weeks
Expected: Some issues, normal rework
- One technical blocker requiring rework
- User testing reveals UX issue requiring fix
- One week of debugging
- Example: 12 weeks
Pessimistic: Significant challenges
- Accuracy harder than expected; need more data or fine-tuning
- Integration reveals unexpected dependency
- Users don’t trust feature; requires redesign
- Example: 20 weeks (or decision to kill)
Recommendation:
“We’re planning for the 12-week timeline. If results beat expectations, we ship earlier. If challenges emerge, we’ll have an honest conversation by week 6.”
Setting Expectations Upfront
Before starting:
- “This is uncertain; we’ll learn as we go”
- “We’ll make go/no-go decisions at week 2, 4, 6”
- “If initial results aren’t promising, we’ll pivot or stop”
- “Success means learning, not necessarily shipping”
This prevents surprise disappointment.
Managing Scope with Uncertainty
Problem: Scope creep kills AI projects because the baseline work is already uncertain; every added feature multiplies that uncertainty.
MVP Thinking for AI
Define minimum viable version:
- What’s the smallest thing that creates value?
- What can we do in the baseline timeline?
- What’s nice-to-have we can cut?
Example: Support Automation MVP
Full vision: AI handles 80% of all support tickets
MVP: AI drafts responses to 5 common question types; humans review and send
- Full vision: 16 weeks, risky
- MVP: 6 weeks, valuable, proves approach
Launch with MVP, expand after validation.
Cutting Scope as You Learn
As you work, you’ll learn that some things are harder than expected.
When to cut scope:
- You’re hitting complexity that pushes delivery more than 20% past the original estimate
- Benefit of feature doesn’t justify the cost
- User testing shows feature isn’t valuable
- Engineering says “this will take 3x longer than estimated”
Don’t say: “We’re behind schedule.” Say: “We’re learning that X is harder than expected. We can either:
- Extend timeline
- Cut scope (remove feature Y)
- Pivot approach
Here’s my recommendation…”
Risk Management in AI
Risk Classification
High risk (deal breakers):
- Model accuracy doesn’t meet minimum threshold
- Data isn’t available (can’t proceed)
- Technical integration impossible
- Legal/compliance review shows it can’t ship
Medium risk (requires mitigation):
- Accuracy is 80% but we need 90% (might be fixable)
- Integration is complex (but possible)
- User adoption is slow (might improve with design)
Low risk (manageable):
- Minor UX tweaks needed
- API performance not optimal (fixable)
- Team learns slowly (expected)
Risk Mitigation Strategies
For each high/medium risk:
- Identify the risk clearly
- Assess probability (likely? unlikely?)
- Assess impact (fatal? annoying?)
- Mitigation plan (what will we do about it?)
- Decision trigger (what would make us stop?)
Example Risk Register:
| Risk | Probability | Impact | Mitigation | Trigger |
|---|---|---|---|---|
| Model accuracy <80% | Medium | Fatal | Spike on data quality; try different model | If accuracy <75% after spike, pivot |
| Integration takes 6+ weeks | Medium | Medium | Proof-of-concept first; allocate senior eng | If PoC takes >2 weeks, reassess |
| Users don’t trust AI | Medium | High | A/B test with confidence scores; human oversight | If trust score <3/5, redesign |
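A register like this stays useful only if the decision triggers are actually checked. One hedged sketch of keeping it machine-checkable; the field names are illustrative, and the thresholds are taken from the table:

```python
# Each entry carries its decision trigger as a predicate over current metrics.
risk_register = [
    {"risk": "Model accuracy <80%", "probability": "medium", "impact": "fatal",
     "mitigation": "Spike on data quality; try different model",
     "trigger": lambda m: m.get("accuracy", 1.0) < 0.75},
    {"risk": "Users don't trust AI", "probability": "medium", "impact": "high",
     "mitigation": "A/B test with confidence scores; human oversight",
     "trigger": lambda m: m.get("trust_score", 5.0) < 3.0},
]

def tripped(register, metrics):
    """Return the risks whose decision triggers fire for current metrics."""
    return [r["risk"] for r in register if r["trigger"](metrics)]

fired = tripped(risk_register, {"accuracy": 0.72, "trust_score": 4.1})
# fired == ["Model accuracy <80%"]: the accuracy trigger fires, trust does not.
```

Reviewing `tripped(...)` at each spike readout turns the register from a document nobody reads into the agenda for the go/no-go discussion.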
Velocity and Capacity Planning
Estimating with Uncertainty
Don’t estimate in “points.” Estimate in weeks of exploratory work:
Bad: “This story is 8 points”
Good: “This will take 2 weeks to understand if it’s possible, then 4 weeks to build if it is”
Buffer for Learning
Add 30-50% buffer for learning and unexpected challenges:
Engineering estimate: 4 weeks
With learning buffer: 5-6 weeks
Communicated to stakeholders: “6-8 weeks”
This gives you room when things take longer and you look good if you finish early.
Capacity Planning
If team capacity is 20 weeks/quarter:
- Allocate 14 weeks to committed work (70%)
- Reserve 6 weeks for learning, spikes, and unexpected issues (30%)
This prevents burnout and gives space for real exploration.
Celebrating Valuable Failures
A spike that teaches us something is a win, even if the answer is “this won’t work.”
Culture practice:
- Publicize learnings from failed spikes
- Celebrate the time saved by failing fast
- Frame as “avoided 16 weeks of wasted effort”
- Never shame teams for negative results
Example celebration:
“Great news: The spam detection approach wouldn’t work at our scale. Sarah and team figured this out in 2 weeks instead of 8. We’re now exploring approach B which looks more promising. Thanks for the rigorous work.”
Metrics for Uncertain Projects
Track different metrics than traditional projects:
Traditional metrics:
- Tasks completed
- On-time delivery
- Budget vs. actual
AI project metrics:
- Spikes completed and key questions answered
- Experiments run and learnings captured
- Technical debt incurred vs. managed
- Team morale and learning
Strategic Questions
- What are our key uncertainties? Make them explicit.
- How will we resolve them? Spikes? Experiments? User research?
- What would make us pivot or stop? Know decision criteria upfront.
- How will we communicate with stakeholders? Set realistic expectations.
- How do we celebrate learning from failures? Create safe culture for exploration.
Key Takeaway: AI projects require different management because uncertainty is fundamental. Use spike-based planning to explore quickly. Run experiments to validate assumptions. Present three timelines and decision criteria upfront. Cut scope as you learn. Build culture that celebrates valuable failures. Reserve capacity for learning, not just delivery.
Discussion Prompt
For your next AI project: What are the 3 biggest uncertainties? How would you structure spikes to answer them? What would make you kill the project?