8 min read
Start With the Problem, Not the Model

The highest-leverage phase of any AI project isn’t model selection or data preparation. It’s the thirty minutes someone spends deciding whether the problem is actually worth solving with AI. Many teams skip this or rush it; they often pay for it later in the form of projects that technically work but don’t get used.

A deployable problem statement typically has three properties: a measurable outcome, a defined success threshold, and a known cost of failure.
“We want to automate customer support” has none of these.
“We want to resolve 40% of Tier-1 support tickets without human intervention, with a false-negative rate below 5%, because each misrouted ticket costs us $12 in escalation time” has all three.
The difference isn’t semantic; it can determine whether you can ever declare the project done.
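One way to make the three-property test mechanical is to write the problem statement down as a small record and check which fields are still empty. The sketch below is illustrative only: the `ProblemStatement` class and its field names are hypothetical, and the example values are taken from the two statements above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProblemStatement:
    """Hypothetical container for the three properties of a deployable problem."""
    outcome: Optional[str] = None            # the measurable outcome
    success_threshold: Optional[str] = None  # the defined success threshold
    failure_cost: Optional[str] = None       # the known cost of failure

    def missing_properties(self) -> list[str]:
        """Return the names of the properties that have not been filled in."""
        fields = {
            "outcome": self.outcome,
            "success_threshold": self.success_threshold,
            "failure_cost": self.failure_cost,
        }
        return [name for name, value in fields.items() if not value]


# "We want to automate customer support" fills in nothing.
vague = ProblemStatement()

specific = ProblemStatement(
    outcome="Tier-1 support tickets resolved without human intervention",
    success_threshold="40% of tickets resolved, false-negative rate below 5%",
    failure_cost="$12 in escalation time per misrouted ticket",
)

print(vague.missing_properties())
print(specific.missing_properties())
```

A statement with any missing property is a candidate for another round of scoping rather than a kickoff.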
Solution-First vs. Problem-First

A useful diagnostic: try describing the problem without mentioning AI. If you can’t, you’re likely solution-first.
“We need an AI to classify documents” is a solution.
“We need to route 10,000 documents per day to the right team with fewer than 2% misclassifications, and the current manual process takes 3 FTEs” is a problem.
The second version might be solved with a rules engine, a better taxonomy, or a fine-tuned classifier; the first version has already closed off that conversation.
Stakeholder Alignment and the One-Page Brief
This is also where stakeholder alignment either happens or doesn’t. Who defines success? Does the engineering team’s definition match the business owner’s? Do they agree before work begins?
A one-page problem brief—not a project plan, just a single document capturing the outcome, the threshold, the failure cost, and the names of the people who’ve signed off—is often underused in AI project management. The AI project blueprint typically starts here, at the problem, not the model.
Assess Your Data Before You Touch a Model
Data assessment is where many projects quietly go wrong. The failure is rarely visible or dramatic; it builds through the slow accumulation of assumptions that turn out to be wrong six weeks in. Project plans tend to underestimate this phase by a factor of two or three, and the consequences often show up as budget overruns and delayed launches that get attributed to “model complexity” when the real cause may have been a skipped risk-mapping step at the beginning.
Four Critical Questions
Four questions typically need answers before model selection begins, not in parallel with it:
- Do you have enough labeled data? “Enough” depends on the task: a text classifier generally needs thousands of examples, a fine-tuned foundation model typically needs hundreds, and a custom vision model generally needs tens of thousands. The number matters because labeling time and cost are often among the longest lead-time items in the plan.
- Is your training distribution likely to match your deployment environment? A fraud detection model trained on last year’s transaction patterns may degrade when fraud patterns shift, and they often do.
- Who owns the data, and what are the compliance and access constraints? GDPR, HIPAA, contractual restrictions, and internal data governance policies have frequently delayed or derailed AI projects.
- What does data drift look like for this specific use case, and how will you detect it? If you can’t answer this before deployment, you may be operating with limited visibility after it.
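For the drift question, one commonly used signal is the Population Stability Index (PSI), which compares how a single numeric feature is distributed in a reference sample versus a live sample. The sketch below is a minimal pure-Python version under stated assumptions: equal-width buckets derived from the reference sample, a small `eps` floor to avoid taking the log of zero, and made-up data. The 0.25 cut-off in the last line is a widely quoted rule of thumb, not a value from this article, and the right detector depends on the use case.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample and a live sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature

    def bucket_shares(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bucket.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each share at eps so empty buckets do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up data: a reference sample and the same sample shifted upward.
reference = [float(i % 100) for i in range(1000)]
shifted = [v + 30.0 for v in reference]

print(population_stability_index(reference, reference))      # identical samples
print(population_stability_index(reference, shifted) > 0.25)  # large shift
```

Running a check like this per feature on a schedule is one concrete answer to "how will you detect it"; which features to watch is still a judgment call for the team.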
The Cost of Deferred Data Cleaning
The “we’ll clean it as we go” approach often has a predictable cost profile. Teams that defer data cleaning to mid-project typically spend 30–40% of total project time on data work that was scoped at 10–15%. That’s often a scoping problem rather than a discipline problem. The fix is treating data assessment as a formal deliverable with its own timeline and sign-off.
Synthetic data and transfer learning are legitimate tools for closing data gaps, but they function as scoping tools rather than shortcuts. Synthetic data can augment a small labeled dataset; it may not substitute for understanding whether your real-world distribution is stable. Transfer learning can reduce the volume of labeled data you need; it typically doesn’t eliminate the need to validate that the pre-trained representations are appropriate for your domain.
Three Red Flags
Three signals that a project should pause rather than proceed:
- The data owner hasn’t confirmed access
- The training and deployment environments are materially different with no mitigation plan
- Nobody on the team has looked at a random sample of 200 records and described what they saw
The outputs of this phase typically become the risk register for everything downstream.
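The third red flag is cheap to clear. A minimal sketch of pulling a reproducible random sample for manual review is shown below; the `sample_for_review` helper and the record layout are hypothetical, and in practice the records would come from the real dataset rather than a generated list.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def sample_for_review(records: Sequence[T], k: int = 200, seed: int = 0) -> list[T]:
    """Draw up to k records uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so every reviewer sees the same sample
    return rng.sample(list(records), min(k, len(records)))


# Hypothetical records standing in for rows from the real dataset.
records = [{"id": i, "text": f"ticket {i}"} for i in range(5000)]
review_batch = sample_for_review(records)

print(len(review_batch))
```

The sampling is the easy part; the value comes from someone reading the batch and writing down what they saw.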
Build, Buy, or Fine-Tune: A Resource Decision, Not a Technical One
The build-versus-buy decision often gets framed as a technical question when it’s primarily a resource allocation question. The right answer typically depends on timeline, control requirements, and how much ongoing maintenance the team is willing to own, not solely on which approach produces the best benchmark numbers.
Three Practical Options
Three options exist in practice:
- Custom build from scratch
- API access to a third-party model
- Fine-tuning a foundation model on your own data
Data sensitivity and IP concerns often push toward build or fine-tune; if your data can’t leave your infrastructure, an external API typically isn’t an option. Speed to deployment and budget constraints often push toward buy; a well-designed API integration can reach production in weeks rather than months. Need for ongoing customization often pushes toward fine-tuning, which generally gives you more control than an API without the full cost of building from scratch.
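The pushes described above can be written down as a rough heuristic, which is sometimes useful for making the trade-off explicit in a planning discussion. The sketch below is an assumption-laden simplification: the `Constraints` fields, the one-point scoring, and the premise that fine-tuning happens on infrastructure you control are all illustrative choices, not a decision procedure from this article.

```python
from dataclasses import dataclass


@dataclass
class Constraints:
    data_must_stay_internal: bool      # data sensitivity or IP concerns
    needs_fast_launch: bool            # weeks rather than months, tight budget
    needs_ongoing_customization: bool  # behavior must keep being adjusted


def candidate_approaches(c: Constraints) -> list[str]:
    """Return the options still on the table, most favored first (heuristic only)."""
    scores = {"api": 0, "fine_tune": 0, "build": 0}
    if c.data_must_stay_internal:
        # An external API is ruled out, not merely penalized.
        del scores["api"]
        scores["fine_tune"] += 1
        scores["build"] += 1
    if c.needs_fast_launch and "api" in scores:
        scores["api"] += 1
    if c.needs_ongoing_customization:
        scores["fine_tune"] += 1
    # sorted() is stable, so ties keep the order api, fine_tune, build.
    return sorted(scores, key=scores.get, reverse=True)


print(candidate_approaches(Constraints(True, True, True)))
print(candidate_approaches(Constraints(False, True, False)))
```

The output is a starting point for the conversation about maintenance and control, not a substitute for it.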
Avoiding the Status Decision
A common pitfall is choosing “build” for status reasons. Engineering teams sometimes prefer custom builds because they’re more technically interesting; executives sometimes prefer them because they feel more proprietary. Neither is typically a sufficient reason.
The honest version of this decision starts with the deployment strategy: what does the operational model look like at scale, and which approach makes that model sustainable? The choice here directly shapes your deployment architecture, and it’s generally harder to reverse than it looks at the start.
Deployment Is Where Projects Become Real and Fragile
Deployment is where many AI guides stop paying attention, even though it’s where the project becomes real and fragile at the same time. AI deployments often fail differently than software deployments: a software bug is usually discrete and reproducible, while model degradation is typically gradual and statistical. You may not notice it until business metrics move.
Research from McKinsey on AI adoption identifies post-deployment monitoring gaps as a significant driver of value loss in production AI systems.
The Tiered Rollout Approach
A reliable structure for AI deployment often involves a tiered rollout:
- Shadow mode first, where the model runs alongside the existing process without affecting outcomes
- Limited rollout to a defined subset of traffic or users
- Full production
Skipping tiers is a common cause of high-profile AI failures. Shadow mode in particular is frequently underused; it lets you validate that the model’s behavior in production matches its behavior in evaluation before any real decisions depend on it.
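Shadow mode can be as simple as a wrapper that always serves the existing process's answer and records what the model would have said. The sketch below assumes both the existing process and the model are plain callables; `run_in_shadow`, `rule_based_router`, and `candidate_model` are hypothetical names, and a real system would write to durable storage rather than an in-memory list.

```python
from typing import Any, Callable


def run_in_shadow(
    existing_process: Callable[[Any], Any],
    model: Callable[[Any], Any],
    request: Any,
    log: list[dict],
) -> Any:
    """Serve the existing process's answer; record the model's answer on the side."""
    served = existing_process(request)
    try:
        shadow = model(request)
    except Exception as exc:  # a model failure must never affect the live path
        shadow = f"error: {exc}"
    log.append(
        {"request": request, "served": served, "shadow": shadow, "agree": served == shadow}
    )
    return served


# Hypothetical stand-ins for the current process and the candidate model.
def rule_based_router(ticket: str) -> str:
    return "billing" if "invoice" in ticket else "general"


def candidate_model(ticket: str) -> str:
    return "billing" if "invoice" in ticket or "refund" in ticket else "general"


log: list[dict] = []
tickets = ["invoice missing", "refund please", "reset password"]
results = [run_in_shadow(rule_based_router, candidate_model, t, log) for t in tickets]
agreement = sum(entry["agree"] for entry in log) / len(log)

print(results)
print(round(agreement, 2))
```

The disagreements in the log are the interesting part: each one is either a model error or a case where the existing process was wrong, and telling them apart requires review.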
Monitoring as a First-Class Deliverable
Monitoring should be a first-class deliverable, not an afterthought. Before you deploy, instrument three things:
- Model performance metrics (precision, recall, or whatever your task requires)
- Data distribution signals to detect drift
- Business outcome proxies that connect model behavior to the outcomes the project was designed to improve
If your model is making decisions and you’re only watching accuracy, you may be missing the signal that matters to the people who funded the project.
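To illustrate why all three signals matter together, here is a minimal sketch of a monitoring snapshot with one check per signal. The `MonitoringSnapshot` fields, the `auto_resolution_rate` proxy, and every threshold are hypothetical values chosen for the example, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class MonitoringSnapshot:
    precision: float             # model performance metric
    drift_score: float           # data distribution signal
    auto_resolution_rate: float  # business outcome proxy (hypothetical)


def alerts(
    snapshot: MonitoringSnapshot,
    min_precision: float = 0.85,        # illustrative thresholds only
    max_drift: float = 0.25,
    min_auto_resolution: float = 0.40,
) -> list[str]:
    """Return one message for each signal that is outside its threshold."""
    found = []
    if snapshot.precision < min_precision:
        found.append("model performance below threshold")
    if snapshot.drift_score > max_drift:
        found.append("input distribution drift detected")
    if snapshot.auto_resolution_rate < min_auto_resolution:
        found.append("business outcome proxy below target")
    return found


# The model looks healthy on its own metric, but the business proxy does not.
week_12 = MonitoringSnapshot(precision=0.91, drift_score=0.08, auto_resolution_rate=0.31)
print(alerts(week_12))
```

In this made-up snapshot a dashboard that tracked only precision would show nothing wrong.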
Human-in-the-Loop Before Incidents
The human-in-the-loop question should be answered before deployment, not after an incident. Where does the model make decisions autonomously? Where does it generate a recommendation that a human acts on? Where does it hand off entirely? These boundaries have legal, operational, and ethical implications; defining them post-incident is typically more expensive than defining them pre-deployment.
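One common way to encode these boundaries, assumed here purely for illustration, is to route each prediction by the model's confidence score. The `Handling` categories mirror the three questions above; the `route` function and its thresholds are hypothetical and would need to be set with legal and operational input.

```python
from enum import Enum


class Handling(Enum):
    AUTONOMOUS = "model decides"
    RECOMMEND = "model recommends, human decides"
    HAND_OFF = "human handles entirely"


def route(confidence: float, auto_at: float = 0.95, recommend_at: float = 0.70) -> Handling:
    """Map a confidence score in [0, 1] to one of the three handling modes."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= auto_at:
        return Handling.AUTONOMOUS
    if confidence >= recommend_at:
        return Handling.RECOMMEND
    return Handling.HAND_OFF


for score in (0.99, 0.80, 0.40):
    print(score, route(score).value)
```

Writing the thresholds down before launch turns the boundary into something that can be reviewed and signed off, rather than an implicit property of the code.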
A deployment strategy isn’t a launch plan; it’s an operational model for how AI behavior will be monitored, maintained, and corrected over time.
Adapt Project Management to AI’s Specific Failure Modes
Standard project management frameworks often create friction on AI projects because they’re optimized for known unknowns. Agile sprints work well when you can decompose work into predictable units; AI projects often generate unknown unknowns, particularly around data quality and model behavior, that make velocity hard to forecast.
The answer isn’t to abandon structured project management; it’s to adapt it.
Capability-Based Milestones
Replace feature-based milestones with capability-based ones.
“Model training complete” is not a meaningful milestone; it tells you little about whether the project is on track.
“Model achieves 85% precision on held-out validation set” is a milestone; it gives you a clearer signal about whether to proceed or pivot.
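A capability-based milestone has the advantage of being checkable. The sketch below computes precision on a held-out set and compares it to the 85% target from the example milestone; the labels and predictions are made up, and the helper names are hypothetical.

```python
from typing import Sequence


def precision(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Share of predicted positives that are actually positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


def milestone_met(y_true: Sequence[int], y_pred: Sequence[int], threshold: float = 0.85) -> bool:
    """True when held-out precision reaches the agreed threshold."""
    return precision(y_true, y_pred) >= threshold


# Made-up held-out labels and model predictions.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 1, 1, 0, 1, 0, 0]

print(precision(y_true, y_pred))      # 4 true positives, 1 false positive
print(milestone_met(y_true, y_pred))
```

Here the model trains successfully and still misses the milestone, which is exactly the distinction the two example milestones draw.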
Explicit Gates and Decision Points
Build explicit gates at three points:
- After data assessment
- After baseline model validation
- Before deployment
At each gate, the project either proceeds, pivots, or pauses; the gate exists so that decision can be made deliberately rather than by default.
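The "deliberately rather than by default" point can be enforced structurally: a gate with no recorded decision blocks progress. The following is a minimal sketch under that assumption; the `Gate` and `Decision` types and the `next_open_gate` helper are hypothetical, and the gate names follow the three points listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    PROCEED = "proceed"
    PIVOT = "pivot"
    PAUSE = "pause"


@dataclass
class Gate:
    name: str
    decision: Optional[Decision] = None  # None means nobody has decided yet
    decided_by: Optional[str] = None

    def record(self, decision: Decision, decided_by: str) -> None:
        """Store an explicit decision together with who made it."""
        self.decision = decision
        self.decided_by = decided_by


def next_open_gate(gates: list[Gate]) -> Optional[Gate]:
    """Return the first gate not cleared with PROCEED, or None if all are cleared."""
    for gate in gates:
        if gate.decision is not Decision.PROCEED:
            return gate
    return None


gates = [Gate("data assessment"), Gate("baseline model validation"), Gate("pre-deployment")]
gates[0].record(Decision.PROCEED, decided_by="business owner")

blocking = next_open_gate(gates)
print(blocking.name if blocking else "all gates cleared")
```

Because an undecided gate and a paused gate both block, the project cannot drift past a decision point simply because no one objected.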
Stakeholder Communication Strategy
Stakeholder communication is a distinct problem from technical communication. Decision-makers typically need progress framed in business outcomes; training loss curves and F1 scores are generally not progress updates for a VP.
A lightweight AI-specific project charter—one page, capturing the problem statement, success metrics, current data status, and deployment assumptions—can help reduce the translation overhead that often slows AI projects down.
Conclusion: The Blueprint Is a Set of Honest Questions
Following a blueprint doesn’t guarantee a successful AI project. The teams that ship working systems often aren’t the ones who executed a plan perfectly; they’re typically the ones who revisited their assumptions at each gate and made honest calls about whether to continue.
The problem statement that made sense at kickoff sometimes doesn’t survive contact with the data. The success threshold that satisfied the business owner at the start sometimes needs to be renegotiated when the deployment environment turns out to be more complex than expected.
Three Questions at Every Decision Point
Three questions are worth running against any AI project at every major decision point:
- Is the problem still worth solving the way we originally defined it?
- Is the data reality consistent with our assumptions when we scoped this?
- Does our deployment strategy account for how the model will behave when the real world diverges from our training distribution?
Problems in AI projects can often be traced back to an earlier stage where a hard question wasn’t asked.