The ROI Model Nobody Builds
Most AI initiatives can't prove value because they never defined what value means. Baselines, targets, and kill criteria aren't optional — they're the difference between scaling and stalling.
Ask any company with an AI pilot in production: is it working?
You’ll hear “the team likes it” or “it seems faster” or “we’re still evaluating.” What you won’t hear is a number. A baseline. A target. A defensible answer.
This is the measurement gap, and it kills more AI initiatives than bad models ever will.
Why measurement gets skipped
It’s not that teams don’t care about ROI. It’s that measurement is hard, it slows things down, and the incentive structure rewards launching over proving.
The pilot team wants to ship. The executive sponsor wants to announce progress. The vendor wants a case study. Nobody wants to spend two weeks establishing baselines before the exciting part starts. So measurement gets deferred to “after launch” — which means never, because after launch there are no baselines to compare against. This is why pilots stall.
The result is an initiative that can’t answer the only question leadership actually cares about: should we keep investing in this?
What an ROI model actually requires
A proper measurement model for an AI initiative has five components. None of them are optional.
Baselines. The current state, measured before the system touches anything. Time-to-first-response: 6.2 hours. Misrouting rate: 18%. Cost per ticket resolution: $14.30. These aren’t estimates or averages from last quarter’s dashboard. They’re precise measurements taken during the baseline window, using the same methodology you’ll use to measure improvement.
Targets. Specific, time-bound outcomes that define success. Time-to-first-response: 2.5 hours within 8 weeks. Misrouting rate: below 10% within 6 weeks. Targets are commitments, not aspirations. They’re grounded in the workflow analysis and the agent design — you can explain why this target is achievable, not just why it would be nice.
Owners. A named person responsible for each metric, its measurement cadence, and its reporting. Not “the data team.” A person with a name who reviews the scorecard weekly and escalates when metrics drift. Measurement without ownership is data collection. Data collection without accountability is noise.
Cadence. How often each metric is reviewed, who reviews it, and what decisions are made based on the review. Weekly for operational metrics. Monthly for economic impact. Quarterly for strategic assessment. The cadence creates the feedback loop that turns measurement into governance.
Kill criteria. The conditions under which you stop, redesign, or roll back. If draft acceptance rate doesn’t exceed 70% after 6 weeks, restrict the agent to context gathering until quality stabilizes. If cost per resolution increases by more than 5% over baseline, pause the deployment and investigate.
Kill criteria are the most important and most neglected component. Without them, underperforming initiatives persist indefinitely because nobody defined what failure looks like.
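To make the five components concrete, here is a minimal sketch of the model as a single reviewable structure, in Python. The field names, owner names, cost target, and threshold logic are illustrative assumptions, not a prescribed schema, and the six-week time gate on the acceptance criterion is omitted for brevity; the other numbers echo the examples above.

```python
from dataclasses import dataclass

@dataclass
class Metric:
    """One KPI in the measurement model: baseline, target, owner, cadence, kill line."""
    name: str
    baseline: float        # measured before launch, same methodology as every later reading
    target: float          # committed, time-bound outcome
    current: float         # latest reading
    owner: str             # a named person, not "the data team"
    cadence: str           # review frequency that drives the governance loop
    kill_threshold: float  # crossing this line means stop, redesign, or roll back
    higher_is_better: bool = False

    def kill_triggered(self) -> bool:
        """True when the metric has crossed its predefined failure line."""
        if self.higher_is_better:
            return self.current < self.kill_threshold
        return self.current > self.kill_threshold

# Illustrative values; owners, the cost target, and some current readings are invented.
model = [
    Metric("time_to_first_response_hours", baseline=6.2, target=2.5, current=2.8,
           owner="J. Rivera", cadence="weekly", kill_threshold=6.2),
    Metric("cost_per_resolution_usd", baseline=14.30, target=11.00, current=13.90,
           owner="J. Rivera", cadence="weekly",
           kill_threshold=14.30 * 1.05),  # "increases by more than 5%" over baseline
    Metric("draft_acceptance_pct", baseline=0.0, target=80.0, current=74.0,
           owner="A. Chen", cadence="weekly",
           kill_threshold=70.0, higher_is_better=True),
]

for m in model:
    flag = "KILL CRITERIA TRIGGERED" if m.kill_triggered() else "within bounds"
    print(f"{m.name}: {m.current} (owner: {m.owner}, review: {m.cadence}) [{flag}]")
```

The useful property is that ownership, cadence, and failure conditions are declared in one place, so "did we trigger a kill criterion?" becomes a mechanical check rather than a debate.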
Translating KPIs to economics
Operational metrics matter. Economic translation is what gets budget approval.
The translation follows a simple structure:
- Hours saved → headcount capacity recovered → dollar value at fully loaded cost
- Error reduction → rework eliminated → cost avoided per incident × volume
- Speed improvement → faster resolution → customer satisfaction → retention impact
- Risk reduction → incidents prevented → cost per incident × probability
The key is connecting operational improvement to a metric the CFO recognizes. “Misrouting rate dropped from 18% to 8%” is good. “That eliminates 420 rework cycles per month at $22 each, recovering $9,240 in monthly cost” is what gets the next phase funded.
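As a worked sketch of that misrouting line, with the monthly ticket volume assumed at 4,200 for illustration (the 18% to 8% rates and the $22 rework cost are from the example above):

```python
# Economic translation of the misrouting improvement quoted above.
baseline_misroute_rate = 0.18   # before the system
current_misroute_rate = 0.08    # after
monthly_ticket_volume = 4_200   # assumed volume, not from the example
rework_cost_per_cycle = 22.00   # dollars per misrouted ticket reworked

cycles_avoided = (baseline_misroute_rate - current_misroute_rate) * monthly_ticket_volume
monthly_recovery = cycles_avoided * rework_cost_per_cycle

print(f"{cycles_avoided:.0f} rework cycles eliminated per month")   # 420
print(f"${monthly_recovery:,.2f} in monthly cost recovery")         # $9,240.00
```

Scripting even a four-line formula makes the dollar figure auditable: anyone reviewing the scorecard can see exactly which inputs produce $9,240.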
The scorecard
The measurement model produces a living artifact: the executive scorecard. One page. Updated at the agreed cadence. Shows:
- Each KPI with baseline, current, and target
- Trend direction and rate of change
- Economic translation of current performance
- Risk flags for any metric trending in the wrong direction
- Kill criteria status: are we above or below the threshold?
The scorecard is not a dashboard buried in a BI tool. It’s a decision document that goes to the executive sponsor at the agreed cadence. It answers the question: is this working, and should we expand, adjust, or stop?
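Continuing the sketch from the measurement model above (this reuses the hypothetical Metric class and model list), a plain-text rendering is deliberately boring: a table the sponsor can read in thirty seconds. Trend direction and the economic translation would join it from stored history and the calculation shown earlier; this skeleton covers baseline, current, target, and kill status.

```python
def render_scorecard(metrics: list[Metric]) -> str:
    """Produce the one-page decision document as plain text."""
    header = f"{'KPI':<30} {'Baseline':>9} {'Current':>8} {'Target':>7}  Kill status"
    rows = [header, "-" * len(header)]
    for m in metrics:
        status = "TRIGGERED" if m.kill_triggered() else "clear"
        rows.append(f"{m.name:<30} {m.baseline:>9} {m.current:>8} {m.target:>7}  {status}")
    return "\n".join(rows)

print(render_scorecard(model))  # goes to the executive sponsor at the agreed cadence
```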
Build measurement before you build the system
The instinct is to measure after deployment. Reverse that.
Establish baselines during the workflow mapping phase. Define targets during agent design. Build the scorecard framework during governance design. By the time the system goes live, measurement is already running — and you have a clean baseline window to compare against.
This is Pillar IV of the Enterprise Intelligence Architecture, and it depends on everything that came before. You can’t set meaningful targets without understanding the workflow. You can’t define kill criteria without knowing the risk profile. You can’t translate to economics without the baseline data.
The question that matters
When the board asks “is our AI investment working,” you need an answer that sounds like this:
“Time-to-first-response improved from 6.2 hours to 2.8 hours, exceeding our 3.0-hour interim target. Draft acceptance is at 74%, approaching the 80% target. Misrouting dropped to 9%, within the target band. Economic impact: $11,400 monthly cost recovery with a path to $18,000 at full deployment. No kill criteria triggered. Recommending Phase 2 expansion.”
That answer comes from building the ROI model before you build the system. There are no shortcuts.