Kill Criteria: The Most Important Thing Missing From Your AI Roadmap
Every AI initiative needs a defined condition under which you stop and redesign. Without kill criteria, failed pilots become permanent fixtures. Here's how to set them.
Every AI initiative has success criteria. Targets. KPIs. Projected ROI.
Almost none of them have kill criteria.
Kill criteria are the defined conditions under which you stop, roll back, and redesign. They are the most important — and most consistently absent — element of an AI roadmap.
Why kill criteria matter
Without kill criteria, a failing initiative doesn’t die. It lingers. The team reports “progress” because the system is technically running. Leadership doesn’t pull the plug because there’s no defined threshold for failure. The pilot becomes a permanent fixture: not good enough to scale, not bad enough to kill.
This is expensive. Not just in compute and engineering time, but in opportunity cost. It’s one of the main reasons AI pilots don’t scale. Every month spent maintaining a failing pilot is a month not spent on the initiative that would actually work.
Kill criteria turn a vague sense of “this isn’t working” into a concrete, pre-agreed decision point. If metric X doesn’t reach threshold Y by date Z, we stop. No debate. No sunk-cost reasoning. No “let’s give it one more quarter.”
What good kill criteria look like
A kill criterion has four components:
Metric — The specific, measurable quantity you’re tracking. Not “customer satisfaction” — that’s a concept. “Draft acceptance rate by Tier 1 agents” is a metric.
Threshold — The minimum acceptable value. This is not your target. The target is what you hope to achieve. The kill threshold is the floor below which the initiative is not viable.
Timeline — By when. “If draft acceptance rate doesn’t exceed 70% after six weeks of production use, we restrict the system to context gathering until quality stabilizes.”
Action — What happens when the criterion is triggered. Roll back entirely? Restrict scope? Redesign the agent? The action should be specific enough that the team can execute it without a meeting.
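The four components map naturally onto a small data structure. Here is a minimal sketch in Python; the field names, the example metric, and the dates are illustrative assumptions, not prescribed values.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class KillCriterion:
    """One kill criterion: metric, threshold, timeline, action."""
    metric: str        # the specific, measurable quantity being tracked
    threshold: float   # the floor below which the initiative is not viable
    deadline: date     # the checkpoint at which the criterion is evaluated
    action: str        # what the team executes when triggered — no meeting needed

    def triggered(self, observed: float, today: date) -> bool:
        """At or after the checkpoint, fire if the metric is below the floor."""
        return today >= self.deadline and observed < self.threshold

# Hypothetical instance, loosely following the draft-acceptance example above
criterion = KillCriterion(
    metric="draft_acceptance_rate_tier1",
    threshold=0.70,
    deadline=date(2025, 3, 15),
    action="Restrict system to context gathering until quality stabilizes",
)

print(criterion.triggered(observed=0.65, today=date(2025, 3, 15)))  # True: stop
```

The point of writing it down this precisely is that "triggered" becomes a mechanical check rather than a judgment call.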
Setting thresholds
The hardest part is choosing the right threshold. Too aggressive and you kill initiatives that need time to mature. Too lenient and you let failing systems run indefinitely.
Three principles:
Start with the baseline. If your current misrouting rate is 18%, a kill threshold of 20% means the system must not make things worse. A target of 8% means you’re aiming for significant improvement. The threshold and the target serve different purposes.
Account for ramp time. Intelligence systems improve as they encounter more data and as human operators learn to work with them. A six-week evaluation window is typically appropriate for a first pilot. Two weeks is too short to draw conclusions. Six months is too long to wait if something is fundamentally wrong.
Make it a step function, not a slope. Don’t set a kill criterion based on trajectory (“improving week over week”). Set it based on an absolute threshold at a defined checkpoint. Trajectories are noisy. Thresholds are clear.
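The step-function principle can be made concrete with two toy checks. The weekly numbers below are invented for illustration; they show how a trajectory rule flips on a single noisy week while an absolute threshold at the checkpoint does not.

```python
def step_check(observed: float, floor: float) -> str:
    """Absolute threshold at a defined checkpoint: an unambiguous verdict."""
    return "continue" if observed >= floor else "kill"

def slope_check(weekly: list[float]) -> str:
    """'Improving week over week' — one flat or dipping week flips the verdict."""
    improving = all(b > a for a, b in zip(weekly, weekly[1:]))
    return "continue" if improving else "kill"

weekly_rates = [0.61, 0.64, 0.63, 0.66, 0.69, 0.71]  # hypothetical six-week pilot

print(step_check(weekly_rates[-1], floor=0.70))  # "continue": 0.71 clears the floor
print(slope_check(weekly_rates))                 # "kill": week 3 dipped, despite the trend
```

The same pilot passes the threshold test and fails the trajectory test — which is exactly why trajectories make poor kill criteria.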
Kill criteria by pillar
Kill criteria aren’t just for KPIs. Each pillar of the operating model has conditions that should trigger a stop-and-redesign:
Pillar I — Workflow Architecture
Kill criterion: "If the mapped workflow diverges materially from actual operations within the first two weeks of observation, pause and remap before proceeding to agent design."
Translation: if your workflow map is wrong, everything built on it will be wrong. Stop early.
Pillar II — Agent & Tooling Design
Kill criterion: "If human override rate exceeds 50% for medium-risk decisions after four weeks, the agent's authority model needs redesign."
Translation: if operators are rejecting half the system’s recommendations, the intelligence layer isn’t solving the right problem or isn’t solving it well enough.
Pillar III — Governance & Risk Control
Kill criterion: "If audit logs reveal any instance of a high-risk decision executed without human approval, suspend the agent and audit the control gate implementation."
Translation: governance failures are not gradual. A single boundary violation is a critical incident.
Pillar IV — KPI & Value Measurement
Kill criterion: "If the initiative fails to demonstrate measurable improvement against baseline within the defined evaluation window, restrict scope or discontinue."
Translation: the standard one. But it only works if you set the baseline first.
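Taken together, the four pillar criteria fit in a single register — one way to keep the "one page" honest. The sketch below encodes them as plain data; the key names, numeric windows, and threshold encodings are assumptions layered on the criteria above.

```python
# A hypothetical kill-criteria register covering all four pillars.
KILL_CRITERIA = {
    "workflow_architecture": {
        "metric": "workflow_map_divergence",
        "threshold": "material divergence from actual operations",
        "window_weeks": 2,
        "action": "pause and remap before agent design",
    },
    "agent_tooling_design": {
        "metric": "human_override_rate_medium_risk",
        "threshold": 0.50,     # override rate above 50% triggers redesign
        "window_weeks": 4,
        "action": "redesign the agent's authority model",
    },
    "governance_risk_control": {
        "metric": "unapproved_high_risk_decisions",
        "threshold": 0,        # any single violation is a critical incident
        "window_weeks": None,  # continuous; checked on every audit-log entry
        "action": "suspend the agent and audit the control gates",
    },
    "kpi_value_measurement": {
        "metric": "improvement_vs_baseline",
        "threshold": "measurable improvement within the evaluation window",
        "window_weeks": 6,
        "action": "restrict scope or discontinue",
    },
}
```

Note the governance entry: its threshold is zero and its window is continuous, reflecting that a single boundary violation is a critical incident, not a trend to watch.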
The organizational discipline
Kill criteria are easy to define and hard to follow.
When the kill threshold is hit, the sunk-cost instinct kicks in. The team has invested months. Leadership has communicated the initiative internally. No one wants to be the person who says “stop.”
This is why kill criteria must be pre-agreed. Before the initiative launches. In writing. With named decision-makers. The time to debate whether 70% acceptance rate is the right threshold is before the pilot starts — not six weeks in when the number is 65% and everyone has an opinion.
Pre-agreed kill criteria depersonalize the decision. The team isn’t “failing.” The initiative hit a defined boundary condition. The response is redesign, not blame.
Measurement infrastructure
Kill criteria don’t work without measurement infrastructure. You need:
- Baselines measured before the initiative changes anything. Current state, documented and agreed upon.
- Instrumentation that captures the relevant metrics in production. Not in a spreadsheet someone updates weekly — in the system itself.
- Cadence for review. Weekly during a pilot. Monthly during steady-state operation. Quarterly for strategic review.
- Named owners for every metric. Not “the team” — a person. Someone who reviews the number, interprets it, and raises the flag when it approaches the threshold.
This infrastructure isn’t overhead. It’s the mechanism that turns “we think it’s working” into “we know it’s working” — or “we know it’s not, and here’s what we’re doing about it.” For a deeper look at building this measurement framework, see The ROI Model Nobody Builds.
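One way to wire baseline, owner, and threshold together is to make them fields of the metric itself, so the flag-raising is automatic. This is a sketch under assumptions: the field names, the 0.05 warning margin, and the example values are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class TrackedMetric:
    """A production metric bound to its baseline, named owner, and kill floor."""
    name: str
    owner: str           # a named person, not "the team"
    baseline: float      # measured before the initiative changed anything
    kill_floor: float    # the pre-agreed threshold below which we stop
    review_cadence: str  # "weekly" during a pilot, "monthly" in steady state

    def status(self, observed: float, margin: float = 0.05) -> str:
        if observed < self.kill_floor:
            return "KILL"   # threshold breached: execute the agreed action
        if observed < self.kill_floor + margin:
            return "WARN"   # approaching the floor: owner raises the flag
        return "OK"

# Hypothetical higher-is-better metric with its floor just below baseline
metric = TrackedMetric(
    name="draft_acceptance_rate",
    owner="A. Named-Person",
    baseline=0.72,
    kill_floor=0.70,
    review_cadence="weekly",
)

print(metric.status(0.83))  # "WARN" is only reached near the floor; 0.83 is "OK"... 
```

Because the warning band sits just above the kill floor, the owner sees the flag before the criterion fires, not after.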
Start here
For your next AI initiative, add one thing to the roadmap: a kill criteria document. One page. For each intelligence layer you’re deploying, define the metric, the threshold, the timeline, and the action.
Then get it signed. Before you write a line of code.
The willingness to define conditions for stopping is the clearest signal that an organization is serious about building intelligence systems that actually work.