AI Strategy & Roadmapping · Cornerstone
The numbers a CFO needs to see before approving AI investment — and how to keep them honest after the system ships.
Quantum Associates
The hardest single conversation in any AI engagement we run is the one with the CFO. Not because CFOs are obstructive — they’re typically the clearest thinkers in the room — but because most AI business cases are built on the wrong unit economics. They optimise for headline savings that look impressive at the kickoff, then quietly collapse when the run cost shows up on the invoice in month four.
This piece is the framework we use to construct AI business cases that hold up to CFO scrutiny. It covers the four cost-and-value categories a defensible business case needs, the three measurement disciplines that keep the case honest after the system ships, and the most common errors we see in AI ROI calculations across the AU market.
Three patterns we see consistently:
Pattern A: only the build cost is modelled. The proposal includes a clear engagement cost ($X to build the system) and a forecast value ($Y per year in savings or revenue). It doesn’t include the run cost — the foundation model API charges, the additional infrastructure, the ongoing evaluation work, the operator time. When the run cost arrives in production, the actual margin collapses and the project goes from “saved us $400K/year” to “saved us $80K/year after costs.”
Pattern B: savings are gross, not net. “This system will save 4 FTE worth of work per year — that’s $400K in salary costs.” The assumption: those 4 FTE will be eliminated or redeployed to higher-value work, and the saving will show up on the P&L. The reality, almost always: the 4 FTE remain in their roles; the time saved is absorbed by other work that was previously deferred; the P&L impact is zero. A CFO sees through this in 10 seconds.
Pattern C: the comparison case is unrealistic. “Without AI, this work would take 40 hours per week of senior analyst time. With AI, it takes 6 hours.” The unspoken comparison is “AI vs do nothing.” The real comparison is usually “AI vs better process design, or AI vs hiring a junior analyst, or AI vs accepting the current state.” If those alternatives are cheaper, the AI ROI calculation needs to show the differential, not the gross.
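To make the three corrections concrete, here is a minimal sketch that applies them to a headline saving. The class and field names are hypothetical, and the $320K run cost is simply the figure implied by the "$400K/year" to "$80K/year after costs" example above, not a benchmark.

```python
from dataclasses import dataclass


@dataclass
class SavingsClaim:
    """Hypothetical container for a headline savings claim (all figures annual)."""
    gross_saving: float          # headline saving, e.g. 4 FTE of salary
    realised_fraction: float     # share that actually leaves the P&L (0.0 to 1.0)
    annual_run_cost: float       # model API, infrastructure, evaluation, operator time
    best_alternative_net: float  # net benefit of the best non-AI option


def defensible_saving(claim: SavingsClaim) -> float:
    """Annual saving attributable to the AI system after all three corrections."""
    realised = claim.gross_saving * claim.realised_fraction  # Pattern B: gross vs net
    net_of_run = realised - claim.annual_run_cost            # Pattern A: run cost
    return net_of_run - claim.best_alternative_net           # Pattern C: differential


# Illustrative figures only: Pattern A in isolation.
pattern_a_only = SavingsClaim(400_000, 1.0, 320_000, 0.0)
print(defensible_saving(pattern_a_only))  # 80000.0
```

Setting `realised_fraction` to 0.0 reproduces the Pattern B outcome: before run cost is even considered, the P&L impact of the headline saving is zero.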
The fix for all three is to construct the business case across four explicit categories with honest assumptions in each.
Build cost is the straightforward category: it is what you’ll pay for the engagement that designs and ships the system. It should be a fixed-price band, not an open-ended time-and-materials estimate. For our productised offers, this is the published price band.
Typical AU bands by engagement size:
The number should be a single line item with a 10–20% contingency for scope adjustments inside the agreed envelope.
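Run cost is what it costs to operate the system per month or per year. It includes foundation model charges, infrastructure, and observability, along with the ongoing evaluation work and operator time that build-only proposals leave out.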
For a moderately sized production AI system in 2026, expect total run cost in the $3K–$20K per month range. Larger systems scale linearly to roughly $10K–$80K per month. The numbers move with usage; doubling the user base typically doubles the run cost.
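The "doubling the user base doubles the run cost" rule of thumb can be sketched as a purely proportional model. The function names, the per-user cost, and the growth parameter below are assumptions for illustration; real run cost will also have fixed components that this sketch ignores.

```python
def monthly_run_cost(users: int, cost_per_user: float) -> float:
    """Run cost under a purely proportional model (assumption: no fixed component)."""
    return users * cost_per_user


def annual_run_cost(start_users: int, monthly_growth: float,
                    cost_per_user: float, months: int = 12) -> float:
    """Sum monthly run cost while the user base grows at a constant monthly rate."""
    total = 0.0
    users = float(start_users)
    for _ in range(months):
        total += users * cost_per_user
        users *= 1.0 + monthly_growth
    return total


# Illustrative figures only: 500 users at a hypothetical $10 per user per month.
print(monthly_run_cost(500, 10.0))    # 5000.0
print(monthly_run_cost(1000, 10.0))   # 10000.0, double the users, double the cost
```

Passing a non-zero `monthly_growth` to `annual_run_cost` shows why a 12-month forecast built on the month-one user count understates the invoice that arrives later in the year.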
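Operational and people cost is the category that build-cost-only business cases miss entirely. It covers engineering maintenance, the system's champion, and operations.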
Total ongoing-people cost is typically $20K–$80K per year for a moderately complex production AI system. For larger systems with dedicated AI operations functions, this scales meaningfully.
Value is the trickiest category. Four sub-categories actually move CFO conversations:
a. Time saved → real P&L impact (rare but most credible)
The savings are real if and only if the time-saved capacity is actually removed or redeployed. Examples where this works:
The test: can a CFO point to a specific line item that gets smaller because of the AI system? If yes, the saving is real. If the answer is “the team will get more done in the same time” without a measurable downstream effect, the saving is theoretical.
b. Revenue impact
Often more credible than cost savings because revenue is observable. Examples:
Revenue cases need controlled measurement (ideally A/B testing, at minimum well-instrumented before/after analysis) to be credible past month three.
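As one illustration of what "controlled measurement" can mean for a conversion-rate claim, the sketch below runs a standard two-proportion z-test using only the Python standard library. The function name and the sample figures are hypothetical, and this is one common test rather than a prescribed method; it assumes users were randomly assigned to the two arms.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    Arm A is the control (no AI system), arm B is the treatment.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return z, 2.0 * (1.0 - cdf)


# Illustrative figures only: 4.0% vs 5.2% conversion on 5,000 users per arm.
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value supports the claim that the uplift is not noise; it says nothing about whether the uplift persists, which is why the revenue line still needs re-measurement after month three.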
c. Risk reduction (hard to quantify, sometimes most important)
Hardest to put in dollars but sometimes the dominant value driver:
Risk-reduction value is real but should be modelled conservatively in the business case: multiply the headline avoided-loss number by a probability weight of 0.3–0.7 rather than counting it in full.
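A short sketch of that probability weighting, showing the low, middle, and high ends of the 0.3–0.7 range side by side. The function name and the $1M avoided-loss figure are hypothetical.

```python
def weighted_risk_value(avoided_loss: float,
                        low: float = 0.3, mid: float = 0.5, high: float = 0.7) -> dict:
    """Probability-weighted value of an avoided loss at three weights."""
    for weight in (low, mid, high):
        if not 0.0 <= weight <= 1.0:
            raise ValueError("probability weights must be between 0 and 1")
    return {
        "conservative": avoided_loss * low,
        "central": avoided_loss * mid,
        "optimistic": avoided_loss * high,
    }


# Illustrative figure only: a hypothetical $1M headline avoided loss.
for label, value in weighted_risk_value(1_000_000).items():
    print(f"{label:>12}: ${value:,.0f}")
```

Presenting the range rather than a single number lets the CFO choose the weight, which tends to be more persuasive than defending one chosen on their behalf.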
d. Capability that wasn’t previously available
The hardest to quantify but increasingly important in 2026: AI enables work that genuinely couldn’t be done before at any reasonable cost. Examples:
For these, the “without AI” comparison case is “we don’t do this at all”: the value is the strategic option created, not a cost avoided. The assessment is qualitative, so present it as strategic capability rather than as a P&L line.
Putting the four categories together, the business case template that survives CFO scrutiny:
12-MONTH BUSINESS CASE: [System name]

COSTS
  Build cost                             $X  (one-time, year 1)
  Run cost (12 months)                   $Y  (foundation model + infra + obs)
  Operational + people (12 months)       $Z  (eng maintenance + champion + ops)
  Total year-1 cost                      $X + $Y + $Z

VALUE (year 1)
  Direct cost savings (with line items)  $A
  Revenue impact (with measurement plan) $B
  Risk reduction (probability-weighted)  $C × 0.5
  Capability premium (qualitative)       [described, not totalled]
  Total quantified year-1 value          $A + $B + $C × 0.5

PAYBACK
  Year-1 net                             ($A + $B + $C × 0.5) - ($X + $Y + $Z)
  Months to payback                      Total cost / monthly net value
  Year-2 net (no build cost)             ($A + $B + $C × 0.5) - ($Y + $Z)
  3-year cumulative                      Year 1 + Year 2 + Year 3 net
The honest version usually shows: year-1 net is small or negative (build cost dominates), year-2 net is meaningfully positive, three-year cumulative is strongly positive. That’s how good AI investments actually look — they pay back in year two, not year one.
Business cases that show large year-1 positive returns are typically the ones where the numbers haven’t been pressure-tested. CFOs know this. They’re more comfortable with a realistic year-2 payback than an unrealistic year-1 positive.
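The template arithmetic can be sketched in a few lines. Everything below is illustrative: the class name, the sample figures, and the assumption that value and operating costs accrue evenly across each year are ours. In particular, `payback_month` is one reading of "months to payback": the first month in which cumulative net operating value covers the up-front build cost.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BusinessCase:
    build_cost: float        # $X, one-time in year 1
    run_cost: float          # $Y per year
    people_cost: float       # $Z per year
    cost_savings: float      # $A per year
    revenue_impact: float    # $B per year
    avoided_loss: float      # $C per year, before probability weighting
    risk_weight: float = 0.5

    @property
    def annual_value(self) -> float:
        return self.cost_savings + self.revenue_impact + self.avoided_loss * self.risk_weight

    @property
    def annual_operating_cost(self) -> float:
        return self.run_cost + self.people_cost

    def year_net(self, year: int) -> float:
        """Net for a given year; the build cost lands in year 1 only."""
        net = self.annual_value - self.annual_operating_cost
        return net - self.build_cost if year == 1 else net

    def cumulative_net(self, years: int = 3) -> float:
        return sum(self.year_net(y) for y in range(1, years + 1))

    def payback_month(self, horizon_months: int = 36) -> Optional[int]:
        """First month where cumulative net covers the build cost, else None."""
        monthly_net = (self.annual_value - self.annual_operating_cost) / 12
        if monthly_net <= 0:
            return None
        cumulative = -self.build_cost
        for month in range(1, horizon_months + 1):
            cumulative += monthly_net
            if cumulative >= 0:
                return month
        return None


# Illustrative figures only.
case = BusinessCase(200_000, 100_000, 50_000, 150_000, 100_000, 100_000)
print(case.year_net(1), case.year_net(2), case.cumulative_net(3), case.payback_month())
# -50000.0 150000.0 250000.0 16
```

With these sample inputs the shape matches the honest version described above: a negative year 1, a positive year 2, and payback in month 16.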
A business case approved at month zero is half the work. The other half is maintaining the discipline to know whether the projected value is actually showing up. Three practices that distinguish AI investments that hit their projections from ones that quietly miss:
Before the AI system ships, the metrics it’s expected to move must have an explicit baseline. “Reduce average claim handling time” requires knowing the current average. “Improve conversion rate” requires the current conversion rate. Sounds obvious; routinely missed.
The reason: in the rush of pre-deployment, baselining feels like overhead. Six months later when someone asks “did this work?” the absence of the baseline makes the question unanswerable.
Practical: dedicate one workshop in the Design phase to defining the metrics, the baseline values, and the measurement methodology. Make the baseline a Design-phase deliverable.
The business case should be revisited at month 3, month 6, and month 12 post-deployment. Each review compares actual cost and actual value to the business case forecast, with explanations for variances.
This sounds like governance theatre but isn’t. The review forces the project sponsor to confront whether the system is actually delivering what was promised. If the answer is no, the response options are visible: tune the system, change the scope, accept lower-than-projected value, or shut the system down.
Practical: schedule the three reviews in the executive sponsor’s calendar at deployment. Make the actuals-vs-forecast a recurring agenda item.
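An actuals-vs-forecast review reduces to a small variance calculation at each checkpoint. The sketch below is one hypothetical shape for it; the class name, the 10% tolerance, and the sample figures are assumptions, not a standard.

```python
from dataclasses import dataclass


@dataclass
class ReviewPoint:
    month: int              # 3, 6 or 12 in the review schedule
    forecast_cost: float    # cumulative cost forecast to this month
    actual_cost: float
    forecast_value: float   # cumulative value forecast to this month
    actual_value: float


def variance_pct(actual: float, forecast: float) -> float:
    """Percentage by which the actual differs from the forecast."""
    if forecast == 0:
        raise ValueError("forecast must be non-zero to compute a percentage variance")
    return (actual - forecast) / forecast * 100.0


def review_summary(point: ReviewPoint, tolerance_pct: float = 10.0) -> dict:
    """Compare actuals to forecast and flag variances that need an explanation."""
    cost_var = variance_pct(point.actual_cost, point.forecast_cost)
    value_var = variance_pct(point.actual_value, point.forecast_value)
    return {
        "month": point.month,
        "cost_variance_pct": round(cost_var, 1),
        "value_variance_pct": round(value_var, 1),
        "forecast_net": point.forecast_value - point.forecast_cost,
        "actual_net": point.actual_value - point.actual_cost,
        # Flag cost overruns and value shortfalls beyond the tolerance.
        "needs_explanation": cost_var > tolerance_pct or value_var < -tolerance_pct,
    }


# Illustrative figures only: a month-3 review with a cost overrun and a value shortfall.
print(review_summary(ReviewPoint(3, 37_500, 45_000, 75_000, 60_000)))
```

Only unfavourable variances are flagged here; a team may reasonably want favourable surprises explained too, since they often point to a measurement error.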
Every AI investment should have an explicit kill criterion at decision time. “If this system doesn’t hit X by month Y, we shut it down.” The criterion forces honesty about what success requires; the existence of the criterion makes it possible to stop investments that are quietly underperforming.
The most common kill criteria we see (and recommend):
Practical: write the kill criterion into the engagement letter or the executive approval document. Refer to it explicitly at each actuals-vs-forecast review.
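A kill criterion of the form "if this system doesn't hit X by month Y, we shut it down" is simple enough to encode and check at each review. The names, the metric, and the threshold below are hypothetical examples, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KillCriterion:
    metric: str                     # what is measured, e.g. a monthly net value
    threshold: float                # the X in "hit X by month Y"
    deadline_month: int             # the Y in "hit X by month Y"
    higher_is_better: bool = True   # set False for metrics such as handling time


def evaluate(criterion: KillCriterion, month: int, observed: float) -> str:
    """Return 'pending' before the deadline, then 'continue' or 'kill'."""
    if month < criterion.deadline_month:
        return "pending"
    if criterion.higher_is_better:
        met = observed >= criterion.threshold
    else:
        met = observed <= criterion.threshold
    return "continue" if met else "kill"


# Illustrative criterion only.
criterion = KillCriterion("monthly net value (AUD)", 10_000, deadline_month=6)
print(evaluate(criterion, month=3, observed=4_000))   # pending
print(evaluate(criterion, month=6, observed=4_000))   # kill
print(evaluate(criterion, month=6, observed=12_500))  # continue
```

Making the criterion immutable (`frozen=True`) mirrors the point of writing it into the approval document: the threshold is agreed at decision time and is not adjusted when the numbers disappoint.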
Every productised offer we run produces the four-category cost model as part of the Design phase output. The Generative AI Pilot, in particular, ships with both the build cost (the engagement cost) and the 12-month run cost forecast (with confidence bands).
For organisations that have AI investments already in flight and want a structured actuals-vs-forecast review, the AI Readiness Sprint includes review of existing AI investments as part of the prioritised backlog work. Sometimes the most useful output of that sprint is the recommendation to redesign or kill an existing investment that’s quietly underperforming.
For interactive estimation against your specific situation, our AI ROI Calculator walks through the four-category model and produces a structured business-case skeleton you can present to your finance team. Free, ~5 minutes.
For everything else, the discovery call is 30 minutes, no agenda. If you have a specific AI investment under consideration and want a senior practitioner’s view of the business case before you commit, that’s the fastest path.