Generative AI Implementation · Cornerstone
When to use an off-the-shelf AI product, when to build your own integration, when to fine-tune — with the trade-offs that matter at AU mid-market scale.
Quantum Associates · 10 min read
The single biggest source of wasted AI spend in the AU mid-market in 2026 is the decision to build when buying would have been better — closely followed by the decision to fine-tune when prompting would have been enough. Both errors are made by smart people, usually because the trade-offs aren’t legible until you’ve been through the work once.
This piece is the decision framework we walk through with clients in the first week of any AI engagement. It covers the three implementation patterns (buy / build / fine-tune), the questions that point cleanly to one or another, and the cost-and-effort reality of each at mid-market AU scale in 2026.
Buy means using an off-the-shelf AI product — Copilot for Microsoft 365, Einstein for Salesforce, a vertical SaaS tool with AI features built in, a commercial RAG product like Glean. You’re paying a per-seat or per-call fee and accepting that the product’s opinions about how the work should be done are mostly fixed.
Build means assembling your own system from foundation-model APIs and your own data. You’re typically writing prompts, designing a retrieval pipeline, integrating with your data sources, and shipping the result as a custom application. The model itself is usually frontier-tier and unmodified — the value you add is the integration and the prompting.
Fine-tune means taking an existing foundation model and training it on your own data to change its behaviour. You’re paying compute costs to update the model weights and accepting the operational complexity of running a model that’s yours, not the vendor’s.
These are three different bets, each with its own cost, time-to-value, and risk profile. The right one depends on the use case, and most organisations need a mix.
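To make the build pattern concrete, here is a minimal sketch of its three moving parts: retrieval over your own data, a prompt, and an unmodified model behind an API. Everything named here is an assumption for illustration. `retrieve`, `build_prompt`, `answer`, and the sample documents are hypothetical, the keyword-overlap scoring stands in for whatever search index or embedding store a real pipeline would use, and `call_model` is a placeholder for a vendor API call rather than any specific vendor’s SDK.

```python
from typing import Callable


def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Toy retrieval: rank documents by how many words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble the prompt: instructions, retrieved context, then the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def answer(question: str, documents: list[str], call_model: Callable[[str], str]) -> str:
    """The whole 'build': retrieve, prompt, and hand off to an unmodified model."""
    return call_model(build_prompt(question, retrieve(question, documents)))


# Hypothetical sample data and a stub model that simply echoes the prompt.
DOCS = [
    "Water damage claims require photos within 14 days",
    "Motor claims need a police report",
    "Office kitchen roster",
]
if __name__ == "__main__":
    print(answer("What do water damage claims require?", DOCS, call_model=lambda p: p))
```

The point of the sketch is where the effort sits: none of it touches model weights, so the same code keeps working when the underlying model is upgraded.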
The default in 2026 should be buy unless you have a specific reason to choose otherwise. Three reasons it’s the right default:
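The exceptions to the buy-first default are specific: the use case is core to your differentiation, it is compliance-restricted in ways SaaS can’t serve, it is structurally inaccessible to SaaS connectors, or it runs at a scale where the unit economics don’t work.
If none of those apply, buy. The longer answer to “why most organisations should buy first” sits in our piece “Why most enterprise AI pilots fail”: most pilot failures we see are custom builds that should have been off-the-shelf.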
Build is right when at least two of the four exception criteria above apply. In practice that’s typically:
The build pattern that’s working in 2026 is what we’d call thin-wrapper-on-frontier-model:
Build cost in AU mid-market for a focused build in this pattern is typically:
Total year-1 envelope: typically $250K–$700K for a single production workflow at mid-market scale. Year-2 cost drops significantly as the build cost is amortised.
If those numbers don’t pencil against the value of the use case, the right call is usually buy, not “build at a smaller budget” — the failure mode of under-budgeted builds is what funds most of the AU AI consulting industry’s rework engagements.
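A rough way to run the “does it pencil” check is to compare the expected year-1 value of the use case against the $250K–$700K envelope quoted above. The sketch below is a deliberate simplification: it ignores the lower year-2 cost, and `build_pencils` and its single value estimate are hypothetical, not part of the framework itself.

```python
# Year-1 envelope for a single production workflow, taken from the text above.
YEAR1_ENVELOPE_LOW = 250_000
YEAR1_ENVELOPE_HIGH = 700_000


def build_pencils(expected_year1_value: float) -> str:
    """Compare an assumed year-1 value estimate against the build envelope."""
    if expected_year1_value >= YEAR1_ENVELOPE_HIGH:
        return "pencils across the whole envelope"
    if expected_year1_value >= YEAR1_ENVELOPE_LOW:
        return "pencils only if the build lands near the low end"
    # Below the envelope, the text's advice is to buy, not to shrink the build budget.
    return "does not pencil: default to buy"


if __name__ == "__main__":
    for value in (1_000_000, 400_000, 100_000):
        print(f"${value:,}: {build_pencils(value)}")
```

The third branch is the one that matters: when the value sits below the envelope, the check returns “buy” and never suggests a smaller build.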
Fine-tuning is the most over-prescribed pattern in 2026 AU AI consulting. The pitch is intuitive — “we’ll train a model on your data so it understands your business” — but the reality is that fine-tuning is rarely the right answer for the problem the buyer is actually trying to solve.
Fine-tuning is the right answer when:
Fine-tuning is the wrong answer when:
The cost-and-effort reality of fine-tuning:
The cleaner alternative for most “we want the model to behave specifically” cases: invest the same money in better prompts, better retrieval, and better evaluation — you’ll typically get 80% of the benefit at 20% of the cost, and the work is portable across model upgrades.
Walk through the following in order. Stop at the first “yes.”
Q1. Is there a credible off-the-shelf SaaS product that solves 80% of this use case? → Yes: buy it. Then evaluate after 6 months whether the remaining 20% justifies a custom build or whether the SaaS vendor has closed the gap.
Q2. Is the use case core to your differentiation, compliance-restricted in ways SaaS can’t serve, structurally inaccessible to SaaS connectors, or at a scale where the unit economics don’t work? → Yes: build, using the thin-wrapper-on-frontier-model pattern. → No: revisit Q1 — you may have been too restrictive about what SaaS options were available.
Q3. After building, have you exhausted prompting and retrieval improvements and still have a measurable behavioural gap? → Yes: consider fine-tuning, with a clear specification of what you’re trying to achieve. → No: keep iterating on prompts and retrieval; the marginal cost is lower.
The most common error is jumping from Q1 to Q3 — “we want a model that understands our business, let’s fine-tune one.” That conflates “the model needs access to our data” (a retrieval problem) with “the model needs to behave differently” (a fine-tuning problem). They’re different problems with different solutions.
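The three questions can be read as an ordered decision procedure, sketched below. The `UseCase` fields and the `recommend` function are hypothetical names invented for this illustration. Q2 is phrased above as an “or” across four criteria, so the sketch defaults to requiring one; `min_build_criteria` is an assumed knob for teams that prefer a stricter threshold.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    """Yes/no answers for one workflow (hypothetical field names)."""
    saas_covers_80_percent: bool = False            # Q1
    core_to_differentiation: bool = False           # Q2 criterion
    compliance_blocks_saas: bool = False            # Q2 criterion
    inaccessible_to_connectors: bool = False        # Q2 criterion
    unit_economics_fail: bool = False               # Q2 criterion
    built: bool = False                             # Q3 only applies after building
    gap_remains_after_prompting_and_retrieval: bool = False  # Q3


def recommend(uc: UseCase, min_build_criteria: int = 1) -> str:
    """Walk Q1, Q2, Q3 in order and return the first recommendation that applies."""
    # Q1: a credible off-the-shelf product wins outright.
    if uc.saas_covers_80_percent:
        return "buy, then re-evaluate after 6 months"

    # Q2: count how many of the four exception criteria apply.
    criteria_met = sum([
        uc.core_to_differentiation,
        uc.compliance_blocks_saas,
        uc.inaccessible_to_connectors,
        uc.unit_economics_fail,
    ])
    if criteria_met < min_build_criteria:
        return "revisit Q1: widen the search for SaaS options"

    # Q3: fine-tuning is only on the table after a build has hit its ceiling.
    if uc.built and uc.gap_remains_after_prompting_and_retrieval:
        return "consider fine-tuning against a clear specification"
    if uc.built:
        return "keep iterating on prompts and retrieval"
    return "build a thin wrapper on a frontier model"


if __name__ == "__main__":
    print(recommend(UseCase(compliance_blocks_saas=True, inaccessible_to_connectors=True)))
```

Note that there is no path from Q1 straight to fine-tuning: the function can only return that recommendation once `built` is true, which is the shortcut the paragraph above warns against.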
A mid-market AU insurer asked us to scope an AI engagement to “fine-tune a model on our claims handling guidelines so adjusters can ask it questions during triage.”
We walked through the framework:
That’s the canonical pattern. The first-instinct answer was fine-tune. The right answer was thin-wrapper build. The work it took to figure that out was three days of structured discovery — a fraction of the cost of starting on the wrong path.
In 2026 AU mid-market:
If you’re mid-decision: the AI Readiness Sprint is the productised engagement we run to walk through this framework against your specific use-case backlog. Three weeks, fixed price, with the explicit deliverable of a scored decision per workflow.