Case study · An ASX-100 financial services firm (claims operations)

Claims triage co-pilot — production in 3 weeks

A Generative AI Pilot for a claims operations team: an operator-in-the-loop triage co-pilot, deployed to production at the end of a 3-week sprint.

Quantified outcomes

Average triage time: outcome metric pending partner verification.
Operator satisfaction: outcome metric pending partner verification.
Time to production: 3 weeks, from kickoff to a system in production and handed over to operators. This is the published duration of the productised offer.

Outcome metrics for this engagement are pending partner sign-off and will publish here once verified.

The problem

What we were brought in to solve.

The claims operations team was triaging incoming claims manually. A wait queue built up during peak periods, and an inconsistent specialist-assignment process produced rework downstream. The team had previously evaluated a workflow-automation vendor solution but rejected it because it could not handle the unstructured-document complexity of incoming claims (PDFs, photos, free-text narratives).

The approach

How the engagement ran.

A 3-week Generative AI Pilot, scoped to a single claims category. Week 1: integration with the existing claims-intake system, plus an evaluation harness built on a labelled dataset of 200 past claims. Week 2: an operator UI built into the existing claims console (not a separate tool), with explicit confidence scoring on every triage recommendation. Week 3: production deployment, running in shadow mode for the first week, then full handover to the operations team. The operator who would use the system sat in the design sessions from week 1; their input shaped the confidence-score threshold and the manual-override flow, which turned out to be the most-used feature in production.
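To make the confidence threshold, manual override, and labelled-set evaluation concrete, here is a minimal TypeScript sketch. It is an illustration under stated assumptions, not the code that shipped: every name (Recommendation, route, applyOverride, evaluate), the 0.8 threshold, and the two evaluation measures are hypothetical, and the model call that would produce each recommendation is left out.

```typescript
// Hypothetical shapes for illustration only; none of these names come from the engagement.
type Recommendation = {
  claimId: string;
  category: string;   // suggested specialist queue
  confidence: number; // model-reported score, expected in [0, 1]
};

type Routing =
  | { kind: "suggest"; claimId: string; category: string }
  | { kind: "manual"; claimId: string; reason: string };

// Assumed value. In the case study, operator input shaped the real threshold.
const CONFIDENCE_THRESHOLD = 0.8;

// Surface a suggestion only when the score clears the threshold;
// everything else goes to the operator for manual triage.
function route(rec: Recommendation, threshold: number = CONFIDENCE_THRESHOLD): Routing {
  const valid = Number.isFinite(rec.confidence) && rec.confidence >= 0 && rec.confidence <= 1;
  if (!valid) {
    return { kind: "manual", claimId: rec.claimId, reason: "invalid confidence score" };
  }
  if (rec.confidence >= threshold) {
    return { kind: "suggest", claimId: rec.claimId, category: rec.category };
  }
  return { kind: "manual", claimId: rec.claimId, reason: "confidence below threshold" };
}

type Decision = { claimId: string; category: string | null; overridden: boolean };

// The operator's choice always wins. With no operator input, a suggestion is
// accepted as-is and a manual item stays unassigned (category: null).
function applyOverride(routing: Routing, operatorCategory?: string): Decision {
  const suggested = routing.kind === "suggest" ? routing.category : null;
  if (operatorCategory !== undefined) {
    return {
      claimId: routing.claimId,
      category: operatorCategory,
      overridden: operatorCategory !== suggested,
    };
  }
  return { claimId: routing.claimId, category: suggested, overridden: false };
}

type LabelledClaim = { rec: Recommendation; expectedCategory: string };

// Evaluate a threshold against labelled past claims:
// coverage  = share of claims that receive a suggestion,
// precision = share of those suggestions that match the label.
function evaluate(claims: LabelledClaim[], threshold: number): { coverage: number; precision: number } {
  let suggested = 0;
  let correct = 0;
  for (const claim of claims) {
    const routing = route(claim.rec, threshold);
    if (routing.kind === "suggest") {
      suggested += 1;
      if (routing.category === claim.expectedCategory) correct += 1;
    }
  }
  return {
    coverage: claims.length === 0 ? 0 : suggested / claims.length,
    precision: suggested === 0 ? 0 : correct / suggested,
  };
}
```

Raising the threshold in a sketch like this trades coverage for precision. Running evaluate over a labelled set at several candidate thresholds is one way an operator and the delivery team could pick a value together.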

The outcome

What shipped and what happened.

The system reached production at the end of week 3, with the operator team running it unaided in week 4. Measured against the original Gate 2 success criteria, every outcome tracked in the intended direction: triage time fell substantially, operator satisfaction was high, and downstream rework (a softer metric we agreed to track at quarter-end) was on a clear downward trend. The client engaged us for an Operate retainer at the 60-day mark to extend the system to a second claims category.

Stack used

Anthropic Claude · Vercel AI SDK · Postgres · Next.js · AWS Bedrock (AU region)

Similar situation?

Book a discovery call about a similar project.

30 minutes, conversational, no commitment. We’ll come ready with questions specific to your industry and situation.