# Custom AI Insurance Development — Paiteq

> Paiteq does custom AI insurance development — claims, underwriting, FNOL triage, grounded chatbots — wrapped around your existing PAS/CMS/ACORD stack.

**HTML version:** https://www.paiteq.com/ai-for-insurance/

## Key facts

- Workflows: claims, underwriting, FNOL triage, grounded chatbots.
- Integrations: PAS, CMS, ACORD-aligned stacks.

## Related pages

- [AI Agent Development](https://www.paiteq.com/services/ai-agent-development/)
- [Chatbot Development](https://www.paiteq.com/services/chatbot-development/)
- [AI Consulting](https://www.paiteq.com/services/ai-consulting/)

## About Paiteq

Enterprise AI engineering — production agents, RAG, LLM apps, automation, generative AI. Eval-first, senior-led, fixed-scope engagements. Same-day reply from engineering. NDA counter-signed before discovery. Walk-away clause on every engagement.

**Site index for agents:** https://www.paiteq.com/llms.txt
**Full content for agents:** https://www.paiteq.com/llms-full.txt
**Book a call:** https://www.paiteq.com/contact/

---

## Full content

AI for Insurance · AI Insurance Development Company

# *AI for insurance* + custom AI insurance development: claims, underwriting governance, FNOL, and actuarial AI around your PAS/CMS stack.

Insurance leadership in 2026 sits in a three-way squeeze: combined-ratio pressure on every line as catastrophe-loss frequency keeps stair-stepping up, regulator scrutiny on every AI decision now that the NAIC AI Model Bulletin is moving through 20+ states' adoption pipelines, and a talent gap on actuarial plus claims that won't close in this hiring market. Paiteq is an AI insurance development company building AI for insurance carriers, MGAs, brokers, and InsurTechs. We wrap your existing PAS (Guidewire, Duck Creek, Insurity), CMS (Snapsheet, Hi-Marley), and ACORD-data pipelines with AI decisioning across claims orchestration, underwriting model governance, FNOL triage, loss-run extraction, subrogation, and actuarial AI workflows. We're a custom AI insurance development partner that sits on top of Shift Technology, Gradient AI, and Roots.ai rather than competing with them. We stay through the first NAIC market-conduct review and the first eval-drift cycle, not just through deploy.

[Talk to engineering](/contact/) [See the 7 use cases](#use-cases)

- Use cases: 7 · claims · UW gov · FNOL · actuarial · subrogation
- Engage: MVP · Platform · Enterprise
- Stack: Guidewire · Duck Creek · Claude · Pinecone · MLflow
- Risk: SOC 2 · NAIC AI Bulletin · MRM · ASOP 56

001 / WHY NOW

## Why insurance teams are evaluating AI for insurance from an AI insurance development company right now.

Chief claims officers, chief actuaries, and COOs at carriers and MGAs in 2026 face three pressures running in parallel: claims-leakage and LAE-ratio drift that no amount of adjuster headcount can fix, actuarial-governance load that absorbs senior talent every audit cycle even before a state regulator asks a question, and a service-call deflection gap that pushes AHT and E&O exposure simultaneously. Each pressure on its own would be manageable. Together, they're why AI-in-insurance conversations have moved from R&D experiment to operating-plan agenda since the 2023 NAIC AI Model Bulletin landed, and why every AI for insurance discussion we walk into now starts with a combined-ratio question rather than a tech question. The teams shipping insurance AI well aren't replacing Guidewire, Duck Creek, Insurity, Shift Technology, or Gradient AI; they're wrapping those primitives with an orchestration layer that makes them smarter, and that's the shape of work an AI insurance development company now sells. The framing shift in 2026: AI in the insurance industry stopped being a McKinsey slide and started being shipped code that tool-calls into the carrier's PAS.

2–5d

FNOL → claim file assembled and routed

Each P&C claim file runs 18–40 documents (police reports, photos, repair estimates, prior policy, customer statements). Clerks spend 45–80 min assembling per file before adjuster review; leakage on missed coverage runs 3–8% of LAE.

3–6w

Per-cycle actuarial governance time

Underwriting models drift quarterly but get reviewed annually. Each bias-test and disparate-impact pack runs two senior actuaries 3–6 weeks per audit cycle. State regulator questions land as 200-tab spreadsheets a month after the ask.

7–14%

Subrogation recovery rate (industry baseline)

Mid-market P&C carriers recover on 7–14% of subrogation-eligible claims. Roughly 25–40% of recoverable opportunities never get worked because the subrogation review happens 60–120 days after claim close, not at FNOL.

The opinionated take

Most insurance AI projects fail because the team treats AI as a parallel system to the PAS instead of an orchestration layer inside it. A separate AI product that doesn't tool-call into ClaimCenter or PolicyCenter is a screen, not a decision. The cost of choosing the wrong abstraction layer is typically 5–12 months of rebuilding the integration scaffolding once the AI feature scales beyond a pilot: the team rewires the policy-forms ingestion, redoes the NPPI redaction layer, and almost always rebuilds the audit trail because the original one was bolted onto the wrong primitive and the chief actuary won't sign off on it. We don't get those numbers from theory; we've watched two carriers and one MGA do exactly this rebuild before engaging us.

— Paiteq engineering

002 / USE CASES

## The 7 highest-ROI AI use cases in insurance.

Below are the seven workflows we see insurance teams build first. They share three traits: each has a clear buyer-readable ROI number in insurance units (LAE points, combined-ratio movement, FNOL hours, severity-misclassification rate, subrogation recovery rate, underwriter time freed), each is deployable inside a 10–18 week window, and each compounds when you ship two or three together on a shared PAS/CMS integration layer rather than as standalone bets. The cards are dense on purpose: pain, with-AI workflow, named tools, and the ROI metric in the chief claims officer's or chief actuary's vocabulary. Skim them, then read the two or three that match where your roadmap actually sits today. The AI claims processing pattern below (UC-1) gets a deeper architecture treatment in a separate blog covering the FNOL-to-file agent loop.

USE CASE 01

### AI claims processing engine, FNOL to file-ready to routing

The Pain

AI claims processing is the workflow most carriers ask about first. From FNOL submission to claim file assembled and routed to the adjuster takes 2–5 business days on mid-market P&C carriers. Each file is 18–40 documents (police reports, photos, repair estimates, prior policy, customer statements), and claims clerks spend 45–80 min per file assembling it before an adjuster ever reads it. Leakage from missed coverage details runs 3–8% of loss-adjustment expense. Most insurance AI vendors sell you an AI claims processing demo and a deck full of LAE-ratio screenshots; what they don't ship is the NAIC governance pack that lets your chief actuary sign off.

With AI

An agent reads the inbound FNOL (form submission, call transcript, email, photos, attached PDFs), classifies the claim type, pulls the matching policy plus prior claims plus relevant endorsements from your PAS, drafts a coverage analysis with a flag list of issues for the adjuster, and routes to the right segment queue. The adjuster approves or edits; there are no auto-decisions on coverage. The agent's job is the document assembly and the coverage read, not the liability call.

4–10 hrs

FNOL → file-ready (from 2–5 days)

Clerk time per file drops 45–80 min → 8–14 min; missed-coverage leakage drops 1.5–3 points; LAE ratio improves 2–4 points on mid-volume lines

Tools

Claude Sonnet 4.6 · GPT-4 Vision · Pinecone · Turbopuffer · Guidewire ClaimCenter · Duck Creek Claims · Snapsheet · Hi-Marley

USE CASE 02

### Underwriting model risk and governance

The Pain

Underwriting models drift quarterly but most carriers review them annually. Bias-test artifacts get rebuilt manually by the actuarial team every audit cycle, 3–6 weeks of work per cycle. When the state regulator asks "show us the disparate-impact analysis for ZIP-code-correlated features", the answer is a 200-tab spreadsheet that took two senior actuaries a month to produce. AI underwriting without a continuous governance layer isn't AI underwriting; it's a model that'll get pulled at the next market-conduct exam.

With AI

A model-governance layer runs the bias tests, drift tests, and feature-stability tests continuously on every underwriting model in production. It produces a quarterly governance pack for the chief actuary to sign off and raises real-time alerts when a model crosses a drift threshold. The actuary stays the decision-maker; the platform produces the artifacts they need to sign off on. The base underwriting model stays classical ML (LightGBM or XGBoost) per ASOP No. 56; the governance wrap is what's new.

2–4d

actuarial governance per cycle (from 3–6w)

Drift-related rate-inadequacy events are caught 60–90 days earlier; state-regulator response time on AI-model questions compresses from weeks to hours

Tools

MLflow · LightGBM · XGBoost · Fairlearn · Aequitas · Evidently · Claude Sonnet 4.6
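To make the "drift threshold" idea in UC-2 concrete, here is a minimal sketch of one common drift statistic, the population stability index (PSI), computed for a single model feature. It uses only the Python standard library; it is not the API of Evidently or any other tool named above, and the function names, bin count, and the 0.2 alert threshold are illustrative assumptions rather than Paiteq's actual configuration.

```python
import math
from typing import Dict, Sequence

# Hypothetical alert threshold; a real value would be set with the chief actuary.
DRIFT_THRESHOLD = 0.2


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10
) -> float:
    """PSI between a baseline sample and a production sample of one feature."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the baseline range; fall back to 1.0 if the range is zero.
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor keeps the logarithm defined for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    e = bucket_shares(expected)
    a = bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_alert(
    expected: Sequence[float],
    actual: Sequence[float],
    threshold: float = DRIFT_THRESHOLD,
) -> Dict[str, object]:
    """Return the PSI and whether it crosses the (assumed) alert threshold."""
    psi = population_stability_index(expected, actual)
    return {"psi": psi, "alert": psi >= threshold}
```

In a setup like the one described, a check of this kind would run on a schedule per feature and per model, with alerts routed to the actuarial team rather than acted on automatically.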

USE CASE 03

### Grounded policy assistant for service and agent workflows

The Pain

Policyholder service calls average 6–11 minutes; agent service calls average 12–22 minutes. 40–60% of those calls are answered by a policy provision a grounded assistant could have read out loud, but generic conversational AI hallucinates coverage details, and a wrong coverage answer is an E&O exposure with measurable downstream cost. A policy assistant that hallucinates a deductible isn't an assistant; it's a litigation primer.

With AI

A grounded policy assistant reads ONLY from your actual policy forms (with endorsements applied), the active claim record, and the agent's authorization scope. It refuses to answer outside that scope. Every answer cites the policy section it came from. It surfaces "I'm not sure, let me get an adjuster" instead of guessing. It handles policyholder FAQs, claim-status, and routine agent endorsement requests; complex coverage interpretation routes to a human. The AI insurance agent variant reads the broker's book and the carrier's appetite simultaneously, surfacing appetite-fit and renewal flags for the AI-for-insurance-agents workflow.

32–48%

deflection rate on policyholder calls

Agent service-call AHT drops 18–28%; every AI answer is policy-cited and logged for E&O defensibility

Tools

Claude Sonnet 4.6 · Pinecone · LangChain · Guidewire PolicyCenter · Duck Creek Producer · Insurity
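The refuse-or-cite behaviour described in UC-3 can be sketched in a few lines. This is a toy illustration only: it uses naive keyword overlap where a production system would use retrieval over the policy-forms corpus, and `PolicySection`, `answer_from_policy`, and the `min_overlap` cutoff are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

FALLBACK = "I'm not sure, let me get an adjuster."


@dataclass(frozen=True)
class PolicySection:
    section_id: str  # e.g. "Section 4.2"; the citation shown to the caller
    text: str


def _terms(text: str) -> set:
    """Lower-cased word set with trailing punctuation stripped."""
    return {w.strip("?.,").lower() for w in text.split()}


def answer_from_policy(
    question: str, sections: List[PolicySection], min_overlap: int = 2
) -> Dict[str, Optional[str]]:
    """Answer only from the supplied sections; otherwise hand off to a human."""
    q_terms = _terms(question)
    best, best_score = None, 0
    for section in sections:
        score = len(q_terms & _terms(section.text))
        if score > best_score:
            best, best_score = section, score
    if best is None or best_score < min_overlap:
        # Out of scope or weak match: refuse rather than guess.
        return {"answer": FALLBACK, "citation": None}
    return {"answer": best.text, "citation": best.section_id}
```

The design point is that the refusal path is the default: an answer is returned only when it can be tied to a specific section, which is what makes each response loggable for E&O purposes.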

USE CASE 04

### FNOL triage and severity prediction

The Pain

Severity classification at FNOL is mostly heuristic: adjusters segment claims by line of business and dollar threshold. Total-loss vs. repairable on auto runs 12–18% wrong at intake; coverage-litigation flags on liability claims get missed 8–15% of the time and surface later as bad-faith exposure. Mid-tier carriers don't lose money on the big claims they triage right; they lose it on the medium claims that should've been triaged big.

With AI

A triage model reads the FNOL inputs (description, photos, location, prior-claims history, policy coverage) and predicts severity tier plus total-loss probability plus an early litigation-risk score. High-severity or high-litigation-risk claims route to senior adjusters immediately instead of waiting 5–14 days for re-triage. The triage call isn't auto-binding; it's a queue-priority signal with a confidence score the adjuster can override.

3–6%

severity-misclassification rate (from 12–18%)

High-severity claims hit senior-adjuster desks within 2–4 hours instead of 5–14 days; bad-faith flags catch 70–85% of cases that would have surfaced later

Tools

XGBoost · LightGBM · MLflow · Claude Sonnet 4.6 · GPT-4 Vision · Guidewire ClaimCenter · Duck Creek Claims
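As a rough illustration of "a queue-priority signal with a confidence score the adjuster can override" from UC-4, the sketch below routes a scored claim to a queue. The `TriageScore` fields mirror the three predictions named in the workflow; the queue names and both cutoffs are hypothetical placeholders, not values from a real deployment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TriageScore:
    severity_tier: str             # "low", "medium", or "high"
    total_loss_probability: float  # 0-1
    litigation_risk: float         # 0-1
    confidence: float              # model's confidence in the severity tier, 0-1


def route_claim(
    score: TriageScore,
    adjuster_override: Optional[str] = None,
    litigation_cutoff: float = 0.6,  # assumed cutoff
    min_confidence: float = 0.7,     # assumed cutoff
) -> str:
    """Return a queue name; the human override always wins."""
    if adjuster_override is not None:
        return adjuster_override
    if score.confidence < min_confidence:
        # Low-confidence predictions are not trusted for prioritisation.
        return "manual_review"
    if score.severity_tier == "high" or score.litigation_risk >= litigation_cutoff:
        return "senior_adjuster"
    return "standard"
```

Checking the override first keeps the model in an advisory role: nothing the model predicts can outrank an adjuster's explicit decision.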

USE CASE 05

### Loss-run extraction from commercial submissions

The Pain

Commercial underwriters receive loss runs as PDFs, Excel files, and emailed text, with every submission in a different shape. Manual extraction takes 25–60 min per loss run; data-entry errors run 4–9%. Underwriters spend 35–50% of their day keying loss runs, not doing actual underwriting judgment. That ratio's the real reason MGAs can't scale a UW team without a four-month ramp.

With AI

An extraction pipeline reads the inbound loss-run document (any format), normalizes to ACORD or your internal schema, validates against the submission's effective dates and prior-policy history, and pushes the structured record into your underwriting workbench. The underwriter reviews the parse and spot-checks; the underwriter never re-keys. Low-confidence fields surface with the source-document snippet so the spot-check is 30 seconds, not 30 minutes.

~0.5%

data-entry error rate (from 4–9%)

Loss-run keying drops 25–60 min → 90 sec – 4 min per submission; underwriter time freed for actual risk judgment up 30–45%

Tools

Claude Sonnet 4.6 · Mistral OCR · AWS Textract · Pinecone · Guidewire UnderwritingPortal · Duck Creek Producer · ACORD
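The spot-check step in UC-5 (low-confidence fields surfaced with their source snippet, plus validation against the policy period) might look roughly like the sketch below. It assumes the extraction step already yields a per-field confidence; the `ExtractedField` shape, the `loss_date` field name, and the 0.9 cutoff are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float    # parser's own 0-1 confidence (assumed to be available)
    source_snippet: str  # text from the source document, shown to the underwriter


def fields_for_spot_check(
    fields: List[ExtractedField],
    effective: date,
    expiration: date,
    min_confidence: float = 0.9,  # assumed cutoff
) -> List[Tuple[ExtractedField, str]]:
    """Return (field, reason) pairs the underwriter should look at."""
    flagged = []
    for f in fields:
        reason = None
        if f.confidence < min_confidence:
            reason = "low confidence"
        elif f.name == "loss_date":
            try:
                loss_date = date.fromisoformat(f.value)
            except ValueError:
                reason = "unparseable date"
            else:
                if not (effective <= loss_date <= expiration):
                    reason = "outside policy period"
        if reason:
            flagged.append((f, reason))
    return flagged
```

Everything not flagged flows straight to the workbench, so the underwriter's review time scales with the number of doubtful fields rather than the size of the loss run.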

USE CASE 06

### Subrogation analyzer and recovery surfacing

The Pain

Subrogation recovery rate sits at 7–14% across mid-market P&C carriers. Recoverable claims get missed because the subrogation review happens 60–120 days after claim close, and adjusters don't flag liability shifts at FNOL. Industry estimate: 25–40% of recoverable opportunities never get worked. The dollars are sitting on the floor; the carrier just doesn't have an analyst who reads every file in time.

With AI

An analyzer reads every claim file in real time, flags the ones with subrogation indicators (third-party fault, contractual recovery rights, comparative-negligence shifts), drafts the demand-letter outline, and surfaces the package to the subrogation team within 7–10 days of claim opening, not 60–120 days after close. The team reviews and pursues; the agent is the eyes on every file, not the negotiator.

13–22%

subrogation recovery rate (from 7–14%)

Missed-recovery rate drops 25–40% → 8–15%; recovery dollars hit the books 90–180 days earlier on average

Tools

Claude Sonnet 4.6 · Pinecone · Guidewire ClaimCenter · Duck Creek Claims · BTI Solutions

USE CASE 07

### Agentic ops for renewals, endorsements, and quote prep

The Pain

Renewal prep on commercial books burns 4–9 hours per account per cycle. Underwriting assistants assemble the renewal package (loss runs, exposure changes, market data, prior endorsements) before the underwriter looks at it. Endorsement processing carries a 5–11 business-day backlog on mid-size MGAs. The agentic AI insurance pattern that works here isn't "replace the underwriter"; it's "stop paying a UW assistant to do data entry".

With AI

A renewal agent assembles the prep package autonomously (pulls prior policy, runs the loss-run extraction, summarizes exposure changes, drafts the renewal narrative), then routes to the underwriter for decision, not data entry. An endorsement agent handles routine endorsements start-to-finish with policyholder confirmation; complex endorsements surface for underwriter review. The agent runs against your underwriting guidelines via grounded retrieval, not against generic LLM judgment.

25–55 min

renewal prep per account (from 4–9 hrs)

Endorsement backlog drops 5–11d → same-day to 2d on routine endorsements; UW assistants reallocate to broker support instead of data assembly

Tools

LangGraph · Claude Sonnet 4.6 · Pinecone · Verisk LightSpeed · ISO MarketStance · ACORD · Guidewire PolicyCenter · Duck Creek Producer

A pattern worth flagging across all seven workflows above: **the ROI numbers are the median of what we and similarly-shaped boutiques have shipped on custom AI insurance development engagements**, not the headline outlier. Don't pick a use case for its ceiling. Pick the two with the cleanest buyer-readable ROI math for your operating model: personal-lines carriers with FNOL backlog start with UC-1 and UC-4; commercial MGAs with UW keying drag start with UC-5 and UC-7; specialty carriers with a recovery gap start with UC-6 and UC-1. Adjacent specializations (actuarial AI for rate-adequacy work, agentic AI insurance patterns for renewal-prep loops, AI for insurance agents workflows on the broker side) get their own deeper treatments; this pillar is the AI for insurance orchestration view, not the per-workflow deep-dive. The next section maps each pain to the Paiteq service pillar that does the actual engineering.

003 / SERVICE MAPPING

## How Paiteq services map to insurance needs.

Four common insurance pain shapes come first, then the five Paiteq service pillars that address them. The descriptive anchors (not the service primary keyword) are deliberate: what matters to the chief claims officer or chief actuary is the workflow, not the service title.

Claims leakage and FNOL backlog

FNOL → file-ready runs 2–5 days; missed-coverage leakage costs 3–8% of LAE. Document assembly burns 45–80 min per file before an adjuster reads anything.

Underwriting model risk and actuarial governance

UW models drift quarterly, reviewed annually. Bias-test packs take 3–6 weeks per audit cycle. State-regulator AI questions land as month-long spreadsheet exercises.

Policyholder and agent service-call load

40–60% of service calls are answered by a policy provision; generic chatbots hallucinate coverage and create E&O exposure. AHT on agent calls runs 12–22 min.

Commercial underwriting data-entry drag

Underwriters spend 35–50% of their day keying loss runs and ACORD forms. Error rate runs 4–9%; MGA UW teams can't scale without a four-month ramp.

- AI Agent Development: [orchestrating renewal, endorsement, and subrogation agents that tool-call against your PAS](/services/ai-agent-development/)
- RAG Development: [grounded retrieval over policy forms, endorsements, and prior-claim history](/services/rag-development/)
- LLM Development: [claims-document classification, loss-run extraction, and model selection across the insurance document stack](/services/llm-development/)
- Machine Learning Development: [training underwriting, severity, and total-loss models on your historical book](/services/machine-learning-development/)
- MLOps: [model governance, drift monitoring, and the ASOP No. 56 plus SR 11-7-adjacent MRM scaffolding your chief actuary signs off on](/services/mlops/)

Why the map looks like this

Building insurance AI in 2026 is genuinely a multi-discipline engineering job, closer to platform integration plus regulated-ML work than to a typical PAS-customisation build. Claims leakage routes to three services because a working claims agent is partly [orchestrating renewal, endorsement, and subrogation agents that tool-call against your PAS](/services/ai-agent-development/), partly [grounded retrieval over policy forms, endorsements, and prior-claim history](/services/rag-development/), and partly [claims-document classification, loss-run extraction, and model selection across the insurance document stack](/services/llm-development/). Underwriting governance routes to ML plus MLOps plus LLM because a working UW model wrap isn't a single LLM call; it's a base classical model (LightGBM or XGBoost) under ASOP No. 56, a continuous bias-test layer (Fairlearn or Aequitas), drift monitoring (Evidently), and an LLM narrative layer that drafts the chief actuary's governance pack.

Policyholder-service load routes to RAG plus agent plus LLM because the grounded chatbot pattern is fundamentally a retrieval problem: the AI's job is to read the policyholder's actual policy forms with endorsements applied and refuse to answer outside that scope, not to make the coverage call. Commercial UW data-entry drag routes to LLM plus RAG plus agent because loss-run extraction at scale needs [claims-document classification, loss-run extraction, and model selection across the insurance document stack](/services/llm-development/) for the base extraction, ACORD-form RAG for normalisation, and agent orchestration for the submission-to-workbench push. And every one of these touches [model governance, drift monitoring, and the ASOP No. 56 plus SR 11-7-adjacent MRM scaffolding your chief actuary signs off on](/services/mlops/). MRM isn't a separate workstream; it's the spine the rest of the engagement hangs off. The discipline split isn't bureaucracy; it's how the engineering stays high-quality across a 24-week Platform build with claims, actuarial, compliance, and IT all watching the same use case land.

004 / RISK

## Operational risk and model governance for insurance.

Three risk layers shape every insurance AI engagement we run. SOC 2 Type II plus insurance-data posture is the B2B procurement gate: carriers, MGAs, and brokers won't let an AI vendor touch policy data, claim records, or NPPI without the attestation. The NAIC AI Model Bulletin (adopted 2023, now in 20+ states' adoption pipelines) plus state layers (Colorado SB 21-169 on algorithmic discrimination in life UW; NY DFS Insurance Circular Letter No. 7 (2024) on third-party AI) sets the governance baseline. Model risk and actuarial governance under SR 11-7-adjacent MRM and ASOP No. 56 (Modeling, effective 2020; its scope broadly covers AI/ML predictive models used in actuarial work, and the American Academy of Actuaries' 2024 AI Practice Notes operationalize it for ML) closes the loop with the chief actuary. The insurance buyer's procurement gate is SOC 2 plus NAIC plus MRM, not regulator-driven privacy in the SaaS sense.

SOC-2-ready practices · Continuous monitoring

-   SOC 2 Type II
    
    Insurance-data posture · per-tenant partitioning · NPPI redaction
    
    AUDITED · 2026
    
-   NAIC AI Model Bulletin
    
    5 AI Principles · Colorado SB 21-169 + NY DFS Circular Letter 7
    
    AUDITED · 2026
    
-   MRM + ASOP No. 56
    
    SR 11-7-adjacent · model cards · validation reports · champion-challenger
    
    AUDITED · 2026
    

Governance pack is the real gate, not the model choice

Every market-conduct exam and every reinsurance partner's questionnaire now asks how an AI system reached its decision and who reviewed it. AI-routed claims, AI-classified severity, AI-extracted loss runs, AI-drafted demand letters: each surfaces a reasoning trail plus confidence score plus the reviewing human's signature. The honest take: most insurance AI vendors skip the governance-pack conversation entirely because it's expensive engineering and a senior-actuary conversation they don't want to have, and their customers find out the hard way at the first state-regulator AI inquiry or the first internal MRM review. We don't. The model card, the validation report, and the champion-challenger setup are load-bearing, not optional add-ons.

SOC 2 TYPE II

Insurance-data posture

Carriers, MGAs, and brokers require SOC 2 attestation before any AI vendor touches policy data, claim records, or NPPI (non-public personal information: named insureds, SSN fragments, claim numbers tied to PII). Reinsurance partners ask the same question one layer up. We design AI features so insurance data never leaves your VPC: vector stores in Pinecone or Turbopuffer partition per insurer-tenant; embeddings never cross tenants; observability logs in Langfuse redact NPPI at the logging layer, not as a post-hoc scrub. The engagement signs DPA plus SOC 2 attestation review at kickoff. NAIC AI Model Bulletin §5 (third-party AI governance) applies: the carrier owns the AIS risk; our role is to design the controls that let them own it cleanly. Most carrier procurement teams we engage with already run a clean SOC 2 environment; our job is to design the AI scope so the attestation holds at next year's audit.
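Redacting "at the logging layer, not as a post-hoc scrub" means the sensitive text is rewritten before any handler writes it out. A minimal sketch with Python's standard `logging` module follows; it is not Langfuse's API, and the two patterns (a US SSN shape and a made-up `CLM-` claim-number format) are assumptions for illustration. Real NPPI detection would need a broader, tested pattern set.

```python
import logging
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CLAIM_PATTERN = re.compile(r"\bCLM-\d{6,}\b")  # hypothetical claim-number format


def redact(text: str) -> str:
    """Replace SSN-shaped and claim-number-shaped substrings with placeholders."""
    text = SSN_PATTERN.sub("[REDACTED-SSN]", text)
    return CLAIM_PATTERN.sub("[REDACTED-CLAIM]", text)


class NPPIRedactionFilter(logging.Filter):
    """Redact the fully formatted message before any handler sees it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format msg % args first so values passed as arguments are covered too.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True


# Usage: attach the filter to the logger that carries agent traces.
logger = logging.getLogger("claims_agent")
logger.addFilter(NPPIRedactionFilter())
```

Attaching the filter to the logger rather than to one handler means every downstream sink (file, console, trace exporter) receives only the redacted message.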

NAIC AI MODEL BULLETIN + STATE MOSAIC

AIS governance and state-level adoption

The NAIC Model Bulletin on the Use of AI Systems by Insurers (adopted 2023, now in 20+ states' adoption pipelines) sets the governance baseline. Colorado SB 21-169 (the first US state to enforce algorithmic-discrimination rules in life-insurance underwriting) and NY DFS Insurance Circular Letter No. 7 (2024) on third-party AI add layers. Every AIS we deploy carries a governance pack: intended purpose, training-data lineage, validation approach, bias-test results (Fairlearn or Aequitas), ongoing monitoring plan (Evidently), human-oversight protocol. The NAIC's five AI Principles (fair and ethical, accountable, compliant, transparent, and secure/safe/robust) map to specific engineering controls; they're not a checklist exercise. The framing matters: NAIC bulletin adoption is a state-by-state pipeline, not binding federal law everywhere, and the carrier's risk is the regulator they haven't met yet, not the regulator from last year.

MRM (SR 11-7-ADJACENT) + ASOP NO. 56

Model risk and actuarial governance

Insurers are extending banking-style Model Risk Management (the Fed's SR 11-7 framework, applied analogously rather than as a directly-binding rule) to AI/ML models in claims, underwriting, and pricing. The actuarial profession's ASOP No. 56 (Modeling, effective 2020) defines the chief actuary's responsibilities for model governance, validation, and ongoing review, and its scope already covers AI/ML predictive models used in actuarial work, operationalized by the American Academy of Actuaries' 2024 AI Practice Notes. Without MRM plus ASOP No. 56 alignment, your AI model can't be used in rate filings or reserves. Every production AI model carries a model card (intended use, limitations, performance bounds, known failure modes), a validation report, a drift-monitoring config (Evidently), a champion-challenger setup, and a documented human-override path. The chief actuary signs off; we build the artifacts they need to sign off on. Model risk isn't an afterthought; it's the architecture decision at week 3.
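One way to keep the artifact list above from becoming a checklist in a slide deck is to encode it as a record that blocks promotion until every piece exists. The sketch below is an assumption about how such a record could be shaped, not a description of an existing registry or of MLflow's model-card features; all field names are hypothetical and simply mirror the artifacts named in the paragraph.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelCard:
    model_name: str
    intended_use: str
    limitations: List[str] = field(default_factory=list)
    performance_bounds: Dict[str, float] = field(default_factory=dict)
    known_failure_modes: List[str] = field(default_factory=list)
    # References (paths or IDs) to the companion artifacts; empty means missing.
    validation_report: str = ""
    drift_monitoring_config: str = ""
    champion_challenger_setup: str = ""
    human_override_path: str = ""
    signed_off_by: str = ""  # the chief actuary, per the process described above

    def missing_artifacts(self) -> List[str]:
        """Names of required artifacts that have not been supplied yet."""
        required = {
            "limitations": self.limitations,
            "performance_bounds": self.performance_bounds,
            "known_failure_modes": self.known_failure_modes,
            "validation_report": self.validation_report,
            "drift_monitoring_config": self.drift_monitoring_config,
            "champion_challenger_setup": self.champion_challenger_setup,
            "human_override_path": self.human_override_path,
        }
        return [name for name, value in required.items() if not value]

    def ready_for_production(self) -> bool:
        """True only when every artifact exists and a sign-off is recorded."""
        return not self.missing_artifacts() and bool(self.signed_off_by)
```

A deployment pipeline could call `ready_for_production()` as a gate, which turns the chief actuary's sign-off into a hard precondition rather than a follow-up task.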

005 / ENGAGEMENT

## How an insurance AI engagement runs at Paiteq.

Five phases. Every phase has an explicit deliverable, a named owner inside your team, and a gate criterion that has to pass before the next phase starts. The cadence is weekly: a Monday standup with your Chief Claims Officer or Chief Underwriting Officer, the Chief Actuary, your Compliance lead, and your IT lead. Demo every Thursday. SOC 2 insurance-data posture, NAIC AI Model Bulletin governance-pack design, and the MRM scaffolding under ASOP No. 56 all track in parallel from week 1, not as a retrofit at the security review.

Insurance AI Engagement · 18 weeks (typical Platform tier) · 5 phases

WEEK 1–2 Discovery

Use-case prioritisation, LAE-ratio or recovery-rate ROI scoping, stakeholder map (Chief Claims Officer + Chief Actuary + Compliance + IT)

Single buyer-readable ROI number scoped per use case (LAE pts, combined-ratio pts, FNOL hours, recovery-rate %)

WEEK 3–4 Architecture + MRM Scoping

Stack lock against your PAS/CMS/ACORD layer, SOC 2 insurance-data posture review, NAIC AI Model Bulletin governance-pack design, MRM scaffolding scoped

Architecture signed by your chief actuary and chief claims officer before any prompt is written

WEEK 5–10 MVP Build

Runnable agent against eval set plus your real claims or UW data, weekly demo, observability via Langfuse, NPPI redaction wired in, model cards drafted

Baseline accuracy hit on eval set; vector partitioning per tenant verified; reasoning-trail logging in place

WEEK 11–18 Production + Governance Pack

Hardening, fallback policies for PAS or vector-store outages, rollout, NAIC governance pack assembled, MRM validation report drafted with the chief actuary

All eval gates green; champion-challenger setup live; governance pack signed by chief actuary; market-conduct-exam-ready

WEEK 19+ Optimise + Handoff

Cost engineering, prompt iteration, runbook in your repo, drift-monitoring alerts wired to actuarial team, ownership transfer

Two cadence notes for insurance specifically

The chief actuary shows up week 1, not week 12. Three of the seven use cases on this page (UC-2 UW model governance, UC-4 FNOL triage, UC-7 agentic renewals) depend on decisions that are genuinely actuarial decisions, not engineering ones (what the confidence threshold is for an auto-routed claim, how the bias-test pack maps to the carrier's rate filings, when the agent stops and waits for a human). We've found the first-week unblock is almost always getting the chief actuary into the architecture conversation before the model registry is locked, because changing the validation approach or the human-in-loop threshold at week 8 costs 3–5× what it costs to design it in at week 1. The second cadence note: the governance pack assembly lands at week 11–18, not after launch. The first NAIC AIS principles review, the first MRM validation report, and the chief actuary's sign-off are pre-launch gates, not post-launch deliverables. We've seen too many insurance AI vendors ship a working model that then sits unused for two quarters waiting on a governance pack the team never scoped.

006 / TEAM SHAPE

## Team shape for an insurance AI engagement.

Two engagement shapes cover roughly 80% of the insurance AI work we run across carriers, MGAs, and brokers. MVP for a single high-clarity use case with the PAS/CMS integration scaffolding sized accordingly; Platform for the multi-use-case build on shared infrastructure plus MRM scaffolding that most operators in the $100M–$1B written-premium band actually need. Enterprise tier (4 engineers, 3 ML engineers, 1 PM, 1 actuary liaison, 32+ weeks) sits behind these for org-wide AI orchestration across claims, underwriting, servicing, and MRM simultaneously. As an insurance AI development company we don't pretend the MVP shape is the right answer for everyone; it's a stepping stone for half our clients and a stop point for the other half.

| | MVP shape, one use case | Platform shape, 3–5 use cases + MRM scaffolding |
| --- | --- | --- |
| Scope | One use case shipped to production (e.g. UC-4 FNOL triage or UC-5 loss-run extraction) | 3–5 use cases on shared PAS/CMS integration layer plus MRM scaffolding |
| Team shape | 2 eng + 1 ML + 0.5 PM | 3 eng + 2 ML + 1 PM + 0.5 actuary liaison |
| Timeline | 10–14 weeks | 18–28 weeks |
| Eval framework | Single eval set, 50–100 claims or submissions | Shared eval harness across use cases, regression alarms in CI on every model release, drift monitors wired to actuarial team |
| Observability | Langfuse traces + cost dashboard + NPPI redaction logging | Langfuse + per-use-case cost attribution + model-card registry + drift alerts routed to chief actuary |
| Stop-and-walk option | Yes, fixed scope, real option to stop after week 10 | Phased gates at weeks 4 / 10 / 18; can collapse to single-use-case build mid-flight |

Insurance MVP carries slightly heavier governance overhead than logistics because NAIC AI Model Bulletin documentation plus MRM scaffolding adds 2–4 weeks vs. a logistics build. **Platform tier is the median right answer** for mid-market carriers and MGAs in the $100M–$1B written-premium band. The Enterprise tier (4 eng + 3 ML + 1 PM + 1 actuary liaison, 32+ weeks) only fits when the engagement is genuinely org-wide AI orchestration across claims, underwriting, servicing, and MRM simultaneously. Specific engagement sizing comes out of the audit conversation.

Enterprise tier scoped separately on request.

Sizing for claims vs. UW vs. servicing workloads

FNOL triage (UC-4) and loss-run extraction (UC-5) tend to fit cleanly inside the MVP tier because the eval gate is narrow (severity-accuracy on a held-out claims set, extraction error rate on a sampled submissions set) and the integration surface is contained. Claims-processing engines (UC-1), grounded chatbots (UC-3), UW model governance (UC-2), and subrogation analyzers (UC-6) almost always need Platform tier because the eval harness, the governance-pack infrastructure, and the policy-forms corpus are the load-bearing pieces. We've seen more than one mid-market carrier under-scope a claims-agent build at MVP and lose 6–10 weeks rebuilding the policy-forms RAG layer mid-flight because the chief claims officer's coverage-read quality bar arrived sharper than expected.

The cheapest tier isn't the cheapest outcome

If you're shipping more than one AI use case in the next 12 months, and most insurance teams that get to a serious AI strategy will, the MVP tier asks you to rebuild the PAS/CMS integration layer, the eval framework, the NPPI redaction layer, and the MRM scaffolding twice. The second rebuild costs more than the first. Platform tier is the median right answer for mid-market carriers and MGAs in the $100M–$1B written-premium band because the shared infrastructure (eval harness, PAS adapters, RAG over policy forms and historical claims, model registry, governance-pack templates, observability via Langfuse) amortises across three to five use cases instead of one. We run MVP for two real cases: pre-scale operators testing whether insurance AI pays back at all, and specialty carriers with a single high-clarity workflow (usually loss-run extraction or subrogation surfacing) they want to ship in 12 weeks before greenlighting the platform investment. Both are legitimate; neither is most companies.
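As an example of one shared piece, an NPPI redaction layer can start as a single function that every use case calls before a trace is logged. The sketch below is illustrative only: the three patterns are assumptions and nowhere near a complete definition of NPPI, and the wiring into an observability tool is not shown.

```python
import re

# Hypothetical starter patterns; a real layer covers far more identifiers
# (policy numbers, driver's licence numbers, account numbers, and so on).
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact(text):
    """Replace matches with a [LABEL] placeholder and count them for the redaction log."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, hits = pattern.subn(f"[{label}]", text)
        if hits:
            counts[label] = hits
    return text, counts


clean, log_counts = redact(
    "Insured SSN 123-45-6789, reach at jane.doe@example.com or 555-867-5309."
)
print(clean)
print(log_counts)
```

Logging the counts rather than the matched values keeps the redaction log itself free of NPPI.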

007 / WORK

## What we've shipped for insurance teams (anonymised).

Three anonymised insurance engagements from the broader team's history. Segment and written-premium band are real; metrics are real; the numbers were measured 90 days post-launch on the claims and recovery engagements and at the first audit cycle on the underwriting engagement, not at deploy. Brand names removed under standard NDA. Anyone selling you headline outliers without the operating numbers under them is selling case-study theatre.

Claims

Regional P&C carrier · $420M written premium · NA

### FNOL-to-file claims agent + coverage-read RAG

A regional P&C carrier came in with FNOL → file-ready at a 3.8-day mean and clerk time at 68 min per file. We shipped a claims agent (Claude Sonnet 4.6 plus GPT-4 Vision on damage photos and PDFs) with RAG over the policy-forms library and prior-claims corpus via Pinecone, and tool-calling against Guidewire ClaimCenter for case management. FNOL → file-ready compressed to a 6.4-hour mean; clerk time dropped from 68 min to 11 min per file; missed-coverage leakage dropped 2.1 LAE points; and the LAE ratio improved 3.4 points on auto-physical-damage lines inside two quarters.

FNOL → file-ready 3.8d → 6.4hr / 90d post-launch

Underwriting

Mid-market MGA · $180M written premium · commercial P&C · NA

### Loss-run extraction + UW model-governance wrap

The MGA's UW assistants were spending ~42% of their day re-keying loss runs, and bias-test prep was eating 4.5 weeks per audit cycle. We shipped a loss-run extraction pipeline (Claude Sonnet 4.6 plus Mistral OCR, RAG over ACORD form definitions via Turbopuffer) and an MRM governance wrap (MLflow plus Fairlearn plus Evidently) on the underwriting models. Keying dropped to 2.1 min per submission at 0.4% error; actuarial governance compressed from 4.5 weeks to 2.8 days per cycle; the chief actuary signed off the first NAIC governance pack at week 18.

UW keying → 6% of day; gov pack 4.5w → 2.8d

Recovery

Specialty carrier · $90M written premium · liability lines · NA

### Subrogation analyzer + early-flag agent

Subrogation recovery rate sat at 9.1% with reviews happening ~85 days after claim close. We shipped a subrogation analyzer (Claude Sonnet 4.6 plus RAG over five years of historical subrogation files via Pinecone) that scored every open claim for liability-shift and third-party-fault indicators within 7 days of FNOL. Recovery rate climbed to 16.8%; missed-recovery dropped from ~32% to 11%; recovery dollars hit the books a mean 112 days earlier than baseline.

Subrogation recovery → 16.8%

The shape across all three engagements

The buyer-readable ROI metric was scoped in week 2, before any code was written: an LAE-ratio target on the P&C carrier engagement, a UW-time and governance-cycle target on the MGA engagement, and a recovery-rate target on the specialty engagement. The eval set grew during production via sampled traces rather than staying a static set left over from architecture. The NAIC governance pack got signed by the chief actuary at week 18 or earlier on every engagement that needed one. Handoff put the runbook in the client's repo, not in a shared doc. We engage as a custom AI insurance development partner that stays through the first audit cycle and the first regulator AI inquiry, not one that ships and disappears. Roughly half of the insurance AI engagements we close convert to a lighter-weight Run engagement after the build is in production; half don't, because the client's internal team has picked up ownership. Both outcomes are fine. The Run engagement is real work (prompt iteration, cost engineering, drift retraining, governance-pack refresh on every model release), not a retainer hiding as a service.

FURTHER READING

## Where custom AI insurance development connects.

The dominant pattern in insurance is heavy [machine learning development services](/services/machine-learning-development/) (calibrated underwriting + claims-severity models with regulator-defensible feature importance) wired to a [RAG development services](/services/rag-development/) spine for policy-knowledge grounding. Adjudication and FNOL routing workflows ship as [AI automation agency](/services/ai-workflow-automation/) work, with [AI agent development company](/services/ai-agent-development/) patterns when the workflow has multi-step judgment.

The model serving and drift-monitoring layer lives in [MLOps services](/services/mlops/). Policyholder-facing servicing surfaces ship as [chatbot development services](/services/chatbot-development/). Strategic framing starts with [AI consulting services](/services/ai-consulting/); legacy-estate insurers typically need [AI migration services](/services/ai-migration/) first. Founder context: [Navin Sharma](/team/navin-sharma/); broader [AI development company](/).

008 / FAQ

## Insurance AI buyer FAQ.

Five questions we get on almost every insurance AI first call, answered the way we'd answer them on the call. Specific numbers, named tools, the actual decision rules, not generic vendor-deck answers.

How do you size an AI engagement on top of our PAS/CMS stack?

Three shapes. An **MVP build of a single AI use case** (FNOL triage, loss-run extraction, or a grounded chatbot) runs 10–14 weeks with 2 engineers, 1 ML engineer, and 0.5 PM. A **Platform build covering 3–5 use cases on a shared PAS/CMS integration layer plus MRM scaffolding** runs 18–28 weeks with 3 engineers, 2 ML engineers, 1 PM, and 0.5 actuary liaison. **Enterprise engagements** with org-wide AI orchestration across claims, underwriting, servicing, and MRM run 32+ weeks with 4 engineers, 3 ML engineers, 1 PM, and 1 actuary liaison. An insurance MVP carries slightly heavier governance overhead than a logistics build because the NAIC AI Model Bulletin governance pack plus model-risk scaffolding adds 2–4 weeks; every insurance AI development company that ships seriously now budgets that overhead from week 3, not as a retrofit at security review. Most insurance AI work that ships well sits in the Platform tier because the shared infrastructure (eval harness, model registry, PAS adapters, NPPI redaction, observability via Langfuse) amortises across the use cases instead of getting rebuilt every project. Specific sizing comes out of the audit conversation; [start there](/contact/).
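For a rough feel of what those team shapes mean in effort, the figures above convert to person-weeks with simple arithmetic. The sketch assumes every listed role is staffed at the stated fraction for the full duration, which is a simplification and not a quote; the Enterprise tier is left out because its duration is open-ended.

```python
# Durations (weeks), team size (FTE), and use-case counts taken from the tiers above.
TIERS = {
    "MVP": {"weeks": (10, 14), "fte": 2 + 1 + 0.5, "use_cases": (1, 1)},
    "Platform": {"weeks": (18, 28), "fte": 3 + 2 + 1 + 0.5, "use_cases": (3, 5)},
}


def person_weeks(tier):
    """(low, high) total effort, assuming full staffing for the whole build."""
    low_weeks, high_weeks = TIERS[tier]["weeks"]
    fte = TIERS[tier]["fte"]
    return low_weeks * fte, high_weeks * fte


def person_weeks_per_use_case(tier):
    """(best, worst) effort per use case: shortest build over the most use cases, and the reverse."""
    low_pw, high_pw = person_weeks(tier)
    min_cases, max_cases = TIERS[tier]["use_cases"]
    return low_pw / max_cases, high_pw / min_cases


for name in TIERS:
    print(name, person_weeks(name), person_weeks_per_use_case(name))
```

Under these assumptions an MVP is 35–49 person-weeks for one use case, and a Platform build is 117–182 person-weeks spread over three to five.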

Build vs. buy: when does in-house AI orchestration beat an InsurTech vendor (Shift Technology, Gradient AI, Roots.ai)?

Buy when the AI feature is genuinely commodity (basic claims-fraud scoring, off-the-shelf submission triage, a generic chatbot) and the vendor's data and product surface already cover your line mix. Build the orchestration layer when AI touches your **differentiated claims handling, your underwriting decisioning, or your subrogation recovery**. FNOL-to-file claims agents with your specific coverage forms (UC-1), grounded chatbots tuned to your endorsements (UC-3), and subrogation analyzers reading your historical recovery patterns (UC-6) aren't workloads Shift Technology or Gradient AI will build for you; they sell platform breadth, not your specific decisioning. We've watched a $400M regional carrier buy two InsurTech tools in 18 months, realise neither composed into the chief claims officer's workflow, and rebuild the orchestration layer on top of both tools for less than the third tool's annual license. We design a [grounded retrieval layer over policy forms, endorsements, and prior-claim history](/services/rag-development/) that wraps the InsurTech you've licensed rather than replacing it.

How do you handle NAIC AI Model Bulletin governance when an agent is making claims or underwriting decisions?

The first thing to be straight about: our agents never make binding coverage or underwriting decisions. They draft, route, flag, and assemble; humans approve. That alignment with the NAIC AI Principles (fair and ethical, accountable, compliant, transparent, and secure/safe/robust), which the Model Bulletin operationalizes, is the architecture choice at week 3, not a documentation pass at week 14. Every AI system gets a governance pack: intended purpose, training-data lineage, validation approach, bias-test results (via Fairlearn or Aequitas), ongoing monitoring plan (Evidently), and a documented human-oversight protocol. State-level layers stack on top: Colorado SB 21-169 for life UW and NY DFS Insurance Circular Letter No. 7 (2024) for AI and external consumer data in underwriting and pricing, third-party vendors included. The pack maps to each state's market-conduct-exam shape. The NAIC bulletin was adopted in 2023 and is moving through 20+ states' adoption pipelines; we document for the regulator the carrier hasn't met yet, not the regulator from last year. The opinionated take most insurance AI vendors skip: an AI system (AIS) without a model card and a validation report isn't a compliance edge case; it's a market-conduct exam waiting to happen.
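The governance-pack contents listed above lend themselves to a simple completeness check before sign-off. The sketch below is one hypothetical way to model it; the class, field names, and example values are illustrative assumptions, not a regulator-defined schema.

```python
from dataclasses import dataclass, fields


@dataclass
class GovernancePack:
    """One pack per AI system; each section mirrors an item in the list above."""
    system_name: str
    intended_purpose: str = ""
    training_data_lineage: str = ""
    validation_approach: str = ""
    bias_test_results: str = ""
    monitoring_plan: str = ""
    human_oversight_protocol: str = ""

    def missing_sections(self):
        """Names of sections that are still empty or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def ready_for_signoff(self):
        return not self.missing_sections()


# Hypothetical draft pack with two sections still to write.
pack = GovernancePack(
    system_name="fnol-triage",
    intended_purpose="Route new claims to the right desk by predicted severity.",
    training_data_lineage="Closed claims, NPPI redacted before training.",
    validation_approach="Held-out claims set, severity accuracy by line of business.",
    bias_test_results="Attached bias-test report.",
)
print(pack.missing_sections())
print(pack.ready_for_signoff())
```

A check like this can run in CI on every model release so that an incomplete pack blocks the release instead of surfacing at an exam.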

Which AI use cases have the highest ROI for a mid-market carrier or MGA ($100M–$1B written premium)?

The four highest-ROI starting points we see in 2026 are: **FNOL triage and severity prediction** (UC-4, severity-misclassification drops 12–18% → 3–6%, high-severity claims hit senior desks within 2–4 hours), **loss-run extraction** (UC-5, keying 25–60 min → 90 sec on high-confidence cases, error rate to ~0.5%), **FNOL-to-file claims processing** (UC-1, file-ready 2–5d → 4–10hr, LAE ratio improves 2–4 points), and, for carriers with a recovery gap, **subrogation analysis** (UC-6, recovery rate 7–14% → 13–22%). Pick the two with the cleanest buyer-readable ROI math for your operating model and let the eval data tell you which use case is next. Personal-lines carriers usually start with UC-1 and UC-4 because the claims leverage shows up inside two quarters; commercial MGAs start with UC-5 and UC-7 because the UW-assistant load is what's burning the team out. Trying to ship all four at once is how insurance AI engagements stall: too many actuarial and claims approvals run in parallel and the team loses the plot by week 16.

How long until we see combined-ratio or LAE-ratio improvement?

Honest answer: 12–18 weeks from kickoff for the first measurable LAE-ratio or recovery-rate lift on a single use case, and the lift compounds for another 2–3 quarters as the eval data tightens the agent's confidence thresholds and the actuarial team starts trusting the drift signals. The fastest single-use-case wins we've shipped: loss-run extraction at 10 weeks to first measurable underwriter-time delta; FNOL triage at 12 weeks to first severity-accuracy delta. The slower wins: claims-processing engines (UC-1) and subrogation analyzers (UC-6), which both need 14–18 weeks before the eval set covers enough line-of-business variability or recovery-pattern variability to trust the agent's outputs without heavy adjuster review. Combined-ratio movement on the whole book takes longer, typically 2–4 quarters of compounding LAE and leakage improvements before the financial statements show it. Anyone selling combined-ratio lift inside one quarter on a mid-market carrier hasn't actually shipped a claims agent against a real coverage corpus. [Model governance, drift monitoring, and the ASOP No. 56 plus SR 11-7-adjacent MRM scaffolding your chief actuary signs off on](/services/mlops/) in week 1 is where the timeline either gets real or stays fictional, and we'd rather scope conservatively and beat the timeline than promise a number that needs a board-level conversation in week 14.
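One way eval data can tighten a confidence threshold is to pick the lowest cut-off at which the agent's auto-accepted outputs still meet a target precision on the eval set. The sketch below is a simplified illustration: the candidate thresholds, the sample data, and the target are all hypothetical, and it ignores the fact that precision does not always rise smoothly with the threshold.

```python
def pick_threshold(scored, target_precision, candidates=(0.5, 0.6, 0.7, 0.8, 0.9, 0.95)):
    """Return the lowest candidate threshold whose accepted outputs meet the target precision.

    scored: list of (confidence, was_correct) pairs from reviewed eval traces.
    Returns None when no candidate threshold meets the target.
    """
    for threshold in sorted(candidates):
        accepted = [correct for confidence, correct in scored if confidence >= threshold]
        if accepted and sum(accepted) / len(accepted) >= target_precision:
            return threshold
    return None


# Made-up reviewed traces: (agent confidence, did the adjuster agree?)
traces = [
    (0.97, True), (0.92, True), (0.85, True), (0.81, False),
    (0.74, True), (0.66, False), (0.55, False),
]
print(pick_threshold(traces, target_precision=0.95))
print(pick_threshold(traces, target_precision=0.75))
```

Outputs below the chosen threshold stay in the adjuster-review queue; as more reviewed traces land in the eval set, the threshold can be recomputed.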

009 / START AN INSURANCE AI ENGAGEMENT

## Book a discovery call. We'll name the *two AI features that'll move LAE ratio or recovery rate* and quote a build window.

No deck. Forty-five minutes with an engineering lead, your chief claims officer or chief actuary in the room, and a follow-up memo within 48 hours scoping the MVP or Platform tier sized to your line mix and written-premium band.

[Talk to engineering](/contact/) [See the 7 use cases again](#use-cases)

010 / OTHER INDUSTRIES

## Adjacent industries we engage.

Insurance sits next to three industries in our book where the AI build patterns rhyme: sometimes the workflow translates directly, sometimes the regulatory posture changes the engineering. Brief signposts; full pillars land as each ships.

- [AI for Fintech](/ai-for-fintech/): KYC, fraud detection, model-risk governance under SR 11-7.
- [AI for SaaS](/ai-for-saas/): Sales agents, RAG copilots, churn prediction, embedded product AI.
- [AI for Logistics](/ai-for-logistics/): Routing agents, shipment Q&A, claims triage, ETA prediction.