# Case Studies — Paiteq

> Anonymized featured AI engineering work — agents, RAG systems, and intelligent automation shipped into production. Client names withheld at the client's request; named references available under NDA during discovery.

**HTML version:** https://www.paiteq.com/case-studies/

## Key facts

- Production engagements across ecommerce, fintech, healthcare, insurance, logistics, and SaaS.
- Anonymized by default. Named references shared under NDA during the discovery call.

## Related pages

- [Services hub](https://www.paiteq.com/services/)
- [Contact](https://www.paiteq.com/contact/)

## About Paiteq

Enterprise AI engineering — production agents, RAG, LLM apps, automation, generative AI. Eval-first, senior-led, fixed-scope engagements. Same-day reply from engineering. NDA counter-signed before discovery. Walk-away clause on every engagement.

**Site index for agents:** https://www.paiteq.com/llms.txt
**Full content for agents:** https://www.paiteq.com/llms-full.txt
**Book a call:** https://www.paiteq.com/contact/

---

## Full content


# AI case studies. *Anonymized* featured work.

Industry and segment are real; outcomes are real; brand names removed under standard NDA terms. Deep case studies land here as engagements close out and clients permit attribution.

001 / FEATURED

## Three engagements, three patterns.

Each card below is a real engagement, scoped tight on purpose: the function tells you the workload shape, the segment tells you the size, and the outcome is the metric the client signed off on.

Sales

B2B SaaS · 11 to 50 emp

### Lead-qualification + outbound research agent

Pulls signals from LinkedIn, Crunchbase, the prospect's website, and recent news. Scores fit against the ICP, drafts a personalised first-touch message, and escalates only above threshold. Built on a multi-agent loop with a researcher, a scorer, and a writer, each with bounded tool access. A human reviewer accepts, edits, or rejects each draft; rejections feed a weekly prompt-eval cycle.

0

SDR seats (2026-Q1)
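The threshold-gated escalation in the sales agent above can be sketched roughly as follows. This is a minimal illustration, not the production scorer: the signal names, the integer weights, and the threshold value are all hypothetical, since the actual ICP rubric is not published here.

```python
from dataclasses import dataclass, field

# Hypothetical ICP rubric: integer points per signal (assumed, not from the engagement).
ICP_WEIGHTS = {"industry_match": 40, "headcount_in_range": 30, "recent_funding": 30}
ESCALATION_THRESHOLD = 60  # hypothetical cut-off, in points out of 100


@dataclass
class Prospect:
    name: str
    # Boolean signals as they might come out of the researcher step.
    signals: dict = field(default_factory=dict)


def score_fit(prospect: Prospect) -> int:
    """Scorer step: add up the points for every ICP signal that is present."""
    return sum(pts for key, pts in ICP_WEIGHTS.items() if prospect.signals.get(key, False))


def route_prospect(prospect: Prospect) -> str:
    """Hand off to the writer and human review only when the score is above threshold."""
    if score_fit(prospect) > ESCALATION_THRESHOLD:
        return "draft_for_review"
    return "skip"
```

Keeping the scorer as a plain function separate from the writer is one way to let each step be evaluated on its own in a weekly cycle.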

Support

Health-tech · enterprise

### Tier-1 deflection agent

RAG over product docs and an 18-month ticket archive. Resolves password, billing, and onboarding tickets without human touch. Clinical questions escalate with full context. We grounded every answer in retrieved snippets, blocked any response below a 0.72 retrieval-score floor, and logged every interaction for a weekly Ragas eval the support lead signs off.

0 %

p1 ticket volume in 90 days (2026-Q1)
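The retrieval-score floor described for the support agent above might look something like this in code. The 0.72 value comes from the text; everything else is an assumption for illustration, including the `Snippet` type, the function names, and the choice to apply the floor per snippet and block when none survive.

```python
from dataclasses import dataclass

RETRIEVAL_SCORE_FLOOR = 0.72  # floor stated in the case study


@dataclass
class Snippet:
    text: str
    score: float  # retriever relevance score; assumed higher is better


def grounded_snippets(snippets):
    """Keep only snippets at or above the floor, best first."""
    kept = [s for s in snippets if s.score >= RETRIEVAL_SCORE_FLOOR]
    return sorted(kept, key=lambda s: s.score, reverse=True)


def answer_or_escalate(snippets) -> str:
    """Block the generated answer when nothing clears the floor (assumed policy)."""
    if not grounded_snippets(snippets):
        return "escalate_to_human"
    return "answer_from_snippets"
```

Under these assumptions, a query whose best snippet scores 0.60 is escalated rather than answered, and the generator only ever sees snippets that cleared the floor.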

Ops

Mfg · 200+ emp

### Invoice matching + AP routing agent

OCR plus LLM extraction on PDF and scanned invoices. Matches against open POs in NetSuite, routes to approver via Slack. Exceptions go to the ops lead with an annotated diff. The extraction model was fine-tuned on 4,200 historic invoices; the matcher is a deterministic rules layer the auditor can read line-by-line. No black-box decisions in the AP path.

0

ROI inside 6 months (-Q1)
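A deterministic matcher of the kind described for the AP agent above could be sketched as below. The field names, the amount tolerance, and the routing labels are hypothetical; the real rules layer and the NetSuite integration are not shown in this document.

```python
from dataclasses import dataclass
from decimal import Decimal

AMOUNT_TOLERANCE = Decimal("0.01")  # hypothetical rounding tolerance


@dataclass(frozen=True)
class Invoice:
    po_number: str
    vendor_id: str
    total: Decimal


@dataclass(frozen=True)
class PurchaseOrder:
    po_number: str
    vendor_id: str
    total: Decimal


def match(invoice: Invoice, open_pos: dict) -> list:
    """Return readable mismatch lines; an empty list means a clean match."""
    po = open_pos.get(invoice.po_number)
    if po is None:
        return [f"po_number: {invoice.po_number} not found among open POs"]
    diff = []
    if invoice.vendor_id != po.vendor_id:
        diff.append(f"vendor_id: invoice={invoice.vendor_id} po={po.vendor_id}")
    if abs(invoice.total - po.total) > AMOUNT_TOLERANCE:
        diff.append(f"total: invoice={invoice.total} po={po.total}")
    return diff


def route_invoice(invoice: Invoice, open_pos: dict) -> str:
    """Clean matches go to the approver; anything else is an exception."""
    return "approver" if not match(invoice, open_pos) else "ops_lead_exception"
```

Because every rule is an explicit comparison that emits its own diff line, the output doubles as the annotation sent with an exception.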

002 / HOW WE BUILD

## The shape of every engagement.

Different workloads, same delivery shape. We start with a discovery audit, ship a 4 to 6 week pilot with weekly evaluation gates, and only continue if the metrics warrant it. Every contract carries a walk-away clause; we'd rather lose the engagement than ship something that doesn't move a number.

### Discovery audit

Two weeks. We read your data, sit with the team that owns the workflow, and write a brief that names the model, the eval, the failure modes, and the cost envelope. The output is a go or no-go recommendation, in plain English. If we'd build it differently in-house than as a vendor, we tell you. Every case study on this page started in this phase as a one-page brief; the production systems are those same audits, six months later.

### Pilot with weekly eval gates

Four to six weeks. A working agent on a real slice of production data, behind a feature flag, with metrics visible to your team in a shared dashboard. We pick the eval framework up front (Ragas for retrieval, deterministic regression sets for routing, human-grading for tone) and review every Friday. If the eval doesn't move week-over-week, we stop. That's the walk-away clause in practice.
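The Friday gate can be expressed as a small check over the weekly eval scores. This is a sketch under assumptions: it treats "doesn't move" as "did not improve on the previous week", assumes higher scores are better, and uses a hypothetical `min_gain` parameter that is not part of the engagement terms described here.

```python
def should_continue(weekly_scores, min_gain=0.0):
    """Return True if the latest weekly eval score improved on the previous week.

    weekly_scores: eval scores in chronological order, one per Friday review.
    min_gain: hypothetical minimum improvement required to keep going.
    """
    if len(weekly_scores) < 2:
        return True  # first review: nothing to compare against yet
    return weekly_scores[-1] - weekly_scores[-2] > min_gain
```

For example, `should_continue([0.61, 0.68])` passes the gate, while a flat week such as `[0.68, 0.68]` does not.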

### Continuous delivery

Ongoing once the pilot passes. We own a piece of your roadmap, ship weekly, and roll back the same day if a metric regresses. Most engagements settle into 1 to 2 ship windows per week with a 24-hour rollback budget; the sales agent has been on this cadence since 2026-Q1 with zero production rollbacks logged.

### Where each engagement sits in the practice

The sales agent is a clean [AI agent development](/services/ai-agent-development/) engagement: multi-agent orchestration, bounded tools, human-in-the-loop edit cycle. The support deflection agent leans on [RAG development services](/services/rag-development/) for grounded retrieval plus [chatbot development services](/services/chatbot-development/) for the conversational surface. The AP routing agent is [AI workflow automation](/services/ai-workflow-automation/) on top of OCR; the LLM is one node in a larger deterministic pipeline, not the whole system. Each pattern is documented on its pillar page with the slot counts, the dated benchmarks, and the decision tree we use to scope it. Every case study we publish here draws on those four pillars; the audit conversation is the place to figure out which pillar your workload actually maps to.

Start a project

## Want a *case study* of your own?

Pilot in 2 to 4 weeks. Production build in 8 to 16. Same-day response on every inbound.

[Talk to engineering](/contact/) [AI agent development](/services/ai-agent-development/)
