Work

AI case studies. Anonymized featured work.

Industry and segment are real; outcomes are real; brand names removed under standard NDA terms. Deep case studies land here as engagements close out and clients permit attribution.

001 / FEATURED

Three engagements, three patterns.

Each card below is a real engagement. These AI case studies are scoped tight on purpose: the function tells you the workload shape, the segment tells you the size, and the outcome is the metric the client signed off on.

Sales
B2B SaaS · 11 to 50 emp

Lead-qualification + outbound research agent

Pulls signals from LinkedIn, Crunchbase, the prospect's website, and recent news. Scores fit against the ICP, drafts a personalised first-touch message, and escalates only above threshold. Built on a multi-agent loop with a researcher, a scorer, and a writer, each with bounded tool access. A human reviewer accepts, edits, or rejects each draft; rejections feed a weekly prompt-eval cycle.
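The researcher, scorer, and writer hand-off described above can be sketched roughly as follows. This is a minimal illustration, not the production system: the `Research` and `Outcome` shapes, the `run_lead` function, the stub agents, and the threshold value are all hypothetical names and numbers chosen for the example, and the real agents would call models and tools rather than return canned values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Research:
    company: str
    signals: Dict[str, str]  # source name -> finding (hypothetical shape)


@dataclass
class Outcome:
    company: str
    score: float
    draft: Optional[str]  # None means the lead is never escalated to a human


def run_lead(
    company: str,
    researcher: Callable[[str], Research],
    scorer: Callable[[Research], float],
    writer: Callable[[Research], str],
    threshold: float,
) -> Outcome:
    """Run one lead through researcher -> scorer -> writer.

    The writer only runs when the score is strictly above the threshold,
    which is one reading of "escalates only above threshold".
    """
    research = researcher(company)
    score = scorer(research)
    if score <= threshold:
        return Outcome(company, score, None)
    return Outcome(company, score, writer(research))


# Stand-in agents (assumptions for illustration only).
def stub_researcher(company: str) -> Research:
    return Research(company, {"website": "hiring SDRs", "news": "raised Series A"})


def stub_scorer(research: Research) -> float:
    return min(1.0, 0.4 * len(research.signals))


def stub_writer(research: Research) -> str:
    return f"Hi {research.company} team, congratulations on the recent news."


result = run_lead("Acme", stub_researcher, stub_scorer, stub_writer, threshold=0.7)
```

Passing each agent in as a plain callable is one way to keep tool access bounded: the orchestrator never sees the tools, only the typed result each agent returns.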

0
SDR seats (2026-Q1)

Support
Health-tech · enterprise

Tier-1 deflection agent

RAG over product docs and an 18-month ticket archive. Resolves password, billing, and onboarding tickets without human touch. Clinical questions escalate with full context. We grounded every answer in retrieved snippets, blocked any response below a 0.72 retrieval-score floor, and logged every interaction for a weekly Ragas eval that the support lead signs off on.
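A retrieval-score floor like the one above might look something like this in code. It is a sketch under stated assumptions: the 0.72 value comes from the case study, but applying it to the best-scoring snippet is our guess at the aggregation, and `Snippet`, `answer_or_escalate`, and the `generate` callable are hypothetical names rather than the deployed interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

RETRIEVAL_FLOOR = 0.72  # floor quoted in the case study


@dataclass
class Snippet:
    text: str
    score: float  # retrieval similarity, assumed to be in [0, 1]


def answer_or_escalate(
    question: str,
    snippets: List[Snippet],
    generate: Callable[[str, List[Snippet]], str],
    floor: float = RETRIEVAL_FLOOR,
) -> Dict[str, object]:
    """Answer only when retrieval is strong enough, otherwise escalate.

    Assumption: the floor is checked against the best-scoring snippet.
    """
    best = max((s.score for s in snippets), default=0.0)
    if best < floor:
        # Blocked: hand the question and everything retrieved to a human.
        return {
            "action": "escalate",
            "question": question,
            "context": [s.text for s in snippets],
        }
    grounded = [s for s in snippets if s.score >= floor]
    return {
        "action": "answer",
        "text": generate(question, grounded),
        "sources": [s.text for s in grounded],
    }


def stub_generate(question: str, grounded: List[Snippet]) -> str:
    # Stand-in for the model call; it only echoes the top grounded snippet.
    return f"Per the docs: {grounded[0].text}"


strong = answer_or_escalate(
    "How do I reset my password?",
    [Snippet("Use the reset link on the sign-in page.", 0.81)],
    stub_generate,
)
weak = answer_or_escalate(
    "Is this dose safe?",
    [Snippet("Billing cycles run monthly.", 0.31)],
    stub_generate,
)
```

Returning the retrieved context alongside the escalation is what lets the human pick the ticket up without re-running the search.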

0 %
p1 ticket volume in 90 days (2026-Q1)

Ops
Mfg · 200+ emp

Invoice matching + AP routing agent

OCR plus LLM extraction on PDF and scanned invoices. Matches against open POs in NetSuite, routes to approver via Slack. Exceptions go to the ops lead with an annotated diff. The extraction model was fine-tuned on 4,200 historic invoices; the matcher is a deterministic rules layer the auditor can read line-by-line. No black-box decisions in the AP path.
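To show what a matcher an auditor can read line-by-line could look like, here is a small sketch. The field names, the three rules, the zero tolerance, and the route labels are assumptions made for illustration; the actual rule set and the NetSuite and Slack integrations are not shown.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Tuple


@dataclass(frozen=True)
class Invoice:
    po_number: str
    vendor: str
    total: Decimal


@dataclass(frozen=True)
class PurchaseOrder:
    po_number: str
    vendor: str
    total: Decimal


def diff(invoice: Invoice, po: PurchaseOrder,
         tolerance: Decimal = Decimal("0.00")) -> List[str]:
    """Return one human-readable line per rule that fails."""
    issues: List[str] = []
    if invoice.po_number != po.po_number:
        issues.append(f"PO number: invoice {invoice.po_number} vs PO {po.po_number}")
    if invoice.vendor.strip().lower() != po.vendor.strip().lower():
        issues.append(f"Vendor: invoice {invoice.vendor!r} vs PO {po.vendor!r}")
    if abs(invoice.total - po.total) > tolerance:
        issues.append(f"Total: invoice {invoice.total} vs PO {po.total}")
    return issues


def route(invoice: Invoice, po: PurchaseOrder) -> Tuple[str, List[str]]:
    """Clean matches go to the approver; anything else goes to the ops lead."""
    issues = diff(invoice, po)
    return ("approver", issues) if not issues else ("ops_lead", issues)


po = PurchaseOrder("PO-1001", "Acme Metals", Decimal("1250.00"))
clean = route(Invoice("PO-1001", "acme metals ", Decimal("1250.00")), po)
exception = route(Invoice("PO-1001", "Acme Metals", Decimal("1275.00")), po)
```

Every decision here traces back to a named rule and a printed pair of values, which is the property the annotated diff relies on. `Decimal` is used rather than floats so that currency comparisons stay exact.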

0
ROI inside 6 months (-Q1)

002 / HOW WE BUILD

The shape of every engagement.

Different workloads, same delivery shape. We start with a discovery audit, ship a 4-to-6-week pilot with weekly evaluation gates, and only continue if the metrics warrant it. Every contract carries a walk-away clause; we'd rather lose the engagement than ship something that doesn't move a number.

Discovery audit

Two weeks. We read your data, sit with the team that owns the workflow, and write a brief that names the model, the eval, the failure modes, and the cost envelope. The output is a go or no-go recommendation, in plain English. If we'd build it differently in-house than as a vendor, we tell you. The AI agent case studies on this page all started in this phase as a one-page brief; the production AI case studies are the same audits, six months later.

Pilot with weekly eval gates

Four to six weeks. A working agent on a real slice of production data, behind a feature flag, with metrics visible to your team in a shared dashboard. We pick the eval framework up front (Ragas for retrieval, deterministic regression sets for routing, human-grading for tone) and review every Friday. If the eval doesn't move week-over-week, we stop. That's the walk-away clause in practice.
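The Friday gate can be reduced to a very small check. The sketch below assumes a single headline metric where higher is better and a hypothetical `min_gain` parameter for how much movement counts; in practice the decision is a review with the client, not a function call.

```python
from typing import List


def gate_decision(weekly_scores: List[float], min_gain: float = 0.0) -> str:
    """Return "continue" or "stop" based on the two most recent weekly scores.

    Assumptions: one score per week, higher is better, and the first week
    always continues because there is nothing to compare against yet.
    """
    if len(weekly_scores) < 2:
        return "continue"
    gain = weekly_scores[-1] - weekly_scores[-2]
    return "continue" if gain > min_gain else "stop"


improving = gate_decision([0.60, 0.70])
flat = gate_decision([0.70, 0.70])
```

Setting `min_gain` above zero would guard against treating week-to-week noise as progress; what value is sensible depends on the eval set size.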

Continuous delivery

Ongoing once the pilot passes. We own a piece of your roadmap, ship weekly, and roll back the same day if a metric regresses. Most engagements settle into 1 to 2 ship windows per week with a 24-hour rollback budget; the sales agent has been on this cadence since 2026-Q1 with zero production rollbacks logged.

Where each engagement sits in the practice

The sales agent is a clean AI agent development engagement: multi-agent orchestration, bounded tools, human-in-the-loop edit cycle. The support deflection agent leans on RAG development services for grounded retrieval plus chatbot development services for the conversational surface. The AP routing agent is AI workflow automation on top of OCR; the LLM is one node in a larger deterministic pipeline, not the whole system. Each pattern is documented at its pillar page with the slot counts, the dated benchmarks, and the decision tree we use to scope it. The AI case studies we publish here all sit inside one of those four pillars; the audit conversation is the place to figure out which pillar your workload actually maps to.

Start a project

Want a case study of your own?

Pilot in 4 to 6 weeks. Production build in 8 to 16. Same-day response on every inbound.