P7 · Services

AI consulting services, capability audits, costed roadmaps, vendor selection, board-grade memos

AI consulting services from an engineering-led AI consulting company that ships AI strategy consulting in writing, not slides. AI capability assessment, AI roadmap consulting, AI vendor selection consulting, AI readiness assessment, and AI due diligence: all fixed-scope, all signed by the engineers who'll still be on the call when the build starts. No kickbacks, no 600-person practice to push into the answer.


Practice: AI strategy consulting

Shapes: Audit · Roadmap · RFP · DD

Default: Written memo + exec readout

Engagements: 2–6 weeks · fixed scope

001 / FRAME

Where AI consulting services earn their fee.

Most buyers arrive at enterprise AI strategy with the wrong question attached to the right budget. An AI readiness assessment up front saves them from the most common failure: paying for the wrong shape. The grid below is the first frame we run, with buyer shape down the left, engagement stage across the top, and the call we'd make in each cell. Paying for the wrong shape is the most expensive mistake in this category: spending audit dollars on a question that should have gone straight to vendor RFP, or running a vendor RFP on a strategy gap that an audit would have named in two weeks.

Where you are	Audit (2–3w)	Roadmap (3–5w)	Vendor RFP (4–6w)	Build (route out)
Pre-AI, board-mandated	Default	Sequel	Premature	Not yet
AI pilot stalled at month 6	Default	After audit	Often the gap	Reroute to P1/P3
Vendor demos already booked	Skip	Skip	Default	After pick
Build vs buy unresolved	Yes, frame it	Default	Half of it	Only one side
Regulated industry (HIPAA, EU AI Act)	Compliance read	Roadmap + posture	DPIA-aware RFP	Routed to siblings
Acquisition / DD in flight	AI DD memo	Post-close	Rare	N/A
Tooling sprawl, no strategy	Default	Consolidation plan	After rationalise	Not now

Yes = default recommendation. Maybe = depends on a follow-up question we'll cover in the kickoff. No = we'd actively steer you away.

If your shape isn't on the grid, the framing call is free. DM us with the situation and we'll write back inside a business day with which shape fits and what it costs.

002 / SHAPES

Four engagement shapes. Every AI consulting services engagement maps to one.

Fixed scope, fixed fee, written deliverable. We don't sell hours; we sell a memo. The four shapes below cover roughly 95% of inbound: Audit, Roadmap, Vendor RFP, AI Due Diligence. Mixed engagements bill as two consecutive shapes, not an open retainer.

01 AI CAPABILITY AUDIT Fixed scope

2–3 weeks

Read of practice + written memo.

In scope

60-minute kickoff to lock the question
Capability read across model selection, retrieval, eval rigour, observability, MLOps
Data-hygiene audit with named leakage/labelling gaps
20-page written memo + 90-minute exec readout
Recommended next step in writing (Roadmap, RFP, Build, or no-action)

Out of scope

Vendor RFP authoring (Shape 03)
12-month costed roadmap (Shape 02)
Hands-on build (route to technical pillar)

02 AI ROADMAP Fixed scope

3–5 weeks

12-month costed plan against scored use cases.

In scope

All Capability Audit deliverables
Use-case scoring across business value, feasibility, organisational readiness
Vendor longlist scored against the audit rubric
Build-vs-buy frame with TCO modelled across three postures
12-month sequence with named phases, owners, exit gates, named tools

Out of scope

Vendor demo shadowing (Shape 03)
Hands-on build (route to technical pillar)
Ongoing retainer (separate engagement)

03 VENDOR RFP SUPPORT Fixed scope

4–6 weeks

RFP authoring + demo shadowing + reference checks.

In scope

RFP authored against the audit rubric
Vendor demos shadowed by an engineer
Scoring sheets filled with named criteria
Reference checks run with at least two existing customers per vendor
Contract terms reviewed for data residency, IP, exit clauses
Procurement-ready recommendation memo

04 AI DUE DILIGENCE Fixed scope

2–4 weeks

Acquisition or board-mandated AI posture read.

In scope

AI surface audit of the target or business unit
Model-evaluation re-run on a leakage-free holdout where applicable
Risk register across IP, data residency, vendor lock-in, regulatory exposure
20-page board-grade memo + 90-minute presentation
Dissenting view named in writing; we don't bury the no-flags
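
The "leakage-free holdout" named in the Due Diligence scope can be screened mechanically before any metric is re-run. The sketch below is a minimal illustration, not the audit's actual tooling: it flags holdout examples whose normalised text also appears in the training set. The function names and the normalisation rule (lowercase, collapsed whitespace) are assumptions made for this example, and it catches only exact duplicates after normalisation, not paraphrases or near-duplicates.

```python
import hashlib


def fingerprint(text):
    """Hash a text after lowercasing and collapsing whitespace (assumed rule)."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def leaked_examples(train_texts, holdout_texts):
    """Return holdout examples whose fingerprint also occurs in the training set."""
    train_prints = {fingerprint(t) for t in train_texts}
    return [t for t in holdout_texts if fingerprint(t) in train_prints]


# Hypothetical data: the first holdout row duplicates a training row.
train = ["Loan approved   for applicant A", "Loan declined for applicant B"]
holdout = ["loan approved for applicant a", "Loan approved for applicant C"]
leaks = leaked_examples(train, holdout)
print(f"{len(leaks)} of {len(holdout)} holdout examples also appear in training")
```

A non-empty result means the holdout has to be rebuilt before any reported metric can be trusted.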

003 / NUMBERS

What an honest AI consulting company looks like at the spreadsheet level.

This is the pricing transparency that most AI strategy consulting firms hide behind a "let's chat" wall. The shapes are fixed, the timelines are fixed, the deliverable is written. We can't quote the fee until we've scoped the surface, but the range sits at the higher end of independent advisory and the lower end of tier-one strategy houses, roughly where the value sits. An AI strategy consulting engagement at this depth is a one-time cost, not a quarterly retainer drip.

004 / GATES

Six gates an honest AI capability audit clears.

An AI audit services memo is only as honest as the gates the auditor runs. Below is the screen we apply to every AI audit services engagement, and the same screen we use when we're hired to second-opinion a memo a tier-one firm already shipped. Second-opinion work routinely flags at least one gate the original audit silently skipped.

01
Eval rigour, not eval theatre

Has the existing AI surface been graded on a leakage-free holdout, with named metrics (AUC, NDCG, faithfulness, ECE) tracked over time? Or is the "eval" a few cherry-picked outputs in a quarterly review deck? Eval theatre is the most common silent failure in stalled pilots: calibration drifts, faithfulness drops, nobody measures, and the team blames the model.
02
Observability priced day-one

Langfuse, Braintrust, or LangSmith wired before the second sprint. Traces searchable, prompts versioned, eval runs reproducible. Audit memos that don't price observability as a day-one line item are usually written by a vendor that doesn't want you watching too closely.
03
Data residency named in writing

For regulated workloads (HIPAA, EU AI Act, MAS, GDPR), the deployment posture is named explicitly. Which region. Which provider. Which sub-processor agreement. Half the vendor pilots we audit have a residency gap the vendor's standard contract can't close inside the renewal window.
04
Vendor swap is a 1–2-week migration

If you can't switch from Claude Opus 4.7 to GPT-5 to Gemini 3 Pro inside two weeks, you don't have an AI stack, you have a vendor stack. A routing layer above the provider SDKs costs roughly a week of engineering and saves you the contract negotiation that arrives in month nine.
05
TCO modelled past month 12

Hosted-frontier looks cheap at 1M monthly tokens; at 500M it's eye-watering. The opposite holds for self-hosted Llama 4, where GPU amortisation dominates at low volume and fades at high volume. Audits that don't carry the TCO past the budget cycle's horizon land buyers in a renegotiation surprise at exactly the wrong time. We model 24 months as the floor and 36 as the ceiling.
06
Failure mode named in writing

What's the single most likely way this build fails at month nine? If the memo can't answer that question, it isn't an audit memo, it's a sales doc. We name the failure mode, the leading indicator, and the threshold at which the trigger fires. A meaningful share of our memos recommend not proceeding at all; the rest name what to watch.

Six-out-of-six clean is rare in our review history. Two or fewer clean is the trigger for the "stalled-pilot" intervention shape under our agent or LLM practices: fix the methodology before the model.
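
Gate 01 lists ECE (expected calibration error) among the metrics to track over time. As a rough illustration of what that tracking computes, here is a minimal pure-Python sketch of ECE with equal-width confidence bins. The bin count of 10 and the sample predictions are assumptions for the example; production eval harnesses typically ship their own implementation.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sample-weighted mean of |accuracy - confidence| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp the index so a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / n) * abs(accuracy - avg_conf)
    return ece


# Hypothetical eval run: the model claims 90% confidence but is right 3 times in 4.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
print(f"ECE = {ece:.3f}")
```

Logged per eval run, a rising value is the calibration drift the gate describes: the model's stated confidence and its observed accuracy are moving apart.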

005 / ROADMAP

What a four-phase AI roadmap consulting engagement actually ships.

A 12-month AI roadmap that lands in a board pack isn't a slide deck; it's a sequence with named owners, named gates, and named tools. The four phases below are the standard shape; a complex multi-BU engagement carries an extra discovery phase, and a narrow single-use-case engagement collapses phases 2 and 3 into one.

01
Discovery + landscape read

Sixty-minute exec session to lock the question, then a structured read of the current AI surface: what's in production, what's stalled, what's in vendor demos, what's in the spreadsheet. The output of this phase is a one-page problem statement that everyone on the engagement signs off in writing. Some engagements end here because the right answer is "do nothing yet"; we still ship the memo and bill the phase.
02
Use-case scoring + vendor longlist

Every candidate use case scored across three axes: business value, technical feasibility, organisational readiness. Scoring rubric shared with the buyer, not run on a private spreadsheet. Vendor longlist assembled per surviving use case: frontier hosted, self-hosted open-weight, vertical SaaS, and the build-it-yourself option, each scored on the same rubric. The audit memo's recommendation feeds the scoring inputs.
03
Build-vs-buy frame + TCO

Explicit build-vs-buy frame for the two or three use cases that survive scoring. TCO modelled across three postures: hosted-frontier (Claude / GPT-5 / Gemini), self-hosted open-weight (Llama 4 / Mistral / Qwen 3 on vLLM), and hybrid routing, with the volume crossover named in months, not vibes. Sensitivity analysis on the three assumptions most likely to change. We share the spreadsheet, not a sanitised summary.
04
12-month sequence + exit gates

Twelve-month sequence with named phases, named owners (internal hire, sibling practice, third-party vendor), named exit gates per phase, and named tools: LangGraph, Pinecone, Langfuse, the actual names a procurement team has to put on contracts. Each phase carries a "fail-here-and-pivot-there" branch. The memo is the artefact that survives the engagement; the readout is theatre.

Clean handoff is the default. Most roadmaps name a recommended internal hire alongside the vendor sequence; the work that survives this engagement is the practice you build inside, not the consultant you retain.
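
Phase 03 asks for the hosted-versus-self-hosted volume crossover "named in months". The sketch below shows one simple way that arithmetic could be set up; it is not the spreadsheet itself. It assumes hosted cost is linear in tokens and self-hosted cost is a fixed monthly infrastructure bill plus a small marginal cost per million tokens. Every number in the example (the $10 blended hosted price, the $4,900 fixed cost, the 10% monthly growth) is a placeholder chosen for illustration; the 36-month horizon mirrors the modelling ceiling named in Gate 05.

```python
def monthly_cost_hosted(tokens_m, price_per_m):
    """Hosted cost: linear in volume (tokens_m is millions of tokens per month)."""
    return tokens_m * price_per_m


def monthly_cost_self_hosted(tokens_m, fixed_monthly, marginal_per_m):
    """Self-hosted cost: fixed infrastructure plus a small per-token marginal cost."""
    return fixed_monthly + tokens_m * marginal_per_m


def crossover_volume_m(price_per_m, fixed_monthly, marginal_per_m):
    """Monthly volume (millions of tokens) above which self-hosting is cheaper."""
    saving_per_m = price_per_m - marginal_per_m
    if saving_per_m <= 0:
        return None  # hosted is never more expensive per token; no crossover
    return fixed_monthly / saving_per_m


def crossover_month(start_tokens_m, monthly_growth, crossover_m, horizon=36):
    """First month (1-indexed) whose volume reaches the crossover, else None."""
    volume = start_tokens_m
    for month in range(1, horizon + 1):
        if volume >= crossover_m:
            return month
        volume *= 1 + monthly_growth
    return None


# Placeholder assumptions, not quoted prices.
threshold = crossover_volume_m(price_per_m=10.0, fixed_monthly=4900.0, marginal_per_m=0.2)
month = crossover_month(start_tokens_m=100.0, monthly_growth=0.10, crossover_m=threshold)
print(f"crossover at ~{threshold:.0f}M tokens/month, reached in month {month}")
```

With these placeholder inputs the crossover sits at roughly 500M tokens a month. Re-running with different growth rates is a crude form of the sensitivity analysis the phase describes.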

006 / EVALUATE

The six vendor categories every roadmap evaluates.

An AI vendor selection consulting engagement isn't a vendor-by-vendor scorecard; it's a category-by-category architectural call. The six families below cover roughly 95% of the recommendations in the roadmaps we've shipped this year. Per family, the audit names the default pick, the cost-floor alternative, and the conditions under which we'd revisit in 12 months. We've run AI vendor selection consulting across all six categories in the last 18 months.

Frontier hosted LLMs (Claude · GPT-5 · Gemini 3)

Strengths

Highest reasoning ceiling and the fastest iteration loop. Claude Opus 4.7 holds the lead on long-context analysis; GPT-5 leads on tool-call latency at scale; Gemini 3 Pro wins on 1M-token retrieval workloads. Pricing has compressed but premium tier still runs $3–15 input, $15–75 output per million tokens.

When We Pick

Greenfield AI roadmaps where time-to-first-value matters more than per-token cost. C-suite-visible builds where the model name is itself a signalling cost. Workloads under ~200M monthly tokens: below that, the frontier price premium is a rounding error against engineering salary.

When We Don't

Predictable, high-volume workloads where a tuned smaller LLM beats frontier on cost by 8–20×. Strict data-residency where the provider's region map doesn't match yours. Vendor-lock anxiety where a board member has already vetoed single-source.

Paiteq Pattern

Audit memos almost always recommend a two-vendor posture: one frontier, one mid-tier, with a routing layer keeping migration friction near zero. Two names beat one for posture and three for cost discipline.

Frontier · Reasoning · Multi-vendor

Self-hosted open-weight (Llama 4 · Mistral · Qwen 3)

Strengths

Total control. Llama 4 405B served on H100s via vLLM hits sub-150ms p50 on most chat workloads at roughly $0.05–0.20 per million tokens amortised. Mistral and Qwen 3 cover the mid-tier; both ship instruct-tuned variants that beat year-old frontier on narrow domains after light fine-tuning.

When We Pick

Workloads above ~500M monthly tokens where per-token economics flip the spreadsheet. Regulated workloads where data residency or model-weight ownership is non-negotiable. Vertical workloads where fine-tuning on customer data unlocks 30–60% accuracy lift over generic frontier.

When We Don't

Sub-50M-token workloads: GPU amortisation kills the math. Teams without MLOps capacity (see MLOps services): running vLLM in production is not a side project. Reasoning-heavy workloads where the frontier ceiling still matters more than cost.

Paiteq Pattern

We recommend self-host when the unit economics cross the line, usually a clear inflection rather than a gradient. Audit memo names the volume threshold and the year it'll be hit, not the year a board member wishes it would be hit.

Self-host · Open-weight · Cost-floor

Vector DBs + retrieval (Pinecone · Qdrant · pgvector · Weaviate)

Strengths

Retrieval is where most AI roadmaps actually live or die. Pinecone Serverless cuts ops to near zero at a premium tier. Qdrant self-hosts cleanly on Kubernetes for the team that already runs one. pgvector is the cheapest, lowest-friction choice when Postgres is already in the stack. Weaviate wins on multi-modal retrieval.

When We Pick

Anywhere the AI value proposition requires grounded answers, clinical, legal, regulated, internal-knowledge-rich. Almost every roadmap we ship recommends a retrieval-augmented generation pipeline as the first build, not an agent.

When We Don't

Pure reasoning workloads with no enterprise knowledge to ground against. Workloads where the answer is already public-internet-shaped, frontier LLM alone usually wins. Tiny corpora under 10k chunks where in-context retrieval beats a vector store.

Paiteq Pattern

Default recommendation: pgvector when the team already runs Postgres; Pinecone Serverless when ops bandwidth is the constraint; Qdrant when data residency requires self-host. We don't recommend Weaviate unless multi-modal retrieval is the headline requirement.

Retrieval · Grounded · Hybrid-search

Agent + workflow frameworks (LangGraph · CrewAI · n8n · Temporal)

Strengths

LangGraph is the 2026 default for state-graph agent orchestration, the only mature framework with proper state-machine semantics. CrewAI ships fastest for role-based shapes if you don't need state-graph control. n8n covers deterministic-plus-AI workflows for ops teams. Temporal is the durable-execution backbone for high-stakes long-running flows.

When We Pick

Multi-step agentic builds with branching state: LangGraph. Workflow automation with LLM-in-the-loop and a non-engineer ops team: n8n. Long-running orchestration with retry semantics: Temporal. We've shipped all four in production over the last 12 months.

When We Don't

Single-turn chatbots: none of these is the right tool; see chatbot development. Toy POCs: direct API calls win on velocity. AutoGen: stalled relative to LangGraph; we no longer recommend it for new builds.

Paiteq Pattern

Roadmap usually pairs LangGraph (agent runtime) with Temporal (durable execution) for builds where retries and human-approval gates matter. n8n shows up when the buyer is non-engineering and the workflow is more deterministic than agentic.

Agentic · Stateful · Durable

Observability + eval (Langfuse · Braintrust · LangSmith · Inspect)

Strengths

The audit gate every roadmap we ship requires. Langfuse leads OSS observability with traces, prompt versioning, and a usable eval surface. Braintrust dominates closed-source eval workflows. LangSmith is fine if you're already inside the LangChain ecosystem. Inspect AI (UK AISI-backed) is the rigour pick for safety-critical evals.

When We Pick

Every roadmap recommends observability as a day-one cost line, not a phase-three nice-to-have. Most AI pilots fail because nobody knew which prompts were drifting or which tools were silently dropping; instrumentation is the cheapest insurance in the stack.

When We Don't

Toy projects where the eval loop is a human reading 10 outputs. We don't recommend bare logging-without-traces for anything past prototype; it's the false economy that creates the month-six pilot stall.

Paiteq Pattern

Default recommendation: Langfuse self-hosted for teams that want open-source plus data control; Braintrust for teams with budget and no ops capacity. RAGAS or DeepEval as the eval harness layer regardless of trace backend. Audit memos always price observability in.

Eval · Tracing · Day-one

Voice + multimodal stack (LiveKit · Pipecat · ElevenLabs)

Strengths

LiveKit Agents and Pipecat both land sub-400ms voice turn-take in production. ElevenLabs leads on voice quality; the open-source side (Whisper Large v3, F5-TTS) is closing fast. Vision-LLMs (Claude Sonnet 4.6, GPT-5 Vision, Gemini 3 Pro) cover document understanding without a custom CV pipeline.

When We Pick

Voice agents (support deflection, clinical intake, sales prospecting) where the latency budget is human-conversational. Document understanding at scale where the alternative is a custom vision stack we'd route to our ML practice instead.

When We Don't

Roadmaps where voice is a CEO whim, not a buyer journey. Vision tasks with extreme accuracy bars (defect detection, medical imaging): a frontier vision-LLM isn't the answer; a fine-tuned vision backbone is.

Paiteq Pattern

We've taken voice agents from POC to production three times this year. Roadmaps prescribe LiveKit + Claude Sonnet 4.6 + ElevenLabs as the default stack; cheaper open-source substitutes priced as a phase-two option.

Voice · Multimodal · Latency

007 / ARCHETYPES

Four strategy archetypes. Roughly all inbound maps to one of these.

Greenfield, Modernise, RPA-Replace, and Acquisition/Board-DD cover roughly 100% of the AI consulting services engagements we've shipped over the last 18 months. Shape determines deliverable, deliverable determines pricing, pricing determines scope. We won't sell you a Greenfield engagement when you're really in Modernise; the framing call is free and we'll route you to the right shape.

GREENFIELD

The board has approved an AI budget for the first time. There's no incumbent AI system, no internal champion with battle scars, and no anchor use case picked. Roughly 40% of our AI consulting services engagements start here. The audit memo names the three highest-leverage use cases against capability + value scoring, sequences them, and prices the first six months in detail so finance can sign without re-reviewing.

Pick when

First AI budget cycle
No prior pilots or pilots all stalled
Multiple business units competing for the budget
CTO + CFO + COO all needed on the call
You've been pitched by three vendors and trust none of them

Skip when

Pilot already in production and earning revenue: different shape
Vendor already picked: go straight to RFP support
Pure model-routing question: that's a 1-week LLM audit instead

Stack

Capability audit · Use-case scoring · Costed roadmap · Vendor longlist

008 / BUILD VS BUY

Build vs buy AI, row-by-row on the dimensions that actually matter.

Build-vs-buy is the most-asked question in the audit room and the most-mis-answered in the slide deck. The grid below is the frame we use: nine rows the spreadsheet usually skips. Every roadmap recommendation gets graded against these rows; the call lands in writing with the dissenting view named.

	Build (custom + your engineers)	Buy (vendor / SaaS)
Time to first business value	6–14 weeks (custom build)	2–6 weeks (vendor pilot)
Total 24-month spend (mid use case)	Multi-quarter engineering build	Multi-year licence + integration
Eval and observability ownership	Yours from day one, Langfuse / Braintrust / RAGAS in your repo	Often vendor-owned; export depth varies; some lock-in
Eval ownership is the hidden variable most buyers miss. When your traces, evals, and latency data live in your own repo, you can catch regression before users do, and you own the dataset needed to fine-tune the next model. Vendor-owned observability means you're debugging against a dashboard the vendor also controls.
Model swap (Claude → GPT-5 → Gemini)	1–2 weeks via routing layer	Often blocked by vendor's single-model architecture
Frontier model performance rankings shift every three to six months. A routing layer (LiteLLM, Portkey, or a thin abstraction over the Anthropic + OpenAI SDKs) means a price drop or a capability leap costs you a config change, not a re-platform. Vendor architectures that bake in a single model create switching costs that compound at renewal.
Customisation ceiling	Anything you can write; LoRA fine-tunes on your data	Whatever the vendor's roadmap allows; 6–18 month wait per feature
Data residency + private deployment	Self-host on Llama 4 / Mistral; full control	Depends on vendor; ~half offer single-tenant; few offer self-host
Team capability gain	Engineering org learns AI as it ships	Vendor-dependent; team learns the vendor, not the domain
For most product companies, building is a talent investment as much as a delivery vehicle. Engineers who ship a RAG pipeline or an agent loop in production become your internal AI bench: they spot the next use case, they review vendor claims with real context, and they're harder to lose to attrition than a vendor relationship.
Failure mode	Engineering burn-rate without product progress	Vendor lock-in, roadmap drift, sudden price hike at renewal
Both failure modes are real and roughly equally costly. The build failure is visible early: sprint velocity without shipped value. The vendor failure is invisible until the renewal conversation, when the price has tripled and migration would take longer than the original build would have. The audit's job is to predict which risk is higher for your specific workload.
Where we recommend it	Differentiator workloads, the AI IS the moat	Table-stakes workloads, chat, support deflection, basic RAG

The honest answer is usually both: buy for table-stakes (chat, basic RAG, support deflection), build for the differentiator workloads (the AI IS the moat). Roadmaps name the line per workload.

Where we recommend buy, we score vendors against the audit rubric and shadow demos with the buyer. Where we recommend build, we route the work to the right Paiteq technical pillar (agentic systems, retrieval pipelines, custom LLM apps, classical ML) or to a named third party where their fit is better.
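
The model-swap row in the grid above describes a routing layer that turns a provider change into a config change. The sketch below shows the shape of such a layer in a few lines. It is an illustration under stated assumptions, not a reference implementation: the `Router` class, the `stub_provider` helper, and the task and provider names are all hypothetical. Real adapters would wrap the vendor SDK calls where the stubs sit, which is why the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A provider adapter takes a prompt and returns text. Real adapters would wrap
# a vendor SDK call; these stubs are placeholders so the sketch runs offline.
Provider = Callable[[str], str]


def stub_provider(name: str) -> Provider:
    def complete(prompt: str) -> str:
        return f"[{name}] {prompt}"
    return complete


@dataclass
class Router:
    providers: Dict[str, Provider]
    routes: Dict[str, str]  # task name -> provider name (the swappable config)
    fallback: str

    def complete(self, task: str, prompt: str) -> str:
        name = self.routes.get(task, self.fallback)
        try:
            return self.providers[name](prompt)
        except Exception:
            # Unknown or failing provider: retry once on the fallback provider.
            if name == self.fallback:
                raise
            return self.providers[self.fallback](prompt)


router = Router(
    providers={"frontier": stub_provider("frontier"), "mid_tier": stub_provider("mid_tier")},
    routes={"summarise": "mid_tier", "analyse": "frontier"},
    fallback="frontier",
)

before = router.complete("summarise", "Q3 board pack")
router.routes["summarise"] = "frontier"  # the "swap" is one config edit
after = router.complete("summarise", "Q3 board pack")
print(before, "->", after)
```

Application code only ever calls `router.complete`, so no call site changes when a route is re-pointed. Prompt and eval differences between models still need to be re-tested after a swap.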

009 / CRITERIA

Six vendor-evaluation criteria the procurement-grade rubric scores.

When a vendor RFP support engagement runs, the rubric is six criteria, scored 1–5, signed off by the buyer at kickoff. No vendor-flavoured spin in the criteria list; no "innovation" or "thought leadership" cells. The criteria below are the ones that actually predict whether the contract pays itself back in 24 months.

01
Evaluation depth + export

Does the vendor let you export traces, eval runs, and prompt versions in a structured format your team can analyse in Langfuse or Braintrust? "We have a dashboard" is the wrong answer; you need data out, not screenshots in.
02
Model swap + provider routing

Can you swap the underlying LLM (Claude, GPT-5, Gemini, self-hosted Llama 4) in two weeks of vendor support effort? If the answer is "we're optimised for our chosen model," that's lock-in dressed as performance. Multi-provider is table stakes in 2026.
03
Data residency + sub-processors

Named region, named sub-processors, named DPAs. For HIPAA, EU AI Act, MAS, and FINMA workloads, the auditor is going to ask. Half the vendor pilots we re-evaluate have a residency gap the standard contract can't close.
04
Customisation ceiling

How customisable is the model surface: prompts, tools, fine-tuning, retrieval logic? What can the buyer's engineering team change without a vendor PR? "Configurable" usually means "the vendor's roadmap dictates pace"; "customisable" means you write code.
05
Pricing + volume-elasticity

What's the cost at 10×, 50×, and 100× current volume? Vendors love the entry tier; the real question is the renewal-cycle math when usage scales. We model the spreadsheet at the assumed-growth volume, not the current one.
06
Exit clauses + IP terms

What happens to your data, your fine-tunes, your eval set, your custom prompts when you exit? "We export your data" is the floor; the question is what shape the export takes and whether your engineering team can ingest it into a successor system without a six-month migration.

We don't take vendor kickbacks. The only money in our P&L is the consulting fee on the engagement. Where a sibling Paiteq practice could plausibly compete with a vendor we'd recommend, we disclose the conflict in writing inside the memo and recommend the option that wins on the rubric. We've recommended against ourselves three times in 2026.

010 / WHERE

Six advisory shapes across six industries, where we've shipped.

A capability-by-industry heatgrid for the AI consulting services we've actually run, not what the brochure promises. Strength reflects engagements completed; light cells are honest about depth we haven't built.

Function × Industry

Industries: B2B SaaS · Fintech · Healthcare · Manufacturing · Logistics · Legal

AI Capability Audit: B2B SaaS · Fintech · Healthcare · Manufacturing · Logistics · Legal
AI Roadmap (12-mo): B2B SaaS · Fintech · Healthcare · Manufacturing · Logistics; Legal
Vendor RFP Support: B2B SaaS · Fintech · Healthcare · Manufacturing · Legal; Logistics
Build-vs-Buy Memo: B2B SaaS · Fintech · Healthcare · Manufacturing · Logistics; Legal
AI Due Diligence: B2B SaaS · Fintech · Logistics · Legal; Healthcare · Manufacturing
Board AI Posture: B2B SaaS · Fintech · Healthcare · Manufacturing · Legal; Logistics

Possible fit · Good fit · Primary vertical

Dark cells: 3+ engagements completed. Medium: 1–2 engagements. Light: scoped but not yet completed. Empty: not yet relevant.

011 / PROCESS

Six steps. Three weeks. One written memo.

The methodology is an eval-first, baseline-anchored AI capability assessment, refined across engagements in SaaS, fintech, healthcare, manufacturing, logistics, and legal. The sequence below is the standard run; complex multi-BU engagements add a week of discovery; narrow single-use-case engagements collapse weeks 2 and 3. The AI capability assessment doubles as the procurement-gating doc when the engagement converts to RFP.

WEEK 1

Kickoff + landscape read

60-minute exec session to lock the question. Read of the current AI surface: what's in production, what's stalled, what's in vendor demos, what's in the spreadsheet. The question we're answering gets written down before we look at anything technical.

WEEK 1–2

Capability + data audit

Technical read of the existing AI surface: model choices, retrieval architecture, eval rigour, observability, MLOps posture. Data hygiene audit for the use cases on the table. Half the audits we run surface a leakage or labelling gap that has to close before any new build.

WEEK 2

Use-case scoring

Every candidate use case scored on three axes: business value, technical feasibility, organisational readiness. The scoring rubric is shared; nothing is graded on a private spreadsheet. Often the highest-value use case isn't the highest-feasibility one; that's the tradeoff the memo names.

WEEK 2–3

Vendor + build path read

For the two or three use cases that survive scoring, an explicit build-vs-buy frame. Vendor shortlist scored against the same rubric the buyer will face in procurement. Build path scoped, costed, and scheduled against named tools: Claude, GPT-5, LangGraph, Pinecone, Langfuse.

WEEK 3

Roadmap + TCO

12-month sequence with named phases, named owners, named exit gates. TCO modelled across hosted, self-hosted, and hybrid postures; we share the spreadsheet, not a sanitised summary. Sensitivity analysis on the three assumptions most likely to change.

WEEK 3

Memo + readout

20-page written memo plus 90-minute exec readout. The memo names the call, the dissenting view, and the conditions under which we'd change our mind. Board-grade artefact; most clients use it as the procurement-gating doc downstream.
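
The Week 2 use-case scoring step can be pictured as a small shared rubric. The sketch below is one hedged way to express it, not the engagement's actual spreadsheet: each use case gets a score on the three named axes, the scores are combined with weights agreed at kickoff, and the top few survive. The 1–5 scale, the equal weights, and the three example use cases are assumptions for illustration.

```python
from dataclasses import dataclass

AXES = ("business_value", "technical_feasibility", "organisational_readiness")


@dataclass(frozen=True)
class UseCase:
    name: str
    business_value: int            # assumed 1-5 scale
    technical_feasibility: int     # assumed 1-5 scale
    organisational_readiness: int  # assumed 1-5 scale


def score(use_case, weights):
    """Weighted average across the three axes."""
    total_weight = sum(weights[axis] for axis in AXES)
    weighted = sum(getattr(use_case, axis) * weights[axis] for axis in AXES)
    return weighted / total_weight


def shortlist(use_cases, weights, keep=3):
    """Rank by weighted score, highest first, and keep the top `keep`."""
    ranked = sorted(use_cases, key=lambda u: score(u, weights), reverse=True)
    return ranked[:keep]


# Hypothetical candidates and equal weights.
weights = {axis: 1.0 for axis in AXES}
candidates = [
    UseCase("claims triage", business_value=5, technical_feasibility=2, organisational_readiness=3),
    UseCase("internal knowledge search", business_value=3, technical_feasibility=4, organisational_readiness=4),
    UseCase("marketing copy drafts", business_value=2, technical_feasibility=2, organisational_readiness=2),
]
survivors = shortlist(candidates, weights, keep=2)
print([u.name for u in survivors])
```

In this made-up example the highest-value use case ranks second because feasibility drags it down, which is the tradeoff the memo is meant to name explicitly rather than hide in a total.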

012 / WHY PAITEQ

Why teams pick us as their AI consulting company.

01
Engineers sign the memo

The partner who signs the audit memo is the engineer who can pick up the phone when the build starts. No analyst-to-partner ladder, no slide-deck-only deliverable. Memos name tools, name gates, name failure modes: the kind of writing a build team can execute against without a translation layer.
02
No kickbacks. Zero. Audited.

We don't take referral fees from Anthropic, OpenAI, Google, Microsoft, Pinecone, Temporal, Langfuse, ElevenLabs, or any other vendor we score. The only money in our P&L is the consulting fee on the engagement. Where a sibling Paiteq practice could compete with a vendor we'd recommend, we disclose the conflict in writing and recommend the option that wins the rubric anyway.
03
Fixed scope, fixed fee, written deliverable

Two to six weeks per engagement; no time-and-materials clock; no six-month strategy retainer. The memo is the artefact. The readout is theatre. Roughly 95% of our AI consulting services engagements close within the original scope; the rest convert to a follow-up shape with a separate SOW.
04
Dissenting view named in writing

Every memo names the call, the conditions under which we'd change our mind, and the dissenting view from inside our team. We don't sand the edges off the analysis to land a follow-on engagement. Audits that conclude "no further action" still ship the memo and bill the engagement at the agreed scope; the methodology, not the recommendation, is what you paid for.
05
Roadmap that survives the engagement

Named phases, named owners, named exit gates, named tools. Procurement-ready. Board-ready. The kind of roadmap an internal director can execute against six months after we've left, without picking up the phone, because the artefact carries the analysis, not just the conclusion.
06
Cross-cutting AI estate, not single-modality

AI consulting services here cover the whole estate: retrieval, generation, agents, classical ML, workflow automation, voice. Modality-specific advisory routes to the sibling practice. The strength is integrating across modalities, not selling deeper into one; buyers with a multi-modality strategy land in the right pillar by default.

013 / SHAPES

Four ways to start an AI consulting services engagement.

The four shapes above as picker cards. Fixed-scope, fixed-fee, written deliverable. Pick the closest match; the framing call refines if needed.

01 / AUDIT ↗

AI Capability Audit

Two to three weeks, fixed scope. Read of your current AI surface, data hygiene, eval rigour, and deployment posture. Deliverable is a written memo plus a 90-minute exec readout. The most common starting point for an AI consulting services engagement.

2–3 wks · Fixed

02 / ROADMAP ↗

AI Roadmap

Three to five weeks, fixed scope. 12-month costed plan against scored use cases, vendor shortlist, build-vs-buy framing, and TCO model. The default sequel to an audit. Goes to your board as a board-grade artefact.

3–5 wks · Fixed

03 / RFP ↗

Vendor RFP Support

Four to six weeks, fixed scope. RFP authored against the rubric used in the audit. Vendor demos shadowed, scoring sheets filled, reference checks run, contract terms reviewed. We work for the buyer, not the vendor; we take no kickbacks.

4–6 wks · Fixed

04 / AI DD ↗

AI Due Diligence

Two to four weeks, fixed scope. Acquisition target or board-mandated AI posture read. 20-page memo plus a 90-minute board presentation. We've shipped this on five M&A targets and four board reviews in 2026 so far.

2–4 wks · Fixed

014 / EVALUATED

Vendors we've evaluated in audits this year.

Frontier LLMs, agent runtimes, retrieval, observability, and voice: the surface 2026 roadmaps actually touch.

Claude Opus 4.7
GPT-5
Gemini 3 Pro
Llama 4
Mistral Large 3
LangGraph
CrewAI
Temporal
Pinecone
Qdrant
pgvector
Weaviate
Langfuse
Braintrust
LiveKit
ElevenLabs

015 / USE CASES

Where the memos have landed.

Three anonymized engagements. Function, segment, and outcome metric are real; brand removed under NDA.

Healthcare

Multi-state payer · regulated-data shape

HIPAA-aware audit before a frontier-vendor procurement

Typical shape: a carrier has a frontier-vendor pilot in late-stage procurement and pulls us in for an independent audit. We pressure-test data residency, BAA coverage, and the deployment's ability to close HIPAA gaps inside the contract window. Where the vendor posture can't close, we re-frame the use case as a self-hosted Llama 4 + pgvector RAG build under our retrieval-augmented generation practice and re-price the roadmap against the vendor licence.

Deliverable: -page memo, residency gap register, re-priced roadmap

Fintech

Pre-Series-B regulated lending · EU

AI due diligence read on a credit-scoring model

Typical shape: investor diligence on a regulated-lending AI startup. We re-evaluate model claims on a leakage-free holdout, score fairness across protected slices, and write the memo named after the call: proceed, re-build, or walk. The deliverable feeds directly into the term sheet and the regulator briefing.

Deliverable: held-out eval report, fairness register, board-grade memo
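
A minimal sketch of the two checks named above, under stated assumptions: leakage is prevented by keeping every row for one applicant on the same side of the split, and fairness is read as the gap in approval rate between slices. The field names (`applicant_id`, `slice`, `approved`), the 20% holdout, and the choice of metric are illustrative placeholders, not the rubric used on any engagement.

```python
import hashlib
from collections import defaultdict


def holdout_split(rows, key="applicant_id", holdout_pct=20):
    """Split rows so that all rows sharing a key land on the same side.

    Hashing the key (rather than shuffling rows) keeps one applicant's
    records from appearing in both train and holdout, which is one
    common source of leakage.
    """
    train, holdout = [], []
    for row in rows:
        digest = hashlib.sha256(str(row[key]).encode()).hexdigest()
        bucket = int(digest, 16) % 100
        (holdout if bucket < holdout_pct else train).append(row)
    return train, holdout


def approval_rate_by_slice(rows, slice_key="slice", decision_key="approved"):
    """Return the share of approved decisions for each slice."""
    totals, approved = defaultdict(int), defaultdict(int)
    for row in rows:
        totals[row[slice_key]] += 1
        approved[row[slice_key]] += int(row[decision_key])
    return {s: approved[s] / totals[s] for s in totals}


def parity_gap(rates):
    """Largest difference in approval rate between any two slices."""
    return max(rates.values()) - min(rates.values())
```

A real diligence read would add more than this (temporal splits, feature-level leakage checks, several fairness metrics); the sketch only shows the shape of the two calculations.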

Logistics

Last-mile delivery · UK + EU

RPA-replace roadmap against a renewal cliff

Typical shape: a UiPath or Blue Prism estate is approaching renewal and the AI-modernisation question lands at the wrong time. We score every bot process against a structured rubric, recommend per-process actions (migrate to LangGraph + Temporal workflow / retain as classical RPA / retire), and sequence the migration against the renewal calendar.

Deliverable: scored process register, sequenced -month migration plan
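
To make the per-process scoring concrete, here is a rough sketch of a weighted rubric that maps each bot process to migrate, retain, or retire. The criteria, weights, and thresholds are hypothetical and chosen only for illustration; the actual rubric is not published on this page.

```python
from dataclasses import dataclass

# Hypothetical criteria and weights (must sum to 1.0); illustrative only.
WEIGHTS = {
    "exception_rate": 0.40,     # how often the bot breaks on edge cases
    "input_variability": 0.35,  # how unstructured the inputs are
    "business_value": 0.25,     # value of the process to the business
}


@dataclass
class BotProcess:
    name: str
    scores: dict       # criterion -> score on a 0..5 scale
    monthly_runs: int  # how often the process actually executes


def weighted_score(process):
    """Weighted sum of the rubric scores, on the same 0..5 scale."""
    return sum(WEIGHTS[c] * process.scores[c] for c in WEIGHTS)


def recommend(process, migrate_at=3.5, min_runs=10):
    """Map a scored process to one of the three per-process actions."""
    if process.monthly_runs < min_runs:
        return "retire"   # barely used: not worth migrating or licensing
    if weighted_score(process) >= migrate_at:
        return "migrate"  # brittle, variable, valuable: candidate for an AI workflow
    return "retain"       # stable and structured: classical RPA still fits
```

Sorting the resulting register by renewal date is then enough to draft a first migration sequence.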

016 / FAQ

What buyers ask before signing.

How are AI consulting services from Paiteq different from McKinsey, BCG, or Deloitte?

Different shape, different deliverable. Tier-one strategy houses produce slide decks; we produce written memos signed by engineers who will still be picking up the phone when the build starts. Our AI consulting services engagements run two to six weeks fixed-scope, not the six-month strategy retainers tier-one houses default to, and they end with a costed roadmap that a real engineering team can execute against, including named tools, named gates, and a TCO sensitivity analysis you can hand to procurement. We don't have a 600-person ML practice to push into the answer; that's a feature, not a bug. Where the question is genuinely about org-design across 40,000 people, McKinsey beats us. Where the question is what to actually build and which vendor to actually sign, we beat them roughly nine times out of ten.

Do you sign off on vendor selection, and how do you avoid kickback bias?

Yes. We score vendors against the same rubric the buyer will face downstream in procurement, we sit in on demos, and the memo states the call plainly. We don't take referral fees from any of the vendors we evaluate: Pinecone, Anthropic, OpenAI, Google, Microsoft, LangChain, Temporal, ElevenLabs, none of them. The only money in our P&L is the consulting fee we billed you. Where a sibling Paiteq practice could plausibly compete with a vendor we'd recommend, for example our own RAG development services versus a vendor RAG product, we disclose the conflict in writing inside the memo and recommend the option that wins on the rubric anyway. We've recommended against ourselves three times in 2026.

When does it make sense to skip consulting and go straight to a build?

When the use case is clean and the vendor selection is already settled, skip us. Examples we'd genuinely route straight to a build: a single-purpose RAG over a known corpus with one buyer-approved vendor; an agent migration where the destination framework is already chosen by the engineering team; a voice agent build where LiveKit is already procured and the question is just whether to use Claude Sonnet 4.6 or GPT-5 for the brain. Consulting earns its fee when the question itself isn't yet clean: pre-AI greenfield, stalled pilot, RPA renewal pressure, vendor sprawl, or a board asking for a posture read. Buyers who think they're in the first bucket but are actually in the second usually burn one to two quarters of engineering before realising it. The audit is cheaper.

What's in a written audit memo, and can I see a redacted one?

Twenty pages, give or take. Cover memo with the call and the three dissenting views; capability score across model selection, retrieval, eval rigour, observability, MLOps; data-hygiene findings with named leaks if any; vendor scorecard against the rubric; build-vs-buy frame with TCO modelled across three postures (hosted, self-hosted, hybrid); 12-month roadmap with named phases, named exit gates, and named tools (Claude, GPT-5, LangGraph, Pinecone, Langfuse: actual names, not categories); risk register; sensitivity analysis on the assumptions most likely to change. We can share a redacted memo under NDA. Message us through the contact form and we'll send one inside two business days. The redacted version covers a multi-state US healthcare payer engagement; brand and dollar figures removed, structure and analysis intact.
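
As a rough illustration of the TCO-plus-sensitivity step, the sketch below models each posture as a fixed monthly cost plus a per-token variable cost, then checks which posture is cheapest as volume moves. The cost model, the posture figures, and the volume multipliers are all placeholder assumptions picked to show a crossover, not numbers from any memo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posture:
    name: str
    fixed_monthly: float            # infra + staffing, placeholder currency units
    cost_per_million_tokens: float  # variable cost, placeholder currency units


def tco(posture, monthly_tokens_m, months=12):
    """Total cost over the horizon: fixed plus volume-driven spend."""
    monthly = posture.fixed_monthly + posture.cost_per_million_tokens * monthly_tokens_m
    return months * monthly


def sensitivity(postures, base_volume_m, multipliers=(0.5, 1.0, 2.0)):
    """For each volume multiplier, return the name of the cheapest posture."""
    winners = {}
    for m in multipliers:
        costs = {p.name: tco(p, base_volume_m * m) for p in postures}
        winners[m] = min(costs, key=costs.get)
    return winners


# Placeholder figures, chosen only so the ranking changes with volume.
POSTURES = [
    Posture("hosted", fixed_monthly=1_000, cost_per_million_tokens=10.0),
    Posture("self-hosted", fixed_monthly=20_000, cost_per_million_tokens=1.0),
    Posture("hybrid", fixed_monthly=8_000, cost_per_million_tokens=5.0),
]

if __name__ == "__main__":
    for multiplier, winner in sensitivity(POSTURES, base_volume_m=2_000).items():
        print(f"volume x{multiplier}: cheapest posture is {winner}")
```

The point of the exercise is the crossover: when the cheapest posture flips inside a plausible volume range, volume is an assumption worth stress-testing before signing.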

How do you price an audit, and how is that different from your AI roadmap pricing?

Fixed scope, fixed fee, both shapes. An AI Capability Audit runs two to three weeks; pricing scales with the technical surface. A single-team, single-use-case audit lands at the lower end; a multi-BU, multi-pilot audit at the upper. An AI Roadmap engagement runs three to five weeks; pricing scales with the number of use cases sequenced and the vendor evaluation depth. Both ship with a written memo and an exec readout. Neither runs on a time-and-materials clock: we don't sell hours; we sell a written deliverable against a fixed scope. We'll quote exact numbers after a 30-minute scoping call; the AI consulting services pricing range is on the higher end of independent advisory and the lower end of tier-one strategy houses, which is roughly where the value sits.

How is this pillar different from your generative AI consulting or LLM consulting work?

Three siblings, three different questions. AI consulting services here cover cross-cutting AI advisory: vendor selection across modalities, build-vs-buy framing across the whole AI estate, and 12-month roadmaps that span retrieval, generation, and agentic workloads. Generative AI consulting (the advisory wrap on our generative AI practice) is narrower: image, audio, video, and brand-controlled generation; LoRA strategy; safety + watermarking posture. LLM consulting (inside our LLM development practice) covers hosted-vs-self-hosted decisions, fine-tuning strategy, and cost engineering on a known LLM workload. If your question is multi-modality and cross-cutting, you're in the right pillar. If it's modality-specific or model-architecture-specific, route to the sibling.

Do you stay on after the roadmap ships, or hand off cleanly?

Clean handoff is the default. Every memo names exit gates and a recommended owner per workstream: sometimes the recommendation is an internal hire, sometimes a Paiteq sibling practice, sometimes a third-party vendor. About 40% of audit engagements convert to a build engagement with us under one of the technical pillars (AI Agent Development, RAG, LLM, or Machine Learning); about 30% convert to roadmap then build; about 30% take the memo, hand it to an existing team or a competing vendor, and execute without us. We don't penalise the third path; the memo is a finished artefact in itself. Retainer engagements exist for clients who want us in the room monthly for the first year, but we don't push them by default.

Where the audit recommendation usually leads.

About 60% of audit outputs hand off to our AI agent development company practice (a production agent built against the audit-priority workflow) or to RAG development services (citation-enforced retrieval over enterprise knowledge); another 30% land in our AI automation agency practice or in LLM development services when the win is a custom LLM app, not a workflow. The remaining 10% recommend no AI work for two quarters, with a walk-away clause in place.

Industry-specific routes from the audit: AI for fintech (model risk, SR 11-7, FFIEC) when the buyer is a bank, custom AI insurance development when the audit surface is underwriting or claims, AI healthcare software development when the buyer is a clinical or payer organization, and AI for SaaS companies when the workload is embedded inside a product roadmap. Logistics software development company work and AI for ecommerce consulting round out the regulated-and-operational mix. When the audit recommendation is "modernize the legacy estate first", the next step is AI migration services. Founder context: Navin Sharma reads inbounds personally for audit-grade engagements. The 2026 AI automation solutions buyer's guide covers the vendor matrix and ROI math we walk audit buyers through. The wider context lives in the Paiteq engineering practice and the broader AI development company homepage.

017 / Related practices

Adjacent services.

AI AGENT DEVELOPMENT

AI Agent Development

Autonomous, tool-using AI agents for production workloads.

LLM DEVELOPMENT

LLM Development

Custom LLM apps — RAG, fine-tuning, evaluation, deployment.

RAG DEVELOPMENT

RAG Development

Retrieval-augmented generation systems with evaluation built in.

018 / Start an engagement

Ship an honest AI audit in three weeks.

AI capability assessment in 2–3 weeks. Enterprise AI strategy roadmap in 3–5 weeks. AI vendor selection consulting in 4–6 weeks. AI readiness assessment + AI due diligence in 2–4 weeks.

Talk to a partner See a redacted memo
