P7 · Services

AI consulting services — capability audits, costed roadmaps, vendor selection, board-grade memos

Engineering-led AI consulting services that ship AI strategy in writing, not slides. We cover AI capability assessment, AI roadmap consulting, AI vendor selection consulting, AI readiness assessment, and AI due diligence. Every engagement is fixed-scope and signed by the engineers who'll still be on the call when the build starts. No kickbacks, and no 600-person practice to push into the answer.

Practice: AI strategy consulting
Shapes: Audit · Roadmap · RFP · DD
Default: Written memo + exec readout
Engagements: 2–6 weeks · fixed scope
001 / FRAME

Where AI consulting services earn their fee.

Most buyers arrive at enterprise AI strategy with the wrong question pre-attached to the right budget. An AI readiness assessment up front saves them from the most common failure, which is also the most expensive mistake in this category: paying for the wrong shape. That means spending audit dollars on a question that should've gone straight to vendor RFP, or running a vendor RFP on a strategy gap that an audit would've named in two weeks. The grid below is the first frame we run, with buyer shape down the left and engagement stage across the top, and each cell is the call we'd make on the rubric.

Where you are | Audit (2–3w) | Roadmap (3–5w) | Vendor RFP (4–6w) | Build (route out)
Pre-AI, board-mandated | Default | Sequel | Premature | Not yet
AI pilot stalled at month 6 | Default | After audit | Often the gap | Reroute to P1/P3
Vendor demos already booked | Skip | Skip | Default | After pick
Build vs buy unresolved | Yes, frame it | Default | Half of it | Only one side
Regulated industry (HIPAA, EU AI Act) | Compliance read | Roadmap + posture | DPIA-aware RFP | Routed to siblings
Acquisition / DD in flight | AI DD memo | Post-close | Rare | N/A
Tooling sprawl, no strategy | Default | Consolidation plan | After rationalise | Not now
Yes = default recommendation. Maybe = depends on a follow-up question we'll cover in the kickoff. No = we'd actively steer you away.

If your shape isn't on the grid, the framing call is free — DM us with the situation and we'll write back inside a business day with which shape fits and what it costs.

002 / SHAPES

Four engagement shapes. Every AI consulting services engagement maps to one.

Fixed scope, fixed fee, written deliverable. We don't sell hours; we sell a memo. The four shapes below cover roughly 95% of inbound — Audit, Roadmap, Vendor RFP, AI Due Diligence. Mixed engagements bill as two consecutive shapes, not an open retainer.

01 AI CAPABILITY AUDIT Fixed scope
2–3 weeks

Read of practice + written memo.

In scope
  • 60-minute kickoff to lock the question
  • Capability read across model selection, retrieval, eval rigour, observability, MLOps
  • Data-hygiene audit with named leakage/labelling gaps
  • 20-page written memo + 90-minute exec readout
  • Recommended next step in writing (Roadmap, RFP, Build, or no-action)
Out of scope
  • Vendor RFP authoring (Shape 03)
  • 12-month costed roadmap (Shape 02)
  • Hands-on build (route to technical pillar)
02 AI ROADMAP Fixed scope
3–5 weeks

12-month costed plan against scored use cases.

In scope
  • All Capability Audit deliverables
  • Use-case scoring across business value, feasibility, organisational readiness
  • Vendor longlist scored against the audit rubric
  • Build-vs-buy frame with TCO modelled three postures
  • 12-month sequence with named phases, owners, exit gates, named tools
Out of scope
  • Vendor demo shadowing (Shape 03)
  • Hands-on build (route to technical pillar)
  • Ongoing retainer (separate engagement)
03 VENDOR RFP SUPPORT Fixed scope
4–6 weeks

RFP authoring + demo shadowing + reference checks.

In scope
  • RFP authored against the audit rubric
  • Vendor demos shadowed by an engineer
  • Scoring sheets filled with named criteria
  • Reference checks run with at least two existing customers per vendor
  • Contract terms reviewed for data residency, IP, exit clauses
  • Procurement-ready recommendation memo
04 AI DUE DILIGENCE Fixed scope
2–4 weeks

Acquisition or board-mandated AI posture read.

In scope
  • AI surface audit of the target or business unit
  • Model-evaluation re-run on a leakage-free holdout where applicable
  • Risk register across IP, data residency, vendor lock-in, regulatory exposure
  • 20-page board-grade memo + 90-minute presentation
  • Dissenting view named in writing — we don't bury the no-flags
003 / NUMBERS

What an honest AI consulting company looks like at the spreadsheet level.

This is the pricing transparency that most AI strategy consulting firms hide behind a "let's chat" wall. The shapes are fixed, the timelines are fixed, and the deliverable is written. We can't quote the fee until we've scoped the surface, but the range sits at the higher end of independent advisory and the lower end of tier-one strategy houses, which is roughly where the value sits. An AI strategy consulting engagement at this depth is a one-time cost, not a quarterly retainer drip.

004 / GATES

Six gates an honest AI capability audit clears.

An AI audit memo is only as honest as the gates the auditor runs. Below is the screen we apply to every AI audit services engagement, and the same screen we use when we're hired to second-opinion a memo a tier-one firm already shipped. Second-opinion work routinely flags at least one gate the original audit silently skipped.

Six-out-of-six clean is rare in our review history. Two or fewer clean is the trigger for the "stalled-pilot" intervention shape under our agent or LLM practices — fix the methodology before the model.

005 / ROADMAP

What a four-phase AI roadmap consulting engagement actually ships.

A 12-month AI roadmap that lands in a board pack isn't a slide deck; it's a sequence with named owners, named gates, and named tools. The four phases below are the standard shape: a complex multi-BU engagement carries an extra discovery phase, and a narrow single-use-case engagement collapses phases 2 and 3 into one.

  1. Discovery + landscape read

    Sixty-minute exec session to lock the question, then a structured read of the current AI surface — what's in production, what's stalled, what's in vendor demos, what's in the spreadsheet. The output of this phase is a one-page problem statement that everyone on the engagement signs off in writing. Some engagements end here because the right answer is "do nothing yet" — we still ship the memo and bill the phase.

  2. Use-case scoring + vendor longlist

    Every candidate use case scored across three axes — business value, technical feasibility, organisational readiness. Scoring rubric shared with the buyer, not run on a private spreadsheet. Vendor longlist assembled per surviving use case — frontier hosted, self-hosted open-weight, vertical SaaS, and the build-it-yourself option each scored on the same rubric. Audit memo's recommendation feeds the scoring inputs.

  3. Build-vs-buy frame + TCO

    Explicit build-vs-buy frame for the two or three use cases that survive scoring. TCO modelled across three postures — hosted-frontier (Claude / GPT-5 / Gemini), self-hosted open-weight (Llama 4 / Mistral / Qwen 3 on vLLM), and hybrid routing — with the volume crossover named in months, not vibes. Sensitivity analysis on the three assumptions most likely to change. We share the spreadsheet, not a sanitised summary.

  4. 12-month sequence + exit gates

    Twelve-month sequence with named phases, named owners (internal hire, sibling practice, third-party vendor), named exit gates per phase, and named tools — LangGraph, Pinecone, Langfuse, the actual names a procurement team has to put on contracts. Each phase carries a "fail-here-and-pivot-there" branch. The memo is the artefact that survives the engagement; the readout is theatre.

Clean handoff is the default. Most roadmaps name a recommended internal hire alongside the vendor sequence — the work that survives this engagement is the practice you build inside, not the consultant you retain.

006 / EVALUATE

The six vendor categories every roadmap evaluates.

An AI vendor selection consulting engagement isn't a vendor-by-vendor scorecard; it's a category-by-category architectural call. The six families below cover roughly 95% of the recommendations in the roadmaps we've shipped this year. Per family, the audit names the default pick, the cost-floor alternative, and the conditions under which we'd revisit in 12 months. We've run AI vendor selection consulting across all six categories in the last 18 months.

Frontier hosted LLMs (Claude · GPT-5 · Gemini 3)

Highest reasoning ceiling and the fastest iteration loop. Claude Opus 4.7 holds the lead on long-context analysis; GPT-5 leads on tool-call latency at scale; Gemini 3 Pro wins on 1M-token retrieval workloads. Pricing has compressed but premium tier still runs $3–15 input, $15–75 output per million tokens.

Greenfield AI roadmaps where time-to-first-value matters more than per-token cost. C-suite-visible builds where the model name is itself a signalling cost. Workloads under ~200M monthly tokens — below that, the frontier price premium is rounding error against engineering salary.

Predictable, high-volume workloads where a tuned <a href="/services/llm-development/">smaller LLM</a> beats frontier on cost by 8–20×. Strict data-residency where the provider's region map doesn't match yours. Vendor-lock anxiety where a board member has already vetoed single-source.

Audit memos almost always recommend a two-vendor posture — one frontier, one mid-tier — with a routing layer keeping migration friction near zero. Three names beats two for posture and one for cost discipline.

Frontier · Reasoning · Multi-vendor
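The two-vendor posture with a routing layer can be pictured with a small sketch. This is an illustration under stated assumptions, not a production router: the `Provider` wrapper, the `frontier` and `mid_tier` labels, and the stub `complete` functions are hypothetical stand-ins for real SDK calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Provider:
    """Hypothetical wrapper; `complete` stands in for a real vendor SDK call."""
    name: str
    complete: Callable[[str], str]


class Router:
    """Send each task class to its preferred provider, falling back in order."""

    def __init__(self, providers: Dict[str, Provider], routes: Dict[str, List[str]]):
        self.providers = providers
        self.routes = routes  # task class -> ordered list of provider names

    def complete(self, task: str, prompt: str) -> str:
        errors = []
        for name in self.routes[task]:
            try:
                return self.providers[name].complete(prompt)
            except Exception as exc:  # a real router would catch narrower errors
                errors.append(f"{name}: {exc}")
        raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stub providers so the sketch runs without network access.
def _frontier(prompt: str) -> str:
    return "frontier:" + prompt


def _mid_tier(prompt: str) -> str:
    return "mid:" + prompt


router = Router(
    providers={
        "frontier": Provider("frontier", _frontier),
        "mid_tier": Provider("mid_tier", _mid_tier),
    },
    routes={
        "reasoning": ["frontier", "mid_tier"],  # ceiling first, cost second
        "bulk": ["mid_tier", "frontier"],       # cost first, ceiling as fallback
    },
)
```

Because callers only see `router.complete(task, prompt)`, swapping or adding a vendor is a change to the two dictionaries rather than to application code, which is what keeps migration friction low.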
Self-hosted open-weight (Llama 4 · Mistral · Qwen 3)

Total control. Llama 4 Maverick (roughly 400B total parameters) served on H100s via vLLM hits sub-150ms p50 on most chat workloads at roughly $0.05–0.20 per million tokens amortised. Mistral and Qwen 3 cover the mid-tier; both ship instruct-tuned variants that beat year-old frontier on narrow domains after light fine-tuning.

Workloads above ~500M monthly tokens where per-token economics flip the spreadsheet. Regulated workloads where data residency or model-weight ownership is non-negotiable. Vertical workloads where fine-tuning on customer data unlocks 30–60% accuracy lift over generic frontier.

Sub-50M-token workloads — GPU amortisation kills the math. Teams without MLOps capacity — see <a href="/services/mlops/">MLOps services</a>; running vLLM in production is not a side project. Reasoning-heavy workloads where the frontier ceiling still matters more than cost.

We recommend self-host when the unit economics cross the line — usually a clear inflection rather than a gradient. Audit memo names the volume threshold and the year it'll be hit, not the year a board member wishes it would be hit.

Self-host · Open-weight · Cost-floor
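The "volume threshold and the year it'll be hit" can be worked out with simple arithmetic. The sketch below is a simplification, not the model used in any memo: it assumes self-hosting is a fixed monthly cost (GPUs plus MLOps time) plus a per-token marginal cost, and every function name and input value is hypothetical.

```python
from typing import Optional


def crossover_tokens_m(hosted_price: float, self_price: float,
                       fixed_monthly: float) -> Optional[float]:
    """Monthly volume (millions of tokens) where hosted and self-hosted cost the same.

    hosted_price and self_price are dollars per million tokens; fixed_monthly is
    the assumed fixed cost of running your own serving stack.
    """
    if hosted_price <= self_price:
        return None  # self-hosting never catches up on these inputs
    return fixed_monthly / (hosted_price - self_price)


def months_until_crossover(current_tokens_m: float, monthly_growth: float,
                           threshold_tokens_m: float,
                           horizon: int = 60) -> Optional[int]:
    """Months of compound growth until volume reaches the threshold, if ever."""
    volume = current_tokens_m
    for month in range(horizon + 1):
        if volume >= threshold_tokens_m:
            return month
        volume *= 1 + monthly_growth
    return None  # not reached inside the planning horizon


# Hypothetical inputs: $10 blended hosted price, $0.10 marginal self-hosted
# price, $4,950 fixed monthly cost, 100M tokens today growing 10% a month.
threshold = crossover_tokens_m(hosted_price=10.0, self_price=0.10, fixed_monthly=4950.0)
months = months_until_crossover(100.0, 0.10, threshold)
```

Returning `None` when growth never reaches the threshold is deliberate: it forces the memo to say "not inside the horizon" instead of implying a date.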
Vector DBs + retrieval (Pinecone · Qdrant · pgvector · Weaviate)

Retrieval is where most AI roadmaps actually live or die. Pinecone Serverless cuts ops to near zero at a premium tier. Qdrant self-hosts cleanly on Kubernetes for the team that already runs one. pgvector is the cheapest, lowest-friction choice when Postgres is already in the stack. Weaviate wins on multi-modal retrieval.

Anywhere the AI value proposition requires grounded answers — clinical, legal, regulated, internal-knowledge-rich. Almost every roadmap we ship recommends a <a href="/services/rag-development/">retrieval-augmented generation</a> pipeline as the first build, not an agent.

Pure reasoning workloads with no enterprise knowledge to ground against. Workloads where the answer is already public-internet-shaped — frontier LLM alone usually wins. Tiny corpora under 10k chunks where in-context retrieval beats a vector store.

Default recommendation: pgvector when the team already runs Postgres; Pinecone Serverless when ops bandwidth is the constraint; Qdrant when data residency requires self-host. We don't recommend Weaviate unless multi-modal retrieval is the headline requirement.

Retrieval · Grounded · Hybrid-search
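The default picks for retrieval read naturally as a decision rule. The sketch below encodes them as a function; the order in which the rules are checked is our assumption (the text does not rank them), and the function and parameter names are hypothetical.

```python
def default_vector_store(*, chunk_count: int, runs_postgres: bool,
                         ops_constrained: bool, needs_self_host: bool,
                         multimodal_headline: bool) -> str:
    """Return a default retrieval pick from the stated rules.

    Rule precedence is an assumption made for this sketch.
    """
    if chunk_count < 10_000:
        # Tiny corpora: in-context retrieval beats a vector store.
        return "in-context retrieval"
    if multimodal_headline:
        return "Weaviate"
    if runs_postgres:
        return "pgvector"
    if needs_self_host:
        # Checked before the hosted option because residency rules it out.
        return "Qdrant"
    if ops_constrained:
        return "Pinecone Serverless"
    return "undecided"  # no stated default applies; settle it on the framing call
```

For example, a team already on Postgres with a 2M-chunk corpus and no multi-modal requirement lands on pgvector regardless of its ops bandwidth.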
Agent + workflow frameworks (LangGraph · CrewAI · n8n · Temporal)

LangGraph is the 2026 default for state-graph agent orchestration — the only mature framework with proper state-machine semantics. CrewAI ships fastest for role-based shapes if you don't need state-graph control. n8n covers deterministic-plus-AI workflows for ops teams. Temporal is the durable-execution backbone for high-stakes long-running flows.

Multi-step agentic builds with branching state — LangGraph. Workflow automation with LLM-in-the-loop and a non-engineer ops team — n8n. Long-running orchestration with retry semantics — Temporal. We've shipped all four in production over the last 12 months.

Single-turn chatbots — none of these is the right tool; see <a href="/services/chatbot-development/">chatbot development</a>. Toy POCs — direct API calls win on velocity. AutoGen — stalled relative to LangGraph; we no longer recommend it for new builds.

Roadmap usually pairs LangGraph (agent runtime) with Temporal (durable execution) for builds where retries and human-approval gates matter. n8n shows up when the buyer is non-engineering and the workflow is more deterministic than agentic.

Agentic · Stateful · Durable
Observability + eval (Langfuse · Braintrust · LangSmith · Inspect)

The audit gate every roadmap we ship requires. Langfuse leads OSS observability with traces, prompt versioning, and a usable eval surface. Braintrust dominates closed-source eval workflows. LangSmith is fine if you're already inside the LangChain ecosystem. Inspect AI (UK AISI-backed) is the rigour pick for safety-critical evals.

Every roadmap recommends observability as a day-one cost line, not a phase-three nice-to-have. Most AI pilots fail because nobody knew which prompts were drifting or which tools were silently dropping — instrumentation is the cheapest insurance in the stack.

Toy projects where the eval loop is a human reading 10 outputs. We don't recommend bare logging-without-traces for anything past prototype — it's the false-economy that creates the month-six pilot stall.

Default recommendation: Langfuse self-hosted for teams that want open-source plus data control; Braintrust for teams with budget and no ops capacity. RAGAS or DeepEval as the eval harness layer regardless of trace backend. Audit memos always price observability in.

Eval · Tracing · Day-one
Voice + multimodal stack (LiveKit · Pipecat · ElevenLabs)

LiveKit Agents and Pipecat both land sub-400ms voice turn-take in production. ElevenLabs leads on voice quality; the open-source side (Whisper Large v3, F5-TTS) is closing fast. Vision-LLMs (Claude Sonnet 4.6, GPT-5 Vision, Gemini 3 Pro) cover document understanding without a custom CV pipeline.

Voice agents — support deflection, clinical intake, sales prospecting — where the latency budget is human-conversational. Document understanding at scale where the alternative is a custom vision stack we'd route to <a href="/services/machine-learning-development/">our ML practice</a> instead.

Roadmaps where voice is a CEO whim, not a buyer journey. Vision tasks with extreme accuracy bars (defect detection, medical imaging) — frontier vision-LLM isn't the answer; a fine-tuned vision backbone is.

We've taken voice agents from POC to production three times this year. Roadmaps prescribe LiveKit + Claude Sonnet 4.6 + ElevenLabs as the default stack; cheaper open-source substitutes priced as a phase-two option.

Voice · Multimodal · Latency
007 / ARCHETYPES

Four strategy archetypes. Roughly all inbound maps to one of these.

Greenfield, Modernise, RPA-Replace, and Acquisition/Board-DD cover nearly all of the AI consulting services engagements we've shipped over the last 18 months. Shape determines deliverable, deliverable determines pricing, pricing determines scope. We won't sell you a Greenfield engagement when you're really in Modernise: the framing call is free and we'll route you to the right shape.

01

GREENFIELD

The board has approved an AI budget for the first time. There's no incumbent AI system, no internal champion with battle scars, and no anchor use case picked. Roughly 40% of our AI consulting services engagements start here. The audit memo names the three highest-leverage use cases against capability + value scoring, sequences them, and prices the first six months in detail so finance can sign without re-reviewing.

Pick when
  • First AI budget cycle
  • No prior pilots or pilots all stalled
  • Multiple business units competing for the budget
  • CTO + CFO + COO all needed on the call
  • You've been pitched by three vendors and trust none of them
Skip when
  • Pilot already in production and earning revenue — different shape
  • Vendor already picked — go straight to RFP support
  • Pure model-routing question — that's a 1-week LLM audit instead
Stack
Capability audit · Use-case scoring · Costed roadmap · Vendor longlist
008 / BUILD VS BUY

Build vs buy AI — row-by-row on the dimensions that actually matter.

Build-vs-buy is the most-asked question in the audit room and the most-mis-answered in the slide deck. The grid below is the frame we use — nine rows the spreadsheet usually skips. Every roadmap recommendation gets graded against these rows; the call lands in writing with the dissenting view named.

Dimension | Build (custom + your engineers) | Buy (vendor / SaaS)
Time to first business value | 6–14 weeks (custom build) | 2–6 weeks (vendor pilot)
Total 24-month spend (mid use case) | $280k–$650k engineering | $180k–$420k licence + integration
Customisation ceiling | Anything you can write; LoRA fine-tunes on your data | Whatever the vendor's roadmap allows; 6–18 month wait per feature
Data residency + private deployment | Self-host on Llama 4 / Mistral; full control | Depends on vendor; ~half offer single-tenant; few offer self-host
Where we recommend it | Differentiator workloads, where the AI IS the moat | Table-stakes workloads: chat, support deflection, basic RAG
The honest answer is usually both — buy for table-stakes (chat, basic RAG, support deflection), build for the differentiator workloads (the AI IS the moat). Roadmaps name the line per workload.

Where we recommend buy, we score vendors against the audit rubric and shadow demos with the buyer. Where we recommend build, we route the work to the right Paiteq technical pillar — agentic systems, retrieval pipelines, custom LLM apps, classical ML — or to a named third party where their fit is better.

009 / CRITERIA

Six vendor-evaluation criteria the procurement-grade rubric scores.

When a vendor RFP support engagement runs, the rubric is six criteria, scored 1–5, signed off by the buyer at kickoff. No vendor-flavoured spin in the criteria list; no "innovation" or "thought leadership" cells. The criteria below are the ones that actually predict whether the contract pays itself back in 24 months.
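A scoring sheet of this kind is easy to make mechanical. The sketch below is an illustration only: the six criterion names are hypothetical placeholders (the real list is agreed with the buyer at kickoff), and the unweighted sum is an assumption, not a statement of how the rubric aggregates.

```python
from typing import Dict, List, Tuple

# Hypothetical criterion names; the engagement fixes the real six at kickoff.
CRITERIA: Tuple[str, ...] = (
    "data_residency",
    "ip_terms",
    "exit_clauses",
    "integration_effort",
    "total_cost",
    "reference_checks",
)


def total_score(scores: Dict[str, int]) -> int:
    """Validate one vendor's sheet and return its unweighted total (6 to 30)."""
    if set(scores) != set(CRITERIA):
        raise ValueError("score every agreed criterion, and only those criteria")
    for name, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be scored 1-5, got {value}")
    return sum(scores.values())


def rank_vendors(sheets: Dict[str, Dict[str, int]]) -> List[str]:
    """Vendor names ordered from highest to lowest total score."""
    return sorted(sheets, key=lambda vendor: total_score(sheets[vendor]), reverse=True)
```

Rejecting sheets with missing or extra cells is the point of the validation: a criterion added mid-evaluation is exactly the vendor-flavoured spin the kickoff sign-off is meant to prevent.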

We don't take vendor kickbacks. The only money in our P&L is the consulting fee on the engagement. Where a sibling Paiteq practice could plausibly compete with a vendor we'd recommend, we disclose the conflict in writing inside the memo and recommend the option that wins on the rubric — we've recommended against ourselves three times in 2026.

010 / WHERE

Six advisory shapes across six industries — where we've shipped.

A capability-by-industry heatgrid for the AI consulting services we've actually run, not what the brochure promises. Strength reflects engagements completed; light cells are honest about depth we haven't built.

Function: Industry
AI Capability Audit: B2B SaaS · Fintech · Healthcare · Manufacturing · Logistics · Legal
AI Roadmap (12-mo): B2B SaaS · Fintech · Healthcare · Manufacturing · Logistics; Legal
Vendor RFP Support: B2B SaaS · Fintech · Healthcare · Manufacturing · Legal; Logistics
Build-vs-Buy Memo: B2B SaaS · Fintech · Healthcare · Manufacturing · Logistics; Legal
AI Due Diligence: B2B SaaS · Fintech · Logistics · Legal; Healthcare · Manufacturing
Board AI Posture: B2B SaaS · Fintech · Healthcare · Manufacturing · Legal; Logistics
Legend: Possible fit · Good fit · Primary vertical

Dark cells: 3+ engagements completed. Medium: 1–2 engagements. Light: scoped but not yet completed. Empty: not yet relevant.

011 / PROCESS

Six steps. Three weeks. One written memo.

Eval-first, baseline-anchored AI capability assessment methodology, refined across engagements in SaaS, fintech, healthcare, manufacturing, logistics, and legal. The sequence below is the standard run; complex multi-BU engagements add a week of discovery, and narrow single-use-case engagements collapse weeks 2 and 3. The AI capability assessment doubles as the procurement-gating doc when the engagement converts to RFP.

WEEK 1

Kickoff + landscape read

60-minute exec session to lock the question. Read of the current AI surface — what's in production, what's stalled, what's in vendor demos, what's in the spreadsheet. The question we're answering gets written down before we look at anything technical.

WEEK 1–2

Capability + data audit

Technical read of the existing AI surface — model choices, retrieval architecture, eval rigour, observability, MLOps posture. Data hygiene audit for the use cases on the table. Half the audits we run surface a leakage or labelling gap that has to close before any new build.

WEEK 2

Use-case scoring

Every candidate use case scored on three axes — business value, technical feasibility, organisational readiness. The scoring rubric is shared; nothing is graded on a private spreadsheet. Often the highest-value use case isn't the highest-feasibility — that's the tradeoff the memo names.
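The three-axis scoring can be sketched as a small shared rubric. This is a minimal illustration: the 1–5 scale, the integer weights, and the example use-case names are hypothetical, chosen only to show the value-versus-feasibility tradeoff surfacing in the ranking.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical weights (they sum to 100); a real rubric is agreed with the buyer.
WEIGHTS = {"business_value": 40, "feasibility": 35, "readiness": 25}


@dataclass(frozen=True)
class UseCase:
    name: str
    business_value: int  # each axis scored 1-5 (scale is an assumption)
    feasibility: int
    readiness: int

    def __post_init__(self) -> None:
        for axis in WEIGHTS:
            value = getattr(self, axis)
            if not 1 <= value <= 5:
                raise ValueError(f"{axis} must be 1-5, got {value}")


def score(use_case: UseCase) -> int:
    """Weighted total on a 100-500 scale."""
    return sum(getattr(use_case, axis) * weight for axis, weight in WEIGHTS.items())


def rank(use_cases: List[UseCase]) -> List[UseCase]:
    return sorted(use_cases, key=score, reverse=True)


def value_feasibility_tension(use_cases: List[UseCase]) -> bool:
    """True when the highest-value use case is not also the most feasible."""
    top_value = max(use_cases, key=lambda u: u.business_value)
    top_feasible = max(use_cases, key=lambda u: u.feasibility)
    return top_value.name != top_feasible.name


candidates = [
    UseCase("claims triage", business_value=5, feasibility=2, readiness=3),
    UseCase("support deflection", business_value=3, feasibility=5, readiness=4),
    UseCase("contract search", business_value=4, feasibility=4, readiness=2),
]
```

With these example inputs the highest-value candidate ranks last overall, which is the kind of tension the memo is expected to name rather than average away.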

WEEK 2–3

Vendor + build path read

For the two or three use cases that survive scoring, an explicit build-vs-buy frame. Vendor shortlist scored against the same rubric the buyer will face in procurement. Build path scoped, costed, and timeline'd against named tools — Claude, GPT-5, LangGraph, Pinecone, Langfuse.

WEEK 3

Roadmap + TCO

12-month sequence with named phases, named owners, named exit gates. TCO modelled across hosted, self-hosted, and hybrid postures — we share the spreadsheet, not a sanitised summary. Sensitivity analysis on the three assumptions most likely to change.
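A sensitivity analysis on a handful of assumptions can be as simple as moving one input at a time and recording the change in cost. The sketch below assumes a deliberately crude 12-month TCO formula and a ±25% swing; the formula, the parameter names, and the numbers are all hypothetical.

```python
from typing import Dict, Tuple


def tco_12_month(monthly_tokens_m: float, price_per_m: float,
                 fixed_monthly: float) -> float:
    """Crude 12-month TCO: fixed monthly cost plus per-token spend."""
    return 12 * (fixed_monthly + monthly_tokens_m * price_per_m)


def one_at_a_time(base: Dict[str, float],
                  swing: float = 0.25) -> Dict[str, Tuple[float, float]]:
    """Change in TCO when each assumption moves down and up by `swing`."""
    base_tco = tco_12_month(**base)
    deltas = {}
    for key, value in base.items():
        low = tco_12_month(**{**base, key: value * (1 - swing)})
        high = tco_12_month(**{**base, key: value * (1 + swing)})
        deltas[key] = (low - base_tco, high - base_tco)
    return deltas


# Hypothetical base case: 200M tokens a month at $10 per million, $1,000 fixed.
base_case = {"monthly_tokens_m": 200.0, "price_per_m": 10.0, "fixed_monthly": 1000.0}
sensitivity = one_at_a_time(base_case)
```

One-at-a-time swings ignore interactions between assumptions, so this is a first read on which input deserves the most scrutiny, not a full risk model.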

WEEK 3

Memo + readout

20-page written memo plus 90-minute exec readout. The memo names the call, the dissenting view, and the conditions under which we'd change our mind. Board-grade artefact — most clients use it as the procurement gating doc downstream.

012 / WHY PAITEQ

Why teams pick us as their AI consulting company.

013 / SHAPES

Four ways to start an AI consulting services engagement.

The four shapes above as picker cards. Fixed-scope, fixed-fee, written deliverable. Pick the closest match — the framing call refines if needed.

014 / EVALUATED

Vendors we've evaluated in audits this year.

Frontier LLMs, agent runtimes, retrieval, observability, and voice — the surface 2026 roadmaps actually touch.

  • Claude Opus 4.7
  • GPT-5
  • Gemini 3 Pro
  • Llama 4
  • Mistral Large 3
  • LangGraph
  • CrewAI
  • Temporal
  • Pinecone
  • Qdrant
  • pgvector
  • Weaviate
  • Langfuse
  • Braintrust
  • LiveKit
  • ElevenLabs
015 / USE CASES

Where the memos have landed.

Three anonymized engagements. Function, segment, and outcome metric are real; brand removed under NDA.

Healthcare
Multi-state payer · regulated-data shape

HIPAA-aware audit before a frontier-vendor procurement

Typical shape: a carrier has a frontier-vendor pilot in late-stage procurement and pulls us in for an independent audit. We pressure-test data residency, BAA coverage, and the deployment's ability to close HIPAA gaps inside the contract window. Where the vendor posture can't close, we re-frame the use case as a self-hosted Llama 4 + pgvector RAG build under our <a href="/services/rag-development/">retrieval-augmented generation</a> practice and re-price the roadmap against the vendor licence.

Deliverable: written memo, residency gap register, re-priced roadmap
Fintech
Pre-Series-B regulated lending · EU

AI due diligence read on a credit-scoring model

Typical shape: investor diligence on a regulated-lending AI startup. We re-evaluate model claims on a leakage-free holdout, score fairness across protected slices, and write the memo named after the call — proceed, re-build, or walk. The deliverable feeds directly into the term sheet and the regulator briefing.

Deliverable: held-out eval report, fairness register, board-grade memo
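Re-evaluating on a leakage-free holdout and scoring across slices can be sketched in plain Python. This is a minimal illustration under stated assumptions: it only removes exact feature duplicates between train and holdout (temporal or entity-level leakage needs separate checks), it reports per-slice accuracy as a stand-in for a fuller fairness read, and the field names and toy rule-based `predict` are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Sequence


def fingerprint(record: dict, feature_keys: Sequence[str]) -> str:
    """Stable hash of a record's normalised feature values."""
    canon = "|".join(str(record[key]).strip().lower() for key in feature_keys)
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()


def leakage_free(holdout: Iterable[dict], train: Iterable[dict],
                 feature_keys: Sequence[str]) -> List[dict]:
    """Drop holdout rows whose features also appear in the training set."""
    seen = {fingerprint(row, feature_keys) for row in train}
    return [row for row in holdout if fingerprint(row, feature_keys) not in seen]


def accuracy_by_slice(rows: Iterable[dict], predict: Callable[[dict], int],
                      label_key: str, slice_key: str) -> Dict[str, float]:
    """Accuracy per slice; large gaps between slices warrant a closer look."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row[slice_key]] += 1
        hits[row[slice_key]] += int(predict(row) == row[label_key])
    return {name: hits[name] / totals[name] for name in totals}


# Toy data and a toy rule standing in for the model under review.
FEATURES = ("income", "debt")
train_rows = [{"income": 50, "debt": 10, "group": "a", "label": 1}]
holdout_rows = [
    {"income": 50, "debt": 10, "group": "a", "label": 1},  # duplicate of train
    {"income": 40, "debt": 5, "group": "a", "label": 1},
    {"income": 20, "debt": 10, "group": "b", "label": 1},
    {"income": 90, "debt": 10, "group": "b", "label": 1},
]


def toy_predict(row: dict) -> int:
    return int(row["income"] > 3 * row["debt"])


clean_rows = leakage_free(holdout_rows, train_rows, FEATURES)
slice_accuracy = accuracy_by_slice(clean_rows, toy_predict, "label", "group")
```

Comparing the headline metric before and after the leaked rows are removed is usually the first number worth putting in the held-out eval report.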
Logistics
Last-mile delivery · UK + EU

RPA-replace roadmap against a renewal cliff

Typical shape: a UiPath or Blue Prism estate is approaching renewal and the AI-modernisation question lands at the wrong time. We score every bot process against a structured rubric, recommend per-process actions (migrate to <a href="/services/ai-workflow-automation/">LangGraph + Temporal workflow</a> / retain as classical RPA / retire), and sequence the migration against the renewal calendar.

Deliverable: scored process register, sequenced migration plan
016 / FAQ

What buyers ask before signing.

How are AI consulting services from Paiteq different from McKinsey, BCG, or Deloitte?

Different shape, different deliverable. Tier-one strategy houses produce slide decks; we produce written memos signed by engineers who will still be picking up the phone when the build starts. Our AI consulting services engagements run two to six weeks fixed-scope — not the six-month strategy retainers tier-one houses default to — and they end with a costed roadmap that a real engineering team can execute against, including named tools, named gates, and a TCO sensitivity analysis you can hand to procurement. We don't have a 600-person ML practice to push into the answer; that's a feature, not a bug. Where the question is genuinely about org-design across 40,000 people, McKinsey beats us. Where the question is what to actually build and which vendor to actually sign, we beat them roughly nine times out of ten.

Do you sign off on vendor selection — and how do you avoid kickback bias?

Yes. We score vendors against the same rubric the buyer will face downstream in procurement, we sit in on demos, and the memo names the call by name. We don't take referral fees from any of the vendors we evaluate — Pinecone, Anthropic, OpenAI, Google, Microsoft, LangChain, Temporal, ElevenLabs, none of them. The only money in our P&L is the consulting fee we billed you. Where a sibling Paiteq practice could plausibly compete with a vendor we'd recommend — for example our own RAG development services versus a vendor RAG product — we disclose the conflict in writing inside the memo and recommend the option that wins on the rubric anyway. We've recommended against ourselves three times in 2026.

When does it make sense to skip consulting and go straight to a build?

When the use case is clean and the vendor selection is already settled, skip us. Examples we'd genuinely route straight to a build: a single-purpose RAG over a known corpus with one buyer-approved vendor; an agent migration where the destination framework is already chosen by the engineering team; a voice agent build where LiveKit is already procured and the question is just whether to use Claude Sonnet 4.6 or GPT-5 for the brain. Where consulting actually earns its fee is when the question itself isn't yet clean — pre-AI greenfield, stalled pilot, RPA renewal pressure, vendor sprawl, or a board asking for a posture read. Buyers who think they're in the first bucket but are actually in the second usually burn $200k–$500k of engineering before realising it. The audit is cheaper.

What's in a written audit memo, and can I see a redacted one?

Twenty pages, give or take. Cover memo with the call and the three dissenting views; capability score across model selection, retrieval, eval rigour, observability, MLOps; data-hygiene findings with named leaks if any; vendor scorecard against the rubric; build-vs-buy frame with TCO modelled across three postures (hosted, self-hosted, hybrid); 12-month roadmap with named phases, named exit gates, and named tools (Claude, GPT-5, LangGraph, Pinecone, Langfuse — actual names, not categories); risk register; sensitivity analysis on the assumptions most likely to change. We can share a redacted memo under NDA — DM us through the contact form and we'll send one inside two business days. The redacted version covers a multi-state US healthcare payer engagement; brand and dollar figures removed, structure and analysis intact.

How do you price an audit, and how is that different from your AI roadmap pricing?

Fixed scope, fixed fee, both shapes. An AI Capability Audit runs two to three weeks; pricing scales with the technical surface — a single-team single-use-case audit lands at the lower end; a multi-BU multi-pilot audit at the upper. An AI Roadmap engagement runs three to five weeks; pricing scales with the number of use cases sequenced and the vendor evaluation depth. Both ship with a written memo and an exec readout. Neither runs on a time-and-materials clock — we don't sell hours; we sell a written deliverable against a fixed scope. We'll quote exact numbers after a 30-minute scoping call; the AI consulting services pricing range is on the higher end of independent advisory and the lower end of tier-one strategy houses, which is roughly where the value sits.

How is this pillar different from your generative AI consulting or LLM consulting work?

Three siblings, three different questions. AI consulting services here covers cross-cutting AI advisory — vendor selection across modalities, build-vs-buy framing across the whole AI estate, 12-month roadmaps that span retrieval and generation and agentic workloads. Generative AI consulting (the advisory wrap on our generative AI practice) is narrower — image, audio, video, brand-controlled generation; LoRA strategy; safety + watermarking posture. LLM consulting (inside our LLM development practice) is hosted-vs-self-hosted decisions, fine-tuning strategy, cost engineering on a known LLM workload. If your question is multi-modality and cross-cutting, you're in the right pillar. If it's modality-specific or model-architecture-specific, route to the sibling.

Do you stay on after the roadmap ships, or hand off cleanly?

Clean handoff is the default. Every memo names exit gates and a recommended owner per workstream — sometimes the recommendation is an internal hire, sometimes a Paiteq sibling practice, sometimes a third-party vendor. About 40% of audit engagements convert to a build engagement with us under one of the technical pillars (AI Agent Development, RAG, LLM, or Machine Learning); about 30% convert to roadmap then build; about 30% take the memo, hand it to an existing team or a competing vendor, and execute without us. We don't penalise the third path — the memo is a finished artefact in itself. Retainer engagements exist for clients who want us in the room monthly for the first year, but we don't push them by default.

018 / Start an engagement

Ship an honest AI audit in three weeks.

AI capability assessment in 2–3 weeks. Enterprise AI strategy roadmap in 3–5. AI vendor selection consulting in 4–6. AI readiness assessment + AI due diligence in 2–4.