ai governance services · live

AI governance
that ships engineering, not slide decks.

AI governance services for teams shipping LLM and agent systems. We are the engineer who readies your AI system for the audit someone else conducts. Big-4 firms write you a strategy memo, GRC platforms sell you a dashboard, lawyers write you an opinion. We ship the eval suites, audit logs, model registry, red-team findings, and remediation PRs that turn an AI governance framework (EU AI Act risk tiers · NIST AI RMF · ISO 42001) into something an auditor can actually sign off on. Responsible AI as engineering deliverable, fixed-fee discovery audit.

See the audit pipeline

Definition

What is AI governance?

AI governance is the practice of running production AI under regulator-defensible controls: a model inventory, eval framework, audit logs of every inference retained for the regimes lookback window (typically 7 years for SR 11-7-aligned bank deployments), red-team findings with remediation tracking, and a monitoring dashboard for drift, refusal rate, and groundedness. Unlike SOC 2 (which covers organizational security controls) and unlike GDPR (which covers personal data handling), AI governance specifically covers the LLM and model-decision layer. Unlike traditional model risk management for ML (one labelled dataset, one trained model), LLM governance must handle prompt-injection attack surface, retrieval drift, third-party API dependency, and probabilistic output validation. Common frameworks mapped in deliverables include SR 11-7 (US bank model risk), ISO 42001 (AI management system), the EU AI Act, NIST AI RMF, and the NAIC Model Bulletin on AI use by insurers.

Fixed-fee

fixed-fee AI governance audit · 1 week

V2 LLM systems shipped under eval-suite scope

Day 5

you get a gap report + remediation list, not a memo

Kill

walk-away point baked into every engagement

responsible ai · generative ai governance

The engineer, not the auditor.
That's the wedge.

AI governance consulting that anchors on engineering deliverables: eval suites, model cards, audit logs, red-team reports, remediation PRs. Three positioning truths up front, because you've read enough strategy memos. Most teams arrive after a Big-4 has handed them a 200-page report; we ship the engineering that turns it into a system an auditor can review.

We ship engineering, not slide decks

We are the engineer who readies your AI system for the audit someone else conducts. The Big-4 will write you a strategy memo. GRC platforms will sell you a dashboard. Lawyers will write you an opinion. We ship the eval suites, audit logs, model registry, red-team findings, and remediation PRs that turn that strategy memo into something an auditor can actually sign off on. AI governance consulting that ladders into shipped artefacts on day 5, not phase 2.

Model-agnostic responsible AI

Same governance framework whether you ship on Claude, GPT, Llama, Mistral, or a fine-tuned 7B. EU AI Act risk tiers don't care about your vendor. NIST AI RMF and ISO 42001 controls don't either. We anchor responsible AI and generative AI governance in real engineering deliverables (model cards, eval reports, prompt-injection regression suites, immutable audit logs), not vendor-locked checklists you can't take with you.

We tell you when you don't need us

Series-A LLM startup with one product surface and a single eval set? Hire a fractional GRC lead, not us. Public-sector procurement with an 18-month timeline and a board mandate? Hire a Big-4 firm that can survive the contract. You're the right fit when you're engineering-led, you've already read the strategy memo, and you need an auditor-ready system in 4–8 weeks — not a 200-slide deck.

the 1-week governance audit

What happens in five days
before the framework crosswalk lands.

The fixed-fee AI governance audit is not a discovery deck. It is a five-stage engineering pipeline with a named deliverable every day, mapped to EU AI Act / NIST AI RMF / ISO 42001 obligations as we go. End-to-end in one calendar week. Same shape we ship on every AI development engagement, scoped down to a fixed-fee gap analysis.

Day 1 · Mon

Scope & risk tier

In: System architecture · use cases · jurisdictions in scope
Out: EU AI Act risk-tier call · NIST RMF function map · scope memo

Confidence after 20%

Day 2 · Tue

Eval suite snapshot

In: Current eval set · golden cases · CI hooks
Out: Eval-coverage scorecard · accuracy + robustness baseline

Confidence after 40%

Day 3 · Wed

Audit-log & lineage review

In: Logging architecture · model registry · retention policy
Out: Audit-log gap list · lineage diagram · retention findings

Confidence after 62%

Day 4 · Thu

Red-team + prompt-injection scan

In: Production endpoints · OWASP LLM Top 10 checklist
Out: Red-team report · prompt-injection findings · severity-ranked PR list

Confidence after 84%

Day 5 · Fri

Framework crosswalk + gap report

In: All week's artefacts · framework map (EU AI Act / NIST / ISO 42001)
Out: Written gap report · 30-60-90 remediation roadmap · pilot scope doc

Confidence after 96%

ai risk management framework crosswalk

EU AI Act · NIST AI RMF · ISO 42001.
Same engineering deliverable, three labels.

The dirty secret of AI risk management is that whether your buyer asks for EU AI Act readiness, NIST AI risk management framework alignment, or ISO 42001 prep, the engineering work is largely the same: model cards, eval suite, immutable audit log, DPIA-equivalent impact assessment, incident runbook, supplier register. We ship the artefact once. We crosswalk it to whichever framework your audit committee, your regulator, or your enterprise buyer cares about.

Dimension

You're here What we ship engineering deliverable

EU AI Act Reg. (EU) 2024/1689

NIST AI RMF 1.0 Govern · Map · Measure · Manage

ISO/IEC 42001:2023 AI Management System

Risk-tier classification Where your system lands on the regulatory risk ladder.

What we ship Written risk-tier call per use case · scope memo on day 1

EU AI Act Annex III risk-tier table · Title III obligations

NIST AI RMF 1.0 Govern + Map functions · context characterisation

ISO/IEC 42001:2023 Clause 6.1 risk assessment · AI impact assessment

Model card + system card Versioned card per deployed model with intended use, eval results, limits.

What we ship Model card + system card written into the repo · versioned with each release

EU AI Act Art. 13 transparency obligations

NIST AI RMF 1.0 Map 4.1 · documentation of system characteristics

ISO/IEC 42001:2023 Annex A control A.6 · system documentation

Eval suite + benchmarks Versioned eval set with CI pass-gates on accuracy, robustness, regression.

What we ship Golden set + regression suite · pass-gate wired into CI · re-run nightly

EU AI Act Art. 15 accuracy + robustness obligations

NIST AI RMF 1.0 Measure 2.1–2.13 · accuracy, reliability, robustness

ISO/IEC 42001:2023 Clause 9.1 · monitoring + measurement of AI performance

Audit log (immutable) Append-only trace of prompts, tool calls, decisions, user context.

What we ship Append-only store · manifest hashes · 12-month retention default

EU AI Act Art. 12 automatic logging obligation

NIST AI RMF 1.0 Manage 4.1 · logged decisions for monitoring

ISO/IEC 42001:2023 Annex A control A.8 · operational data + logging

Red-team + adversarial testing Prompt-injection, data exfil, jailbreak, OWASP LLM Top 10 coverage.

What we ship 40+ prompt-injection cases on every agent · nightly run · written findings

EU AI Act Art. 15 cybersecurity + accuracy obligation

NIST AI RMF 1.0 Measure 2.7 · security + resilience

ISO/IEC 42001:2023 Annex A control A.6.2.4 · AI system testing

Incident-response playbook Named owner, severity tiers, rollback path, reporting clock.

What we ship Runbook in the repo · rollback path tested · severity matrix

EU AI Act Art. 62 serious-incident reporting (high-risk systems)

NIST AI RMF 1.0 Manage 4.2 · documented incident response

ISO/IEC 42001:2023 Clause 10.2 · nonconformity + corrective action

Conformity / DPIA-equivalent Written impact assessment + sign-off artefact for buyers + auditors.

What we ship DPIA-shaped impact assessment · written, versioned, internally signed off

EU AI Act Art. 47 conformity declaration (we prep · you sign · certified body assesses)

NIST AI RMF 1.0 Govern 1.6 · documented oversight + accountability

ISO/IEC 42001:2023 Clause 8 · operational planning + control

Acceptable-use policy Written AUP for end-users + internal operators + third-party callers.

What we ship AUP drafted · plain-English version published · operator version signed

EU AI Act Art. 4a AI literacy + intended-use obligations

NIST AI RMF 1.0 Govern 4.1 · roles + responsibilities

ISO/IEC 42001:2023 Annex A control A.5 · AI policies

Vendor / supply-chain risk Inventory of third-party models, datasets, plugins, with contract gaps flagged.

What we ship Third-party AI register · BAA / DPA gap list · model-card review

EU AI Act Art. 25 importer / distributor obligations

NIST AI RMF 1.0 Govern 6.1 · third-party risk + supply chain

ISO/IEC 42001:2023 Annex A control A.10 · supplier relationships

Article and clause references current as of 2026. EU AI Act = Regulation (EU) 2024/1689. NIST AI RMF = AI Risk Management Framework 1.0 (Jan 2023). ISO/IEC 42001:2023 = Artificial Intelligence Management System.

ai compliance · ai policy · ai acceptable use policy

The compliance work
most teams quietly skip.

AI compliance is where shipped systems die quietly: a policy in a Notion page nobody reviews, a vendor register that's six months old, an incident runbook nobody has tested. We don't ship governance theatre. Three artefacts, written into the repo, versioned with your releases, routed through legal and security review. AI policy that survives an audit committee meeting, not one that lives on a shared drive.

AI policy + acceptable-use policy

Two policies, not one. The AI policy is internal (what teams can ship, where the kill-switches are, who signs off on a model swap). The AI acceptable-use policy is external (what your end users and third-party API callers may do with your system). We draft both, route them through your legal review, and keep them versioned with your repo. Standard AI compliance gap on most teams: a policy that lives in a Notion page, never reviewed, contradicted by what the engineering team actually ships.

AI vendor risk assessment

Most LLM systems run on three to seven third-party AI vendors — base model, embedding model, vector DB, observability, eval platform, sometimes a moderation API. We inventory every external AI call, score each vendor on BAA / DPA posture, retention defaults, sub-processor list, and named-jurisdiction processing, and ship a written register. The output is the supplier annex an auditor will ask for, written in plain English, not a 40-page risk matrix nobody reads.

AI incident-response + reporting clock

EU AI Act Article 62 sets a 15-day clock for serious-incident reporting on high-risk systems. NIST AI RMF wants a documented incident playbook. ISO 42001 Clause 10.2 wants corrective-action records. We write a runbook with named owners, severity tiers, a tested rollback path, and a reporting-clock chart that maps your jurisdiction to the right notification window. Stored in the repo, not in a binder.

ai audit services · what week 1 ships

Five days, four engineering deliverables,
one walk-away point.

The audit is the entry point for every AI governance services engagement. Same shape, same kill-point logic, same Friday written deliverable as every other audit we ship across the practice. An AI compliance audit that ends with engineering work to do, not a slide deck and a re-engagement quote.

Day 1–2

Scope + eval snapshot

Workshop on architecture, jurisdictions, named use cases. We pull a snapshot of your current eval set and CI hooks. The day-2 deliverable is a risk-tier call (EU AI Act framing) and an eval-coverage scorecard with named gaps. This is the cheapest week to discover you're a Limited-risk system, not a High-risk one — saves you eight figures in unnecessary controls.

Risk-tier call · eval scorecard · scope memo
Day 3

Audit-log + lineage review

We read your logging architecture, your model registry, your retention policy. The deliverable is a gap list — what's missing for EU AI Act Article 12 logging, NIST AI RMF Manage 4.1, ISO 42001 Annex A.8. If the gap is small, we recommend a remediation PR on day 5. If the logging architecture is wrong at the architecture layer, this is the day we tell you the audit recommends a re-architecture before the pilot — and we walk away from selling you a pilot we can't ship.

Audit-log gap list · lineage diagram · retention findings

Walk-away point
Day 4

Red-team + prompt-injection

OWASP LLM Top 10 ran against your production endpoints (or staging if production isn't on the table). We add 40+ prompt-injection test cases — direct, indirect, multi-turn, payload-in-document, tool-output-poisoning — and document findings with severity, repro steps, and proposed remediation PR. Standard scope on every chatbot + agent we ship; here we run it as an independent assessment, not a self-audit.

Red-team report · prompt-injection findings · ranked PR list
Day 5

Crosswalk + gap report

Friday output: a written gap report that maps every finding from the week onto EU AI Act articles, NIST AI RMF subcategories, and ISO 42001 Annex A controls. Plus a 30-60-90 remediation roadmap with named PRs, cost bands, and a kill-point on each remediation phase. The same artefact you'd hand to a board AI committee, a regulator's pre-engagement query, or a Big-4 firm running the certification step.

Gap report · framework crosswalk · 30-60-90 remediation roadmap

ai red team · llm red teaming · ai security assessment

LLM red teaming
that ships nightly, not annually.

AI red team work that stops at a yearly engagement misses the new exploits that land between cycles. We ship prompt-injection testing wired to nightly eval, mapped to the OWASP LLM Top 10 (2025), with every successful bypass turned into a regression case the next morning. AI security audit findings come with merged remediation PRs, not a 60-page PDF. Standard scope on every chatbot and agent we ship — available as an independent ai security assessment on systems we didn't build.

Prompt injection (direct + indirect)

OWASP LLM01. Direct prompt injection in user input · indirect via tool outputs, RAG documents, retrieved emails. Our regression suite ships 40+ payloads on every agent — system-prompt leak, persona-override, instruction-hijack, payload-in-PDF, payload-in-support-ticket. Standard scope on every agent we ship; available as an independent prompt-injection testing engagement on systems you didn't build with us.

Data exfil + sensitive-information disclosure

OWASP LLM02 + LLM06. We test for system-prompt leakage, retrieval of training-data fragments, PII regurgitation, secret extraction (API keys, internal hostnames, customer identifiers). Mapped to a golden adversarial set with each payload tagged by severity and exfil-class. Output is a findings document with severity-ranked remediation PRs, not a generic LLM security assessment slide.

Jailbreak + multi-turn manipulation

Multi-turn social-engineering, persona stacking, hypothetical framing, low-resource-language pivots, base64 / homoglyph encoding. We run these against your refusal policy and document each successful bypass with the exact turn sequence. AI security audit findings come with a regression case added to the nightly eval — so the same exploit can't ship past you twice.

OWASP LLM Top 10 + threat model

Full OWASP LLM Top 10 (2025) coverage — prompt injection, sensitive disclosure, supply-chain, data + model poisoning, improper output handling, excessive agency, system-prompt leakage, vector + embedding weaknesses, misinformation, unbounded consumption. Ships with an AI threat model diagram tailored to your architecture: trust boundaries, attacker classes, asset inventory, attack paths.

ai governance framework · readiness scorecard

Ten flags that separate
an auditor-ready system from a memo.

The same checklist we run on day 1 of every AI governance audit. If your system clears all ten green flags, you're already at responsible AI framework parity — call us when you want a second opinion. If three or more land in the red, the discovery audit will pay for itself by surfacing the gaps before a regulator, an enterprise buyer, or an algorithmic audit firm does.

your vendor scorecard

0/10 keep looking

tap pass / fail on each criterion · saved locally in your browser

01
Model card present

Versioned model card + system card in the repo. Names intended use, eval results, known limits, retraining cadence, and named owner.

"It's in a Notion page somewhere." No version history. No named owner. Last updated before the current model swap.
02
Eval suite versioned

Golden eval set checked into the repo. CI pass-gate fires on every PR. Regression cases added whenever an exploit lands.

Three prompts in a Jupyter notebook. "We'll formalise the evals once we ship." No regression cases for known failures.
03
Audit log retained

Append-only store with manifest hashes. Prompt + tool-call + decision + user-context captured. Retention policy written, default 12 months.

CloudWatch logs at default retention. No tool-call trace. No way to reconstruct what a given user actually saw.
04
DPIA / impact assessment done

Written DPIA-shaped impact assessment per system. Signed by named owner. Re-reviewed on every material change. Lives in the repo, not a shared drive.

"Legal said it's fine." No artefact. Re-review happens only when a regulator asks.
05
Risk tier classified

Each use case mapped to EU AI Act risk tier (Unacceptable / High / Limited / Minimal) with reasoning. NIST AI RMF Map function complete.

"We think it's Limited." No written reasoning. Different teams classify the same system differently.
06
Acceptable-use policy published

AUP versioned and published. Internal operator version separately signed. Routed through legal review on each release.

Generic LLM disclaimer in the footer. No operator-facing version. Never reviewed by legal.
07
Vendor risk assessed

Inventory of every third-party AI call. BAA / DPA on file for each vendor that touches sensitive data. Retention + sub-processor list reviewed quarterly.

"We use OpenAI." No idea how many vendors are in the pipeline. Retention defaults never reviewed. No BAA on PHI-touching paths.
08
Red-team report dated < 90 days

Independent red-team run within the last 90 days. Findings tracked to remediation PRs. Nightly prompt-injection regression on top.

Single red-team done at launch. Findings live in a slide deck. No nightly regression. New exploits land in production first.
09
Incident playbook tested

Runbook with named owner, severity tiers, rollback path. Drill run at least every six months. Reporting-clock chart maps jurisdiction to notification window.

"We'll figure it out if it happens." No runbook. Rollback path untested. No idea what Article 62 says.
10
Rollback path tested

Feature flag + model-pin + previous-version artefact retained. One-command rollback. Tested in the last quarter on a non-prod scenario.

Rollback means "redeploy from git." No feature flag. Previous model artefact already garbage-collected.

Score your own system. Saved locally in your browser. Run it past your AI security audit lead, your DPO, your engineering lead, and your CISO separately — disagreement between them is itself a finding.

ai governance platform · vs · ai governance engineering service

AI Governance Platform vs Engineering Service.

Buyers who search "ai governance platform" usually want a SaaS dashboard. Buyers who search "ai governance services" usually want an engineering team. The two solve different shapes of the same problem, and most regulated enterprises end up needing both.

An AI governance platform (Credo AI, OneTrust AI Governance, Holistic AI) is a control plane: a model registry, a policy library, a risk dashboard, a multi-stakeholder workflow. It is the right buy when you have 30+ AI systems across the org, a staffed GRC function, and a board that wants one screen to look at. The platform itself does not ship eval suites, write audit-log code, or merge remediation PRs. It tracks the artefacts your engineering teams produce. Annual platform fee usually $25K–$100K, plus implementation services on top.

We are the engineering service that produces the artefacts a platform would track. The eval suite checked into your repo. The append-only audit log behind your agent. The model cards versioned with each release. The prompt-injection regression that runs nightly. The vendor register with the BAA gaps named. If you already own a platform, we plug into it. If you don't, the artefacts we ship are portable — they live in your repo, not in our tooling. Most teams under 10 AI systems do not yet need a platform. They need the engineering done.

honest scope · referral list

When you should not hire us for AI governance.

Three engagements where we openly recommend a different firm. We would rather refer you out than take a contract we cannot ship.

If you need ISO certification, hire an accredited auditor. We do not certify.
If your bias audit must meet regulator standard, hire a specialist statistician. We ship the eval framework, not the certification.
If your scope is pure-policy without engineering, hire McKinsey or a Big-4. We live in the code layer.

how we compare · respectfully

Big-4, GRC platform, specialist auditor,
in-house GRC — or us.

Five real options for an AI governance engagement in 2026. The Big-4 (KPMG, EY, PwC, Deloitte) ship the strategy memo and the board narrative. GRC platforms (Credo AI, OneTrust, Holistic AI) ship the dashboard and the policy templates. Specialist auditors (babl.ai, ORCAA) ship the independent third-party audit. In-house GRC teams ship continuous oversight once they're staffed. We ship the engineering: eval suite, audit log, model registry, remediation PRs. None of these are wrong; they solve different shapes of the same need. Here's when each one wins, and where we openly recommend you hire somebody else.

Dimension

You're here GetWidget Fixed-fee · 1 week · engineering-led

Big-4 advisory KPMG · EY · PwC · Deloitte

GRC platform Credo AI · OneTrust · Holistic AI

Specialist auditor babl.ai · ORCAA

In-house GRC Hire 3+ governance FTEs

Deliverable What sits on your shared drive on Friday.

GetWidget Eval suite + audit log + model registry + remediation PRs

Big-4 advisory Strategy memo + framework map · you implement

GRC platform Dashboard + policy templates · you implement controls

Specialist auditor Independent third-party audit report

In-house GRC Internal policy + reviews · capacity-bound

Pricing band Realistic engagement spend over the first year.

GetWidget discovery audit · fixed-bid pilot · monthly continuous

Big-4 advisory $150K–$500K · partner-led billing

GRC platform $25K–$100K/yr platform fee · implementation extra

Specialist auditor $25K–$80K per audit · independent of remediation

In-house GRC $200K+/yr loaded cost per FTE · 3+ FTEs to staff

Timeline to first artefact How long before you have a defensible written deliverable.

GetWidget 1-week audit · pilot 4–8 weeks

Big-4 advisory 8–16 weeks · diagnostic phase first

GRC platform 2–6 weeks platform setup · longer for full coverage

Specialist auditor 6–12 weeks per audit cycle

In-house GRC Ongoing · 6+ months to staff up

Engineering work shipped Does the engagement leave behind code, scaffolding, PRs?

GetWidget Yes — eval suite, audit log, model card scaffolding, remediation PRs

Big-4 advisory No — implementation handed to delivery partners or you

GRC platform Platform integrations · you configure + connect

Specialist auditor No — audit findings only, remediation is your problem

In-house GRC Limited · depends on FTE engineering depth

Best for Where each option wins clearly.

GetWidget Engineering-led teams needing auditor-ready output in < 8 weeks

Big-4 advisory Public-sector procurement · board reporting · partner-led narrative

GRC platform Multi-system GRC at scale · regulated industries with platform budget

Specialist auditor Independent third-party audit you can hand to a regulator

In-house GRC Series B+ with steady-state governance load · 3+ FTEs justified

Walk away from us if Where we openly recommend you go elsewhere.

GetWidget You need a board-grade strategy memo or a regulator-facing audit

Big-4 advisory You need engineering shipped this quarter, not slides

GRC platform You don't have platform budget · single-product surface

Specialist auditor You need remediation, not just findings

In-house GRC You can't credibly hire 3+ governance FTEs

We respect the Big-4 advisory practices, the GRC platforms, and the specialist algorithmic auditors. We work alongside them — typically as the engineering implementer between a Big-4 strategy phase and a babl.ai or ORCAA third-party audit. We don't replace any of them. Pricing bands reflect public partner-led and platform-tier ranges; your mileage varies by scope and jurisdiction.

Not sure which model fits?

Twenty-minute fit call. We will tell you when your problem is squarely in the Big-4 lane, when you need a GRC platform, or when a babl.ai-grade independent audit is the right next step. No deck, no obligation.

Book a fit call Read the FAQ

capability patterns

Three governance engagements,
three different shipped artefacts.

Anonymized capability patterns from real shipped engagements. The point of each one is what we shipped — versioned eval suites, immutable audit logs, model cards, de-id pipelines, prompt-injection regression. Named references shared under NDA. (For broader patterns from regulated industries, see our healthcare AI development capability sets.)

Multi-specialty clinic chain Pattern

Eval suite + PHI-scrub gates for a patient-intake chatbot

Problem

Clinic group shipped an LLM patient-intake chatbot in pilot and needed defensible HIPAA + clinical-safety evidence before the regional ops team would approve a wider rollout. They had a model card sketch and a hand-curated test set; nothing wired to CI.

Approach

Built a versioned eval suite on a 200-row golden set with PHI-scrub gates (regex + LLM-based de-id) and prompt-injection regression. Wired the eval pass-gate into CI; failed builds block deploy. Audit log moved to an append-only Postgres table with manifest hashing; retention written into the runbook. Model card formalised and checked into the repo.

Sonnet 4.6Custom eval harnessPostgres audit logBAA with vendor

Outcome

94% PHI-scrub recall on a 200-row golden eval set, healthcare intake (2026-Q1); every release now produces eval-pass + audit log

Enterprise SaaS · customer-service agent Pattern

Prompt-injection regression after a pilot jailbreak

Problem

Customer-service agent for an enterprise SaaS shipped to pilot and was jailbroken via a payload embedded in a support ticket. The exploit reached production traffic before the team caught it. Pilot was paused; board wanted a re-launch plan in two weeks.

Approach

Mapped the system to OWASP LLM Top 10. Wrote 40+ prompt-injection test cases covering direct, indirect (ticket payload), tool-output poisoning, and multi-turn jailbreak. Wired them to a nightly eval run. Instrumented the audit log with full prompt + tool-call trace. Ran a red-team in week 4 of the pilot resume.

OpenAI Agents SDKLangSmith evalPostgres trace storeGitHub Actions nightly

Outcome

40+ prompt-injection regression cases shipped to nightly CI, SaaS agent (2026-Q1); 7 PRs merged before production cutover, zero repeat exploits in 90 days

Regional health payer Pattern

Audit log + de-id pipeline for a prior-auth pilot

Problem

Regional payer ran a prior-authorization automation pilot and needed BAA-compliant logging plus a de-identification pipeline before internal compliance would approve production. Pilot had been running on default cloud-vendor logging; nothing usable for audit.

Approach

De-id helpers built (regex + LLM-based PII scrub on inbound clinical text). Immutable audit log moved to append-only S3 with a hashed manifest per write; archived to Glacier on a 90-day cadence. Model-card scaffolding written for the internal governance review committee. Cost-floor model swap (Sonnet 4.6 for adjudication, Haiku 4.5 for triage) shipped with the same audit-log instrumentation.

Sonnet 4.6 + Haiku 4.5Custom de-idS3 append-only + GlacierModel-card template

Outcome

90-day audit-log retention to append-only S3 + Glacier archive, payer prior-auth (2026-Q1); passed internal compliance review, engagement extended to continuous

engagement models

Three engagement tiers.
Audit, pilot, continuous.

Same pricing as every other AI services pillar we run. The audit is the entry point; about half of governance engagements ladder into a remediation pilot inside the quarter, and continuous engagements typically pick up the next two or three use cases on the roadmap. No annual contracts.

Most teams start here

1 week

AI governance audit

Five-day audit that ends with a written gap report mapped to EU AI Act + NIST AI RMF + ISO 42001 — not a slide deck.

Fixed-fee fixed

Risk-tier classification + scope memo
Eval-coverage scorecard + audit-log gap list
Red-team + prompt-injection findings
Framework crosswalk (EU AI Act / NIST / ISO 42001)
30-60-90 remediation roadmap + pilot scope doc

4–8 weeks

Remediation pilot

Ship the audit's top remediation block. Eval suite hardened, audit log retained, model card written, red-team regression in CI.

Fixed-bid fixed price

Eval suite + CI pass-gate wired in
Audit-log + lineage + retention shipped
Model card + system card · DPIA-shaped impact assessment
Red-team regression nightly · remediation PRs merged
Walk-away point — if the remediation won't close the gap, no phase 2

Monthly

Continuous governance

Embedded governance engineering on the rest of the roadmap — new use cases, new vendors, new jurisdictions, new exploits.

monthly per month

Quarterly framework-refresh + risk-tier re-review
Monthly red-team + eval-drift report
Vendor / supply-chain register kept current
Incident-response on call · cancel any month

Talk to us

Engineering deliverables, not slide decks Model-agnostic — your stack, your eval set Walk-away point written into every phase Bengaluru-based · remote-first · honest about what we can't claim

Ready to ship

Hire an AI governance team
that ships the artefact.

Book the fixed-fee AI governance audit. Five days, four written engineering deliverables, one framework crosswalk, one 30-60-90 remediation roadmap. We map your system to EU AI Act risk tiers, NIST AI RMF functions, and ISO 42001 controls; we run an OWASP LLM Top 10 red-team; we leave behind a gap report your audit committee or your next enterprise buyer can read.

Talk to our team

1 calendar week · fixed fee Written gap report · not a deck 30-60-90 remediation roadmap included

iso 42001 certification · nist ai rmf · eu ai act · algorithmic audit

Questions governance buyers
ask before they book.

What is an AI governance framework and which one should we use?

An AI governance framework is a written set of risk controls, deliverables, and review gates that an AI system is held to over its lifecycle. The three that matter for most teams in 2026 are the EU AI Act (binding regulation across the EU, risk-tier obligations escalating through Limited / High / Unacceptable categories), NIST AI RMF 1.0 (the US voluntary framework — Govern / Map / Measure / Manage functions), and ISO/IEC 42001:2023 (the international standard, AI management system, certifiable). For most shipped AI systems we audit, the answer is "all three at once," because the engineering artefacts (model card, eval suite, audit log, DPIA-equivalent, incident runbook) are largely the same — only the framework labels differ. We write the artefact once and crosswalk it to whichever framework your buyer or regulator cares about. Most engagements start with a governance audit via consulting — a 1-2 week assessment that maps your AI inventory against EU AI Act, NIST RMF, and ISO 42001 obligations before any remediation. Verticals with high governance load: education data governance (FERPA + COPPA) in addition to the standard healthcare / fintech / insurance / legal set.

Will you get our AI system ISO 42001 certified?

No — and we say that openly. We are not a certified ISO 42001 audit body and we do not hold ISO 42001 lead-auditor credentials. What we do is the engineering preparation: we ship the AI management system artefacts a certified auditor will ask for (risk assessment, model cards, eval suite, audit log, incident-response runbook, supplier register, internal audit programme). When you're ready, a certified body (BSI, TÜV, DNV, others) runs the actual ISO 42001 certification audit. The same applies to SOC 2, ISO 27001, and HITRUST — we prep the system for the audit; someone else conducts it. If you need a one-stop firm that does both prep and certification, hire a Big-4. We're cheaper, faster, and we ship code — but we're not the auditor.

What does an AI audit actually deliver in week 1?

Five artefacts. (1) A risk-tier call per use case, mapped to EU AI Act categories with written reasoning. (2) An eval-coverage scorecard naming what's covered, what's missing, and what regression cases would close the gap. (3) An audit-log + lineage gap list against Article 12 / NIST Manage 4.1 / ISO Annex A.8. (4) A red-team + prompt-injection report covering OWASP LLM Top 10 with severity-ranked findings. (5) A framework crosswalk + 30-60-90 remediation roadmap that names PRs, cost bands, and a kill-point on each phase. AI audit services that stop at a slide deck are the marketing version; the AI compliance audit we run leaves you with a document an engineering team can ship from.

How is AI red teaming different from a regular pen-test?

A regular pen-test exercises the network, the API, the auth layer — classic infosec. AI red teaming (LLM red teaming specifically) exercises the model layer: prompt injection (direct + indirect), data exfil + sensitive-information disclosure, jailbreak chains, tool-output poisoning, system-prompt leakage, supply-chain attacks on embedding models and training data. The OWASP LLM Top 10 (2025) is the standard reference. We run both — the network pen-test boundary is well-served by traditional infosec firms, but they don't ship prompt-injection regression cases into a nightly eval run. We do. AI security assessment work without an LLM-layer red-team isn't an AI security audit; it's a network audit with an AI label.

Do you cover the EU AI Act risk-tier obligations?

Yes — as an applied framework we map your system to, not as a certification we issue. We classify each use case (Unacceptable / High / Limited / Minimal under Annex III) with written reasoning, then walk you through the Title III obligations the classification triggers: Article 12 logging, Article 13 transparency, Article 15 accuracy + robustness + cybersecurity, Article 25 importer / distributor responsibilities, Article 47 conformity declaration, Article 62 serious-incident reporting. The output is an engineering checklist plus a written declaration template. The certified conformity assessment (for High-risk systems on the standalone list) is then run by a notified body — not us. EU AI Act compliance prep, not EU AI Act certification.

What's the difference between you and Credo AI / OneTrust / Holistic AI?

Credo AI, OneTrust, and Holistic AI are AI governance platforms — SaaS dashboards that help large GRC teams inventory AI systems, run policy templates, and track risk across many models. They're well-suited for regulated enterprises with a multi-million-dollar GRC budget and a platform-ops team to configure them. We're not a platform. We're the engineering service that ships the underlying artefacts a platform would track — eval suites, audit logs, model cards, remediation PRs. Many teams use both: the platform for board-level reporting, us for the engineering work the platform needs to point at. We don't compete with them; we operationalize them. When the platform fit isn't right, we're the lighter-weight alternative — same engineering output, no annual platform fee.

How do you handle third-party / vendor AI risk?

We start with an inventory of every external AI call your system makes — base model, embedding model, vector DB, observability, eval platform, moderation API, any plug-in or function-calling target. For each vendor we score BAA / DPA posture, retention defaults, sub-processor list, named-jurisdiction processing, and model-card transparency. The output is a supplier register an auditor can read and a gap list of contracts to renegotiate. AI vendor risk assessment work that stops at "we use OpenAI" misses the four to six other vendors typically in the pipeline; we surface them all.

Can you audit a system you didn't build?

Yes — about half of our governance engagements are on systems built by other teams or vendors. What we ask for: a system architecture sketch (we can build one with you in a 90-min session if none exists), read-access to the eval set and logging architecture, named owners for the use cases in scope, and a willing engineering counterpart to run remediation PRs with us afterwards. AI security audit work without engineering remediation rarely closes the gap, so we prefer engagements where the remediation pilot can ladder out of the audit — but we'll run the audit standalone if you just need the gap report for a board or a regulator's pre-engagement query.

Do you do bias audits at academic / regulator standard?

No — and we recommend specialists when that's the bar. We ship operator-grade AI bias audit work: subgroup-performance breakdowns on a versioned eval set, disparate-impact metrics on classification systems, named limits in the model card. That's useful for AI compliance review and for surfacing issues before production. It is not equivalent to the peer-reviewed statistical work done by academic groups or specialist algorithmic auditors like babl.ai or ORCAA, who hold credentials and methodology we don't. When the engagement requires regulator-facing or peer-review-grade bias evidence, we refer to those specialists openly. Responsible AI work means knowing what you're qualified to ship and what you aren't.

keep exploring

Related pages.
Pick where you are.

AI governance work usually ladders into a Claude or OpenAI engineering engagement, sits alongside an industry-specific pillar like healthcare or legal, or pairs with an integration build. These pages go deeper on each lane the audit might land on.

01 Service

AI governance that ships engineering, not slide decks.

What is AI governance?

The engineer, not the auditor. That's the wedge.

We ship engineering, not slide decks

Model-agnostic responsible AI

We tell you when you don't need us

What happens in five days before the framework crosswalk lands.

Scope & risk tier

Eval suite snapshot

Audit-log & lineage review

Red-team + prompt-injection scan

Framework crosswalk + gap report

EU AI Act · NIST AI RMF · ISO 42001. Same engineering deliverable, three labels.

The compliance work most teams quietly skip.

AI policy + acceptable-use policy

AI vendor risk assessment

AI incident-response + reporting clock

Five days, four engineering deliverables, one walk-away point.

Scope + eval snapshot

Audit-log + lineage review

Red-team + prompt-injection

Crosswalk + gap report

LLM red teaming that ships nightly, not annually.

Prompt injection (direct + indirect)

Data exfil + sensitive-information disclosure

Jailbreak + multi-turn manipulation

OWASP LLM Top 10 + threat model

Ten flags that separate an auditor-ready system from a memo.

AI Governance Platform vs Engineering Service.

When you should not hire us for AI governance.

Big-4, GRC platform, specialist auditor, in-house GRC — or us.

Not sure which model fits?

Three governance engagements, three different shipped artefacts.

Eval suite + PHI-scrub gates for a patient-intake chatbot

Prompt-injection regression after a pilot jailbreak

Audit log + de-id pipeline for a prior-auth pilot

Three engagement tiers. Audit, pilot, continuous.

AI governance audit

Remediation pilot

Continuous governance

Hire an AI governance team that ships the artefact.

Questions governance buyers ask before they book.

Related pages. Pick where you are.

AI Consulting Company

AI Development Company

AI Agent Development

AI Integration Services

Healthcare AI Development

Clinical Triage RAG

Legal Contract Review RAG

AI for Manufacturing

AI Automation Agency

Weekly eval gates

RAG vs fine-tuning decision tool

AI governance
that ships engineering, not slide decks.

The engineer, not the auditor.
That's the wedge.

What happens in five days
before the framework crosswalk lands.

EU AI Act · NIST AI RMF · ISO 42001.
Same engineering deliverable, three labels.

The compliance work
most teams quietly skip.

Five days, four engineering deliverables,
one walk-away point.

LLM red teaming
that ships nightly, not annually.

Ten flags that separate
an auditor-ready system from a memo.

Big-4, GRC platform, specialist auditor,
in-house GRC — or us.

Three governance engagements,
three different shipped artefacts.

Three engagement tiers.
Audit, pilot, continuous.

Hire an AI governance team
that ships the artefact.

Questions governance buyers
ask before they book.

Related pages.
Pick where you are.