Generative AI Development Services

Generative AI development services — brand-controlled image, video, audio at scale.

Paiteq is a generative AI development and consulting agency. We ship production pipelines on Flux, SDXL, Sora, Runway, ElevenLabs, and Cartesia: brand-controlled with LoRA training, eval-graded, with provenance and human-review gates built in. We are not a vendor of a single house model; we pick the stack against your output spec, your data rules, and your unit economics.

Modalities: Image · Video · Audio · Multimodal
Stack: Flux · SDXL · Sora · Runway · ElevenLabs
Engage: Pilot · Build · Brand-LoRA · Advisory
Provenance: C2PA + watermark
001 / OUTCOMES

Four numbers that travel with every generation pipeline.

The eval rubric is what separates a generative ai company from a prompt-engineering hobbyist. We grade every pipeline on brand-fidelity, safety, cost, and latency — measured weekly, published to the client in a shared dashboard. These four numbers determine whether a pipeline ships and whether it stays live.

Numbers above are medians across shipped engagements on Flux, Stable Diffusion 3, Sora, Runway, ElevenLabs, and Cartesia for clients in e-commerce, ed-tech, fintech, and B2B SaaS. Spread by surface type is wide — we model your specific workload at discovery rather than quote an average.

002 / CAPABILITIES

Six generation capabilities across six modalities.

A capability-by-modality heatgrid showing where Paiteq's generative ai services have shipped at production scale. Strength scores reflect what we've taken to production, not what we've experimented with — the gaps are honest.

Function / Industry
Columns: Image · Video · Audio · 3D / depth · Vision-LLM · Brand LoRA
Marketing / brand assets: Image · Video · Audio · Vision-LLM · Brand LoRA; 3D / depth
Product photography: Image · 3D / depth · Vision-LLM · Brand LoRA; Video · Audio
Personalised video: Image · Video · Audio · Brand LoRA; 3D / depth · Vision-LLM
Voice / narration: Audio · Brand LoRA; Image · Video · 3D / depth · Vision-LLM
Synthetic training data: Image · Video · Audio · 3D / depth · Vision-LLM; Brand LoRA
Multimodal apps: Image · Video · Audio · 3D / depth · Vision-LLM; Brand LoRA
Legend: Possible fit · Good fit · Primary vertical

Dark cells: shipped at production scale. Medium: shipped in pilot. Light: experimented but not yet production. The empty cells are real — we don't claim depth we haven't built.

003 / SERVICES

Four engagement shapes, fixed-scope.

Pilot, full Build, Brand-LoRA training, or Advisory. Every generative ai development services engagement maps to one of the four. Mixed engagements are billed as two consecutive shapes, not as a single open-ended retainer.

003B / ADVISORY

Generative AI consulting — when the question is build-vs-buy, not how-to-build.

A separate shape from the Build engagements. Generative AI consulting is a 2–3 week intervention where the deliverable is a costed decision memo, not a prototype. About a third of our inbound enquiries are for this, most often from a CTO or CMO who has an internal team running prompt-only pipelines and needs an outside read before committing to a Flux self-hosting decision or a Brand-LoRA programme.

01

Model selection

Flux vs Stable Diffusion 3 vs Imagen 4 for image; Sora vs Veo vs Runway for video; ElevenLabs vs Cartesia for audio. We bring our internal benchmark numbers and your eval scenarios into one room and pick.

02

Build-vs-API economics

When does the GPU bill on self-hosted SDXL beat the per-asset bill on Replicate Flux Pro at your projected volume? The answer depends on idle time, batch size, and cache hit rate, and most internal teams don't have the data to model it correctly.
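As a rough illustration of the build-vs-API question above, the break-even volume can be sketched as a fixed-plus-marginal cost comparison. Everything here is an assumption for illustration: the class and function names are hypothetical, idle GPU time is treated as a fixed monthly cost, batch size is folded into the GPU seconds spent per asset, and the prices in the usage note are placeholders, not Paiteq benchmark numbers.

```python
import math
from dataclasses import dataclass


@dataclass
class SelfHostCost:
    """Hypothetical monthly cost model for a self-hosted image pipeline."""
    gpu_hourly_usd: float          # price of one GPU-hour
    busy_seconds_per_asset: float  # GPU time per asset (batching lowers this)
    idle_hours_per_month: float    # hours paid for while the GPU sits idle

    def marginal_usd(self) -> float:
        """GPU cost of generating one more asset."""
        return self.busy_seconds_per_asset / 3600 * self.gpu_hourly_usd

    def monthly_usd(self, assets: int) -> float:
        busy_hours = assets * self.busy_seconds_per_asset / 3600
        return (busy_hours + self.idle_hours_per_month) * self.gpu_hourly_usd


def hosted_monthly_usd(assets: int, usd_per_asset: float,
                       cache_hit_rate: float = 0.0) -> float:
    """Hosted bill, assuming only cache misses reach the paid API."""
    return assets * (1.0 - cache_hit_rate) * usd_per_asset


def break_even_assets(self_host: SelfHostCost, usd_per_asset: float,
                      cache_hit_rate: float = 0.0):
    """Smallest monthly volume where self-hosting costs no more than hosted.

    Returns None when hosted stays cheaper at every volume.
    """
    hosted_marginal = (1.0 - cache_hit_rate) * usd_per_asset
    saving_per_asset = hosted_marginal - self_host.marginal_usd()
    if saving_per_asset <= 0:
        return None
    fixed = self_host.idle_hours_per_month * self_host.gpu_hourly_usd
    return math.ceil(fixed / saving_per_asset)


# Placeholder inputs: $2/GPU-hour, 3.6 s per asset, 100 idle hours a month,
# and a hosted price of $0.04 per asset.
fleet = SelfHostCost(gpu_hourly_usd=2.0, busy_seconds_per_asset=3.6,
                     idle_hours_per_month=100)
print(break_even_assets(fleet, usd_per_asset=0.04))
```

Raising the cache hit rate lowers the hosted bill and pushes the break-even volume up, which is why the three inputs named above have to be measured rather than guessed.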

03

Compliance & IP review

C2PA provenance, watermarking, EU AI Act high-risk classification (phasing in through 2025–2027), HIPAA-aligned data handling where healthcare data is in the loop, and a clean read on what the model providers actually allow under commercial terms.

04

Generative AI strategy memo

A costed roadmap that maps generation surfaces against your business model with priority order, budget, and timeline. This is the deliverable a board pack is built from.

Not a prototype, not a workshop, not a discovery sprint that produces nothing. Three weeks in you have the memo, the numbers, the model picks, the compliance assessment, the costed plan. If you build it, we roll the consulting fee into the Production Build. If your team builds it with our patterns, you take the memo and run.

004 / PATTERNS

Four deployment patterns. We pick on cost, control, and compliance.

The decision between hosted-API, brand-LoRA, hybrid, and fully-self-hosted isn't religious — it's economics plus regulation plus brand control. About 60% of new engagements start as API-first; a quarter graduate to Brand-LoRA in Production Build; a tenth end up Hybrid or Self-hosted because volume or compliance forced it.

01

API-FIRST

The fastest path to a working generation pipeline. Calls Flux Pro, Imagen, or DALL-E behind a thin service. Brand consistency comes from a constrained prompt library and reference images, not a custom model. A human-approval queue gates publish on customer-facing surfaces. Where we start about 60% of new engagements.

Pick when
  • Setup measured in days, not weeks
  • Editorial / blog illustration where brand drift is recoverable
  • Pilot phase to validate output spec before committing to a LoRA
  • Workloads under 10,000 assets/month where hosted economics win
Skip when
  • Regulated data — outputs leave your perimeter
  • Product photography where prompt-only fails brand QA
  • High-volume workloads where the per-asset bill flips against self-hosted
  • Markets needing tight ControlNet conditioning
Stack
Flux Pro · DALL-E 3 · Imagen
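The "Pick when" and "Skip when" lists above can be read as a pre-screen. A minimal sketch, assuming a hypothetical `Workload` record and treating the 10,000 assets/month figure as a prompt to re-check the economics, not a hard limit:

```python
from __future__ import annotations

from dataclasses import dataclass

# From the "Pick when" list: hosted economics win under this monthly volume.
API_FIRST_VOLUME_CEILING = 10_000


@dataclass
class Workload:
    """Hypothetical description of a generation workload."""
    assets_per_month: int
    regulated_data: bool = False
    product_photography: bool = False
    needs_controlnet: bool = False


def api_first_concerns(w: Workload) -> list[str]:
    """Return the 'Skip when' reasons that apply; an empty list means API-first fits."""
    concerns = []
    if w.regulated_data:
        concerns.append("regulated data: outputs leave your perimeter")
    if w.product_photography:
        concerns.append("product photography: prompt-only may fail brand QA")
    if w.assets_per_month >= API_FIRST_VOLUME_CEILING:
        concerns.append("volume: re-check the per-asset bill against self-hosted")
    if w.needs_controlnet:
        concerns.append("conditioning: tight ControlNet control is needed")
    return concerns


print(api_first_concerns(Workload(assets_per_month=2_000)))
print(api_first_concerns(Workload(assets_per_month=60_000, regulated_data=True)))
```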
004B / BRAND

Brand-controlled generation — the LoRA approach we ship for enterprise.

Generic image models produce generic outputs. That works for editorial illustration. It fails fast on product photography, hero campaigns, or any surface where a customer can tell that the brand is shaped by the model and not the other way around. Brand-controlled generation is how we close that gap — and it's a genuine technical gap in the SERP. None of the top-ranking generative ai agency pages cover the methodology at this depth, because most haven't shipped enough LoRAs to have one.

  1. 01 Asset curation
    200–1,000

    Sit with the design lead, pull 200–1,000 brand-approved examples that represent the visual identity — composition, lighting, palette, characters, typography. Volume matters less than consistency; a 300-asset set with tight brand coherence beats a 1,500-asset set with drift. We clean EXIF, strip metadata that might confuse training, normalise resolution.

    If the asset set has too much visual drift across examples, we don't proceed to training. Tightening the curation set by a third is a better outcome than training a LoRA that captures the average instead of the brand.

  2. 02 Training
    12–48 GPU-hr

    Usually on Modal or Replicate against a Flux.1 dev base, sometimes SDXL when the ControlNet ecosystem is mission-critical. Training runs 12 to 48 GPU-hours depending on rank and base resolution. We log every run; nothing trains without a tagged manifest of the source set.

    If the training loss curve doesn't converge within expected bounds, we adjust rank and learning rate and rerun rather than proceed to eval with a suspect checkpoint. Wasted GPU-hours beat a bad LoRA shipped to eval.

  3. 03 Eval gate
    ≥88 brand-fit

    The phase where most brand-LoRA programmes silently fail at agencies without a methodology. Fixed rubric authored with the design lead before training started — palette adherence, composition coherence, typography rendering, brand affordance ("does this feel like our brand or like a Flux output?"). 0–100 scale. We don't ship under 88; we re-train rather than tune around it.

    If brand-fidelity drifts under 85 on the weekly sample, we re-baseline the prompt library or re-train the LoRA. Confident off-brand output is worse than a missed deadline because the design team loses trust in the pipeline.

  4. 04 Deployment
    Weights on your infra

    Weights ship to your infrastructure. Pipeline wiring lands behind your auth. IP-Adapter and ControlNet stages chain in for reference conditioning and compositional control. Re-training cadence quarterly — brands evolve, model bases improve, the eval rubric itself tightens as design standards do.

    We run the re-training on retainer when the client wants us to keep ownership, and hand it off to an internal ML team when the client wants to build the muscle in-house. Either path is documented in the SOW before deployment starts.
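The eval gate in step 03 can be sketched as a small scoring function. The four dimensions and the 88 threshold come from the text above; the equal weighting, the averaging across graded assets, and every name in the code are assumptions made for illustration, since the actual rubric is authored with each client's design lead.

```python
from statistics import mean

# Rubric dimensions named in the eval-gate step; equal weights are an assumption.
RUBRIC_DIMENSIONS = ("palette", "composition", "typography", "brand_affordance")
SHIP_THRESHOLD = 88  # "We don't ship under 88"


def asset_score(sub_scores: dict) -> float:
    """Average the 0-100 sub-scores for one graded asset."""
    for dim in RUBRIC_DIMENSIONS:
        if dim not in sub_scores:
            raise ValueError(f"missing rubric dimension: {dim}")
        if not 0 <= sub_scores[dim] <= 100:
            raise ValueError(f"{dim} must be on the 0-100 scale")
    return mean(sub_scores[dim] for dim in RUBRIC_DIMENSIONS)


def eval_gate(graded_assets: list) -> tuple:
    """Return (brand_fit_score, decision) for a graded eval set."""
    score = mean(asset_score(a) for a in graded_assets)
    decision = "ship" if score >= SHIP_THRESHOLD else "retrain"
    return score, decision


graded = [
    {"palette": 90, "composition": 92, "typography": 86, "brand_affordance": 88},
    {"palette": 88, "composition": 90, "typography": 84, "brand_affordance": 90},
]
print(eval_gate(graded))
```

A checkpoint that lands under the threshold returns "retrain", matching the rule of re-training instead of tuning around a weak score.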

004C / VIDEO

AI video generation services — Sora, Veo, Runway, Kling in production.

Video generation is the modality that moved from research to production fastest in 2026. The market for ai video generation services is still small, mostly because the unit economics weren't there before Sora API and Veo opened up commercial endpoints. Now they are. About a quarter of our generative ai services revenue is video as of 2026, and the curve is still going up.

005 / MODELS

Six model families. We pick per asset type and per data rule.

No house model. We've shipped on every meaningful provider in 2026, and the right call usually depends on three things: brand fidelity ceiling, unit economics at your volume, and where the data has to live. The matrix below covers what we reach for and when we don't.

Flux (Pro / Dev)

Best photoreal output in the open ecosystem in 2026. Strong prompt adherence on long detailed briefs. Commercial licensing on Flux Pro via Replicate or BFL API; Flux Dev for self-hosted research. LoRA training is mature on the Flux.1 schnell and dev variants. Strong text-in-image rendering, which is a recurring failure mode for SDXL.

Pick when
Photorealistic image generation where prompt nuance matters. Brand campaigns where text needs to render correctly in the image. The default starting point for new image pipelines unless the client is already invested in another stack.

Skip when
Tight latency budgets under 2s: Flux is heavier than SDXL Turbo. Workloads where the existing ControlNet ecosystem is critical: SDXL still has the deeper ControlNet inventory.

About 6 in 10 image pipelines we ship in 2026 lead with Flux Pro through the BFL API or Replicate. We pair with a Flux LoRA when brand fidelity needs to hit ≥90 on our rubric.

Photoreal · LoRA-ready · Commercial
Stable Diffusion 3 / SDXL

Most mature open-weight image model — ControlNet, IP-Adapter, depth, pose, lineart conditioners all production-ready. Strong community LoRA ecosystem on Hugging Face. Cheapest to self-host at scale. SDXL Turbo for sub-1s latency when the output spec tolerates a quality dip.

Pick when
High-volume self-hosted workloads where the unit economics matter more than raw quality ceiling. Workflows that need ControlNet, IP-Adapter, or other conditioners we don't get on hosted-only models. Brand-LoRA training where the cost of the LoRA matters.

Skip when
Text-in-image rendering: SDXL is consistently the weaker model here, and Flux beats it cleanly. Greenfield brand pipelines where the team isn't already invested in the SDXL ecosystem.

Self-hosted SDXL on Modal or your own H100 fleet is our default for high-volume image workloads — about 1 in 3 production builds in 2026. Always paired with IP-Adapter for reference-image conditioning.

Open-weight · ControlNet · Self-host
DALL-E 3 / Imagen

Hosted-only, safety-tuned, and very fast to integrate. DALL-E 3 ships through the GPT-5 vision pipeline so you get the prompt-rewriting step for free. Imagen 4 has the cleanest commercial-licence story on Google Cloud. Both have strong default safety classifiers.

Pick when
Quick-start image pipelines where time-to-first-asset matters more than long-term unit economics. Clients on Google Cloud who already have Vertex billing set up. Editorial / blog-illustration workloads where prompt-only is enough.

Skip when
Custom LoRAs (impossible: these are closed-weight). High-volume workloads where the hosted bill flips against self-hosted. Workloads needing ControlNet or fine-grained conditioning.

We use DALL-E 3 or Imagen as the bootstrap model in Pilot engagements when the client wants to see output in week 1. About a third graduate to Flux + LoRA in Production Build.

Hosted · Safety-tuned · Fast-start
Sora · Veo · Runway · Kling

The 2026 video-gen landscape. Sora API for cinematic 5–20s shots with strong physics. Google Veo for product video with the cleanest licensing. Runway for editing workflows — masking, motion brush, image-to-video. Kling for stylised social-content workflows where the aesthetic differs from Sora.

Pick when
AI video generation workloads: personalised explainer video, product demo at SKU scale, social-content pipelines. Always with storyboard → shot list → generation → human-review gate; never raw model-output-to-publish.

Skip when
Long-form video (>30s coherent), which is still a research problem in 2026. Live or near-live generation: these are all minutes-per-clip workflows. Highly brand-specific motion: fine-tunes are not generally available on the video models yet.

About a quarter of our generative ai services revenue in 2026 is video. Sora as the lead model for hero content, Runway for the editorial pipeline. Always reviewed by a human before publish — the failure modes are too costly.

Sora · Runway · Storyboard-gated
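The storyboard → shot list → generation → human-review sequence described for video can be sketched as a small state machine that refuses to publish without a named reviewer. The `Stage` and `Clip` names are hypothetical, and this is only one way to enforce the gate, not a description of the production pipeline.

```python
from enum import IntEnum


class Stage(IntEnum):
    STORYBOARD = 1
    SHOT_LIST = 2
    GENERATION = 3
    HUMAN_REVIEW = 4
    PUBLISHED = 5


class Clip:
    """Tracks one clip through the gated pipeline."""

    def __init__(self, clip_id: str):
        self.clip_id = clip_id
        self.stage = Stage.STORYBOARD
        self.approved_by = None

    def advance(self, reviewer: str = None) -> Stage:
        """Move to the next stage; publishing requires a named human reviewer."""
        if self.stage is Stage.PUBLISHED:
            raise ValueError("clip is already published")
        next_stage = Stage(self.stage + 1)
        if next_stage is Stage.PUBLISHED:
            if not reviewer:
                raise PermissionError("human approval required before publish")
            self.approved_by = reviewer
        self.stage = next_stage
        return self.stage


clip = Clip("demo-clip")
for _ in range(3):
    clip.advance()              # storyboard -> shot list -> generation -> review
clip.advance(reviewer="design-lead")
print(clip.stage.name, clip.approved_by)
```

Because stages only move forward one step at a time, raw model output cannot reach publish without passing through the review stage.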
ElevenLabs · Cartesia

ElevenLabs for studio-grade narration and voice cloning with commercial licensing. Cartesia for sub-150ms streaming TTS that works inside conversational apps without breaking turn-taking. Both have multilingual coverage now matching English quality on 20+ languages.

Pick when
ElevenLabs for production audio where quality is the headline: audiobooks, marketing video voiceovers, brand narration. Cartesia inside any conversational app or low-latency in-product narration where ElevenLabs' latency would hurt UX.

Skip when
Music generation: we route to Suno or Udio for that. Hyper-realistic clone-anybody workflows: we won't ship without consented voice opt-in regardless of what the API permits.

Cartesia inside any chatbot or voice-agent build for the latency budget. ElevenLabs in any video-or-audio generation pipeline. We've shipped both with student-voice opt-in and parental gating on ed-tech engagements.

TTS · Cloning · Low-latency
GPT-5 Vision · Claude · Gemini

Vision-LLMs are the multimodal ai engine — image understanding paired with text reasoning. GPT-5 Vision for the strongest OCR + chart reading. Claude Sonnet 4.6 for document understanding with the best refusal posture on PII. Gemini 3.0 Pro for million-token-context multimodal — long video clips with audio understood in one call.

Pick when
Multimodal workloads: image-or-video input, text reasoning, text-or-action output. OCR-grade extraction from product photos, invoice scans, scientific charts, screenshots. Pairs with the LLM development practice for the text-side architecture.

Skip when
Image / video output: vision-LLMs are read-only, so for generation we route to Flux, Sora, etc. Single-modality text workloads where the vision capability is dead weight on the bill.

Vision-LLMs sit upstream of generation in about half of our multimodal pipelines: the LLM reads the input and decides what to generate, then the generator ships the output. See our LLM development services (/services/llm-development/) for the text-side patterns.

GPT-5 Vision · Claude · Gemini 3
006 / STACK

Providers we've shipped on.

Pinned to what we have in production in 2026. Not the marketing list — the actual integrations under support.

  • Flux
  • SDXL
  • DALL-E 3
  • Imagen 4
  • Sora
  • Veo
  • Runway
  • Kling
  • ElevenLabs
  • Cartesia
  • Replicate
  • Modal
  • GPT-5 Vision
  • Claude
  • Gemini 3
  • Langfuse
007 / DECISION

Modality × deployment pattern — what we pick when.

The same decision a generative ai consultant would walk you through in week one, compressed into a grid. Use it to pre-screen the conversation before you brief us.

Workload | API-first | Brand-LoRA | Hybrid | Self-hosted
Image: editorial / blog | Default | Overkill | Wait for scale | Not yet
Image: brand campaigns | Drift risk | Default | At >50k/mo | Only if regulated
Image: product photography | Weak control | Default | Scale step-up | High-volume
Video: short-form social | Sora / Runway | Not viable | Frame work only | Research
Voice: narration | ElevenLabs | Voice clone only | Hybrid latency | Edge cases
Synthetic data: ML training | Cost prohibitive | Per-edge-case | Common pattern | Default at scale
Regulated (HIPAA / EU AI Act) | Often blocked | Partial fit | With perimeter step | Default
Solid dot — default pick. Dash — wrong tool. Plain — fits with caveats; we'd dig in on the call.
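For teams that want to pre-screen programmatically, the grid can be held as a plain lookup table. This is a sketch: the cell values are transcribed from the grid as we read it, while the dictionary keys and the `default_patterns` helper are hypothetical names introduced here.

```python
# Workload -> deployment pattern -> cell value, transcribed from the grid.
DECISION_GRID = {
    "image_editorial": {
        "api_first": "Default", "brand_lora": "Overkill",
        "hybrid": "Wait for scale", "self_hosted": "Not yet"},
    "image_brand_campaigns": {
        "api_first": "Drift risk", "brand_lora": "Default",
        "hybrid": "At >50k/mo", "self_hosted": "Only if regulated"},
    "image_product_photography": {
        "api_first": "Weak control", "brand_lora": "Default",
        "hybrid": "Scale step-up", "self_hosted": "High-volume"},
    "video_short_form_social": {
        "api_first": "Sora / Runway", "brand_lora": "Not viable",
        "hybrid": "Frame work only", "self_hosted": "Research"},
    "voice_narration": {
        "api_first": "ElevenLabs", "brand_lora": "Voice clone only",
        "hybrid": "Hybrid latency", "self_hosted": "Edge cases"},
    "synthetic_data_ml_training": {
        "api_first": "Cost prohibitive", "brand_lora": "Per-edge-case",
        "hybrid": "Common pattern", "self_hosted": "Default at scale"},
    "regulated": {
        "api_first": "Often blocked", "brand_lora": "Partial fit",
        "hybrid": "With perimeter step", "self_hosted": "Default"},
}


def default_patterns(workload: str) -> list:
    """Patterns whose cell starts with 'Default' for the given workload."""
    row = DECISION_GRID[workload]
    return [pattern for pattern, cell in row.items() if cell.startswith("Default")]


print(default_patterns("image_brand_campaigns"))
print(DECISION_GRID["video_short_form_social"]["api_first"])
```

Rows such as video and voice name providers instead of a default, so the helper returns an empty list for them and the raw cell is the thing to read.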
008 / BENCHMARKS

What the model choice actually costs you.

Two benchmarks every generative ai agency should be willing to put on a service page. Brand-fit scores from our internal rubric on a representative 50-example brand evaluation. Per-100-image costs at the volume tiers we've shipped at. Numbers shift quarterly as providers reprice; the ranking is more stable than the absolute values.

Brand-fit ceiling, image models (0–100, internal rubric)
Flux Pro | 94 | Photoreal + text-in-image
Stable Diffusion 3 | 86 | Open-weight, ControlNet
Imagen 4 | 89 | Cleanest commercial licence
DALL-E 3 | 81 | Fast integration via GPT-5
SDXL base | 74 | Cheapest self-host

Cost per 100 images, current provider pricing
DALL-E 3 (hosted) | 18¢ | per 100 images
Flux Pro (Replicate) | 12¢ | per 100 images
Imagen 4 (Vertex) | 14¢ | per 100 images
SDXL (self-host, idle) | 28¢ | Effective cost per 100 images at low volume
SDXL (self-host, 500k/mo) | | Amortised at high volume

Self-hosted SDXL becomes cheaper than hosted at roughly 50,000 assets per month for image. Below that, GPU idle time eats the savings. Above it, the marginal cost per asset approaches zero and the gap over hosted pricing widens.

008B / ENTERPRISE

Enterprise generative ai — what changes versus an SMB build.

A generative ai agency that ships SMB pilots and an enterprise generative ai practice are not the same shape. The model picks overlap. The pipeline architecture overlaps. What changes is everything around it — procurement, security review, data residency, audit trail, the compliance posture, and the cadence at which the design or content lead can grade an eval set. Most enterprise generative ai engagements we run involve 6 to 10 internal stakeholders, not 2; we plan for that at week one.

Enterprise tier ships with more documentation — pipeline diagrams, eval rubrics, runbooks, IP posture memos, security control mappings — because procurement and audit teams need the paperwork. The strategy memo includes a procurement-ready vendor map, a capex-vs-opex budget breakdown, a compliance assessment that names the controls, and a roadmap that survives the EU AI Act enforcement-schedule clarification expected in 2027. Delivered for fintech, healthcare, and a publishing-rights-heavy media client; each took the memo into a board pack.

009 / PROCESS

Eval-first, brand-controlled, eight weeks to a publishable pipeline.

The eval set with the brand rubric lands in week 2, graded with the design or content lead. No generation pipeline ships without one — and the eval set keeps grading after launch, not just before. That's the difference between a generative ai company that talks about quality and one that measures it.

WEEK 1

Discovery

Output spec (what assets, how many, brand rules, safety posture). Compliance posture — IP, EU AI Act, internal policy. We define the failure modes before the model is picked.

WEEK 2

Eval set

Graded examples — brand-fit, factual accuracy, format compliance, technical quality. Built with the design or content lead, not invented by us. Drives the model selection.

WEEK 2–4

Baseline

Multiple models benchmarked against the eval (Flux vs SDXL vs Imagen for image; Sora vs Runway for video; ElevenLabs vs Cartesia for audio). Cost, latency, quality, safety all measured.

WEEK 4–8

Brand controls

LoRA training where prompt-only fails brand QA. IP-Adapter and ControlNet for compositional control. Constrained prompt library for the editorial surface. Human-approval gates on customer-facing publish.

WEEK 8+

Deploy

Auth, rate limiting, Langfuse instrumentation, cost ledger, C2PA provenance, watermarking, NSFW classifier. Rollback playbook in the runbook. SOC 2 alignment when the workload touches it.

ONGOING

Running

Weekly eval review, brand-drift monitoring, monthly cost audit, quarterly LoRA re-training. Ownership transfers to the client's design / content / engineering team.

010 / EVAL

Four gates. Every pipeline. Every week.

  1. 01 Brand-fidelity score
    ≥90

    Human-graded on a 0–100 rubric per asset type, sampled weekly post-launch. Rubric is built with the design lead — colour palette, composition, typography, brand affordance. LLM-as-judge for sub-scores, but final scores are human-confirmed on a 10% sample. Drift catches LoRA degradation early.

    If brand-fidelity drifts under 85 on the weekly sample, we re-baseline the prompt library or re-train the LoRA. Confident off-brand output is worse than a missed deadline because the design team loses trust in the pipeline.

  2. 02 Safety failure rate
    <1%

    NSFW classifier on every output, brand-fit scorer, optional human-review gate. Tracked as failures-reaching-publish per 10,000 assets per week. We red-team the pipeline before launch with prompt-injection probes and adversarial inputs — the eval set includes the failure modes, not just the happy path.

    Any safety incident reaching a customer surface triggers a rollback to the human-gated pipeline within 24 hours and a root-cause review. We default to human-approval on high-stakes surfaces — it's cheaper than recovering from one published failure.

  3. 03 Median cost per asset
    Modelled at discovery

    Per-asset cost — model API spend, GPU minutes, watermarking, classifier passes — tracked weekly in Langfuse. Modelled during the Pilot using the expected output mix and volume. We don't quote averages from marketing decks; we model from the actual eval scenarios.

    If median asset cost drifts more than 25% above the baseline for two weeks, we audit the model routing (a quarter of overruns) and the cache hit rate (most of the remaining three quarters). Bills don't come as a surprise because the cost modelling happens in week 2.

  4. 04 P95 prompt-to-publish latency
    Under per-surface SLA

    Trigger to publishable asset, including queue waits and human-gate dwell where applicable. SLA varies by surface — sub-4s for in-app image, sub-30s for hero campaigns, minutes for cinematic video. We track p50 / p95 / p99 separately because the tails matter for queue sizing.

    P95 SLA breach for 72h triggers a routing review on the heaviest model nodes. Usually the fix is moving routine generations to a faster model (Flux schnell or SDXL Turbo) — not replatforming the pipeline.
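The four gates above can be sketched as one weekly check. The thresholds (85, 1%, 25% over two weeks, per-surface p95) are taken from the text; the function names, the input shapes, and the nearest-rank percentile method are assumptions, and the 72-hour persistence rule for latency is deliberately not modelled here.

```python
BRAND_DRIFT_FLOOR = 85        # re-baseline or re-train below this
SAFETY_RATE_CEILING = 0.01    # target is under 1% reaching publish
COST_DRIFT_FACTOR = 1.25      # more than 25% above baseline


def percentile(values, pct: int):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def weekly_gates(brand_scores, safety_failures, assets_published,
                 weekly_median_costs, baseline_cost, latencies_s, p95_sla_s):
    """Return gate name -> True when that gate needs action this week."""
    brand_mean = sum(brand_scores) / len(brand_scores)
    failure_rate = safety_failures / assets_published
    last_two = weekly_median_costs[-2:]
    cost_drift = len(last_two) == 2 and all(
        cost > COST_DRIFT_FACTOR * baseline_cost for cost in last_two)
    return {
        "brand_drift": brand_mean < BRAND_DRIFT_FLOOR,
        "safety_breach": failure_rate >= SAFETY_RATE_CEILING,
        "cost_drift": cost_drift,
        "latency_breach": percentile(latencies_s, 95) > p95_sla_s,
    }


report = weekly_gates(
    brand_scores=[91, 89, 93],
    safety_failures=2, assets_published=10_000,
    weekly_median_costs=[0.020, 0.027, 0.028], baseline_cost=0.020,
    latencies_s=[1.2, 1.9, 2.4, 3.1, 3.8], p95_sla_s=4.0,
)
print(report)
```

Tracking p50 and p99 alongside p95 only needs two more calls to `percentile` with a different `pct`.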

011 / COMPLIANCE

Provenance, watermarking, and the EU AI Act.

Generative AI inherits the IP risk, the data-protection rules, and now the EU AI Act high-risk obligations. Enterprise generative ai engagements either build the compliance posture in, or they get torn out the first time legal reviews the pipeline. We build it in. C2PA at generation, watermarking at publish, audit trail per asset, BAA-ready where healthcare needs it.

Audited annually · Continuous monitoring
  • C2PA provenance
    Content credentials embedded at generation
    AUDITED · 2026
  • Watermarking
    SynthID + invisible watermark options
    AUDITED · 2026
  • EU AI Act
    High-risk system obligations · phased 2025–2027
    AUDITED · 2026
  • SOC 2 Type II
    Audited annually · continuous monitoring
    AUDITED · 2026
  • ISO 27001:2022
    Current revision · annual surveillance
    AUDITED · 2026
  • HIPAA alignment
    BAA available for healthcare engagements
    READY
  • Commercial IP
    Flux Pro · Imagen · Firefly · ElevenLabs licensed
    AUDITED · 2026
  • Audit trail
    Per-asset provenance from prompt to publish
    AUDITED · 2026
012 / USE CASES

Where teams have shipped.

Three anonymized engagements. Modality, segment, and outcome are real; brand removed under NDA.

Marketing
DTC retail · catalogue at scale

Brand-LoRA image generation pipeline

Typical shape: custom Flux LoRA trained on 200–1,000 brand-approved hero images. Pipeline takes product SKU and scene brief, generates variants on Flux with the brand-LoRA applied, design lead approves before publish. C2PA provenance embedded at generation. Augments hero-variant work the in-house photo studio can't economically produce by hand.

Deliverable: trained LoRA + production pipeline + design-lead eval rubric
Education
Ed-tech · regulated learner audience

AI-narrated lessons with consent-gated voice

Typical shape: self-paced lessons narrated in a chosen voice with parental opt-in and age-gating. Cartesia TTS for sub-150ms in-app streaming. SSML drives pacing per learning profile. ElevenLabs for the marketing-side teacher previews. Audit log on every narration to satisfy the safeguarding policy.

Deliverable: narration pipeline + consent ledger + safeguarding runbook
Product
B2B SaaS · release-ops shape

Release-notes automation with brand voice

Typical shape: PR descriptions go in, LLM drafts release notes scored against the team's voice rubric, human edits, publish. Tone consistency measured weekly via a 40-example eval set. Pairs with our llm development services on the text generation side. Not consumer-facing, but the principle is the same — eval-graded before publish.

Deliverable: drafting pipeline + voice rubric + weekly eval dashboard
012B / WHY PAITEQ

Why teams pick us as their generative ai development services partner.

013 / TIMELINE

What the eight-week Production Build looks like.

The standard generative ai development services build — a defined slice ships in eight weeks. Brand-LoRA Training adds 4–6 weeks; Advisory runs in parallel at the front. Pilot is a tighter 2–4 week cut of the same shape.

6 phases
WEEK 1 Discovery

Output spec, brand rules, safety posture, compliance map

Spec sign-off

WEEK 2 Eval set

Graded examples across modalities; rubric authored with design lead

Design-lead grading complete

WEEK 3–4 Baseline

Multi-model benchmark; cost / quality / latency / safety scored

Model picked + costed

WEEK 4–6 Brand controls

LoRA trained (if needed), prompt library locked, human-gate UI wired

Brand-fit ≥ 88 on eval

WEEK 6–8 Pipeline build

Auth, rate limits, provenance, watermark, classifier, runbook

Dry-run scenarios green

WEEK 8+ Launch

Live publish, drift monitoring, weekly eval review

First 30 days of clean traces

014 / ENGAGE

Four ways to start.

01 Generation Pilot Fixed scope
2–4 weeks

Pilot one modality, one use case.

In scope
  • One modality, one use case
  • Eval set with brand + quality rubric
  • Working prototype on real data
  • Demo + cost / quality memo
Out of scope
  • Production deploy
  • Brand-LoRA training
  • Compliance posture (separate Advisory)
02 Production Build Fixed scope
8–14 weeks

Full pipeline, brand controls, observability.

In scope
  • All Pilot deliverables
  • Brand controls (prompt library / LoRA)
  • Human-review gates on publish
  • Provenance + safety + audit trail
  • Four weeks of post-launch iteration
03 Brand-LoRA Training Fixed scope
4–6 weeks

Custom style LoRA on your assets.

In scope
  • Asset curation and cleaning
  • LoRA training on Flux or SDXL base
  • Eval against design-lead rubric
  • Weights deployed to your infrastructure
04 Generative AI Consulting Fixed scope
2–3 weeks

Model audit, build-vs-API, roadmap.

In scope
  • Model selection audit
  • Build-vs-API decision framework
  • Compliance and IP review
  • Costed roadmap memo
015 / FAQ

What buyers ask before signing.

What's the difference between a generative ai development services engagement and an LLM build?

LLMs are text models. Generative AI in our taxonomy means non-text generation — image, video, audio, 3D, multimodal — with text generation handled by our llm development services practice. The distinction matters because the engineering shape is different. Image and video pipelines have eval rubrics built around visual quality and brand fidelity, GPU economics that flip differently at scale, IP / copyright risk that pure text doesn't carry, and a human-review gate that's almost always mandatory on customer-facing surfaces.

About a third of our engagements end up multimodal — vision-LLMs read the input, a generation model produces the output. In those, the two practices collaborate. The pillar you land on first depends on whether the dominant value is in the read step or the generate step.

Can you train a brand-controlled model on our assets, and where do the weights live?

Yes — Brand-LoRA Training is one of our four engagement shapes, 4–6 weeks. We start with asset curation and cleaning (usually 200–1,000 brand-approved examples; more if the brand is broad). Then the LoRA trains on Modal or Replicate against a Flux or SDXL base. The output is evaluated against the design lead before deployment — brand-fit score must clear 88 on our internal rubric.

Weights deploy to your infrastructure, not ours. You own the artefact. Re-training cadence is usually quarterly as the brand evolves; we can either run that on retainer or hand it off to your team. We will not train on assets you don't have clean licensing for, and we document the IP posture in the SOW.

How do you handle IP, copyright, and provenance for generated content?

Three layers. First, model choice — we default to commercially-licensed models (Flux Pro, Imagen 4, Adobe Firefly where the use-case fits, ElevenLabs and Cartesia for audio). Second, provenance — C2PA content credentials embedded at generation, with optional invisible watermarking via SynthID or a partner. Third, audit trail — every asset has a record from prompt to model to publish surface, kept for the retention period in the SOW.

For regulated workloads (EU AI Act high-risk classifications, HIPAA-aligned use-cases), we configure the pipeline to satisfy the obligations and document the assessment in the engagement deliverables. Compliance is a shape we ship, not a checkbox we tick at the end.
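The audit-trail layer (a record from prompt to model to publish surface) can be sketched as an append-only log entry keyed by a hash of the asset. This is a plain JSON record for illustration only; it is not a C2PA manifest, and the field names and the `make_record` helper are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """One immutable log entry per published asset."""
    asset_sha256: str
    prompt: str
    model: str
    publish_surface: str
    approved_by: str
    created_at: str


def make_record(asset_bytes: bytes, prompt: str, model: str,
                publish_surface: str, approved_by: str,
                now: datetime = None) -> AuditRecord:
    """Hash the asset and capture who approved it, where it went, and when."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return AuditRecord(
        asset_sha256=hashlib.sha256(asset_bytes).hexdigest(),
        prompt=prompt,
        model=model,
        publish_surface=publish_surface,
        approved_by=approved_by,
        created_at=timestamp,
    )


def to_json_line(record: AuditRecord) -> str:
    """Serialise with sorted keys so log lines diff cleanly."""
    return json.dumps(asdict(record), sort_keys=True)


record = make_record(b"example-image-bytes", "hero shot, studio lighting",
                     "flux-pro", "product-page", "design-lead")
print(to_json_line(record))
```

Hashing the asset bytes lets a later reviewer confirm that the published file is the one that was approved, independent of any embedded credentials.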

Hosted (Replicate / OpenAI / Anthropic) or self-hosted on our cloud?

Hosted to start, almost always. The four engagement patterns answer this — API-first (60% of new pilots), Brand-LoRA, Hybrid, Fully-Self-Hosted. We move to hybrid or self-hosted when volume makes the math flip, regulated data forces it, or you need customisation (specific LoRAs, ControlNet conditioning) the hosted providers don't expose.

The break-even is workload-specific. For image generation, hosted typically beats self-hosted under 50,000 assets per month; above that, the unit economics favour self-hosted SDXL or Flux Dev on a small H100 fleet. For video, hosted dominates in 2026 because the open-weight video models don't yet match Sora or Veo on quality. For voice, Cartesia and ElevenLabs hosted is almost always the right answer — the latency and quality lead is too large.

How do you measure generation quality at scale, and what does failure look like?

Four metrics, one dashboard. Brand-fidelity score (human-graded on a 0–100 rubric, target ≥90, sampled weekly). Safety failure rate (NSFW or off-brand outputs reaching publish, target <1%). Median cost per asset (modelled at discovery, drift threshold 25% / two weeks). P95 prompt-to-publish latency (per-surface SLA). All four live in Langfuse alongside the LLM observability for any vision-LLM upstream.

Failure is concrete. Brand drift means re-baseline the prompt library or re-train the LoRA. Safety incident means rollback to the human-gated pipeline within 24 hours plus a root-cause review. Cost runaway is almost always a model-routing or cache issue, not a volume issue. Latency breach almost always solves with a routing change to a faster model on the easy generations, not a replatform.

017 / Start a project

Ship brand-grade generation in 8 weeks.

Pilot in 2–4. Production Build in 8–14. Brand-LoRA in 4–6. Generative ai consulting in 2–3.