# Generative AI Development Services — Paiteq

> Paiteq is a generative AI development services and generative AI consulting agency — brand-controlled image, video, audio, and multimodal pipelines for enterprise.

**HTML version:** https://www.paiteq.com/services/generative-ai/

## Key facts

- Modalities: image, video, audio, multimodal.
- Posture: brand-controlled outputs, eval-graded quality, enterprise guardrails.

## Related pages

- [LLM Development](https://www.paiteq.com/services/llm-development/)
- [Machine Learning Development](https://www.paiteq.com/services/machine-learning-development/)
- [Services hub](https://www.paiteq.com/services/)

## About Paiteq

Enterprise AI engineering — production agents, RAG, LLM apps, automation, generative AI. Eval-first, senior-led, fixed-scope engagements. Same-day reply from engineering. NDA counter-signed before discovery. Walk-away clause on every engagement.

**Site index for agents:** https://www.paiteq.com/llms.txt
**Full content for agents:** https://www.paiteq.com/llms-full.txt
**Book a call:** https://www.paiteq.com/contact/

---

## Full content


# *Generative AI development services*, brand-controlled image, video, audio at scale.

Paiteq is a generative AI development services and generative AI consulting agency. We ship production pipelines on Flux, SDXL, Sora, Runway, ElevenLabs, and Cartesia: brand-controlled with LoRA training, eval-graded, with provenance and human-review gates built in. We are not a vendor of a single house model; we pick the stack against your output spec, your data rules, and your unit economics.

[Talk to engineering](/contact/) [See engagement shapes](#engage)

-   Modalities: Image · Video · Audio · Multimodal
-   Stack: Flux · SDXL · Sora · Runway · ElevenLabs
-   Engage: Pilot · Build · Brand-LoRA · Advisory
-   Provenance: C2PA + watermark

001 / OUTCOMES

## Four numbers that travel with every generation pipeline.

The eval rubric is what separates a generative AI company from a prompt-engineering hobbyist. We grade every pipeline on brand fidelity, safety, cost, and latency, measured weekly and published to the client in a shared dashboard. These four numbers determine whether a pipeline ships and whether it stays live.

-   **Brand-fidelity score.** Human-graded on the brand rubric weekly post-launch.
-   **Safety failures.** NSFW or off-brand outputs reaching publish. Hard gate.
-   **Asset cost cut.** Median per-asset cost vs. agency / stock-photo baseline.
-   **P95 image latency.** Prompt to rendered file. Video and audio carry their own SLAs.

The four metrics above are reported as medians across shipped engagements on Flux, Stable Diffusion 3, Sora, Runway, ElevenLabs, and Cartesia for clients in e-commerce, ed-tech, fintech, and B2B SaaS. The spread by surface type is wide, so we model your specific workload at discovery rather than quote an average.

-   01
    
    ### The design lead is the eval judge
    
    Not the engineering lead, not us, and definitely not the model. Brand fit is a judgment call, so the person who owns the brand has to own the grading rubric. Pipelines fail at three months when this is reversed.
    
-   02
    
    ### The cost curve isn't the marketing slide
    
    Hosted API pricing is fine until you cross the volume cliff where self-hosted SDXL or Flux Dev becomes cheaper. Most pipelines we audit are sitting on the wrong side of that cliff. Benchmarks below carry the real break-even math.
    

002 / CAPABILITIES

## Six generation capabilities across six modalities.

A capability-by-modality heatgrid shows where Paiteq's generative AI services have shipped at production scale. Strength scores reflect what we've taken to production, not what we've experimented with; the gaps are honest.

| Capability | Highlighted cells | Remaining cells |
|---|---|---|
| Marketing / brand assets | Image, Video, Audio, Vision-LLM, Brand LoRA | 3D / depth |
| Product photography | Image, 3D / depth, Vision-LLM, Brand LoRA | Video, Audio |
| Personalised video | Image, Video, Audio, Brand LoRA | 3D / depth, Vision-LLM |
| Voice / narration | Audio, Brand LoRA | Image, Video, 3D / depth, Vision-LLM |
| Synthetic training data | Image, Video, Audio, 3D / depth, Vision-LLM | Brand LoRA |
| Multimodal apps | Image, Video, Audio, 3D / depth, Vision-LLM | Brand LoRA |

Legend: Possible fit · Good fit · Primary vertical

Dark cells: shipped at production scale. Medium: shipped in pilot. Light: experimented but not yet production. The empty cells are real: we don't claim depth we haven't built.

-   **Image · Brand** (~50% of engagements)
    
    Hero imagery, campaign variants, social-first cuts of approved hero shots, and the long tail of brand-consistent variations a design team can't economically produce by hand. The bar is brand fidelity, not photorealism: the failure mode is "off-brand", not "obviously AI". Brand-controlled generation via a custom LoRA is the differentiator against a prompt-only pipeline.
    
-   **Product photography** (higher-difficulty cousin)
    
    Same stack (SDXL or Flux with IP-Adapter and ControlNet), but the brand bar is steeper because a customer is looking at a specific SKU. The methodology we run here can replace a meaningful share of an in-house photo studio's output, with tactile work (lifestyle, regulated categories) still routed to a human photographer. The goal is augmentation against catalogue volume, not full replacement.
    
-   **Video** (research → production in 2026)
    
    AI video generation is now a real service category: Sora API, Google Veo, Runway, Kling. We've shipped explainer-video pipelines that take a script and produce hero plus scene cuts with brand-consistent characters. Cinematic video at SKU scale drives most of our video work. Live and long-form over 30s are still research; we don't ship those yet.
    
-   **Voice & audio** (most operationally mature)
    
    ElevenLabs for studio-grade narration; Cartesia for conversational sub-150ms streaming. Voice cloning is gated on consent: we won't ship a clone without a documented opt-in chain, regardless of what the API permits. Synthetic data generation for ML training is a niche but high-leverage cousin, covering edge-case synthesis for vision and OCR pipelines where labelled data is scarce.
    
-   **Multimodal** (~30% of pipelines)
    
    Being a multimodal AI company in 2026 means running vision-LLMs (GPT-5 Vision, Claude Sonnet 4.6, Gemini 3.0 Pro) on the input side and generation models on the output side, in one pipeline. Image in, decision via Claude, asset out via Flux: this is where the practice meets our sibling [LLM development services](/services/llm-development/).
    

003 / SERVICES

## Four engagement shapes, fixed-scope.

Pilot, full Build, Brand-LoRA training, or Advisory. Every generative ai development services engagement maps to one of the four. Mixed engagements are billed as two consecutive shapes, not as a single open-ended retainer.

-   **01 / PILOT · [Generation Pilot](#engage)** (2–4 wks)
    
    One modality, one use case, eval-graded, demoed in 2–4 weeks. The way most clients start a generative AI development engagement before committing to a full build.
    
-   **02 / BUILD · [Production Build](#engage)** (8–14 wks)
    
    Full pipeline with brand controls, human-review gates, observability, provenance. The bulk of our generative AI agency revenue. Includes four weeks of post-launch iteration.
    
-   **03 / BRAND-LORA · [Brand-LoRA Training](#engage)** (4–6 wks)
    
    Custom style LoRAs trained on your brand assets. Eval-validated against the design lead before deployment. Weights stay on your infrastructure.
    
-   **04 / ADVISORY · [Generative AI Consulting](#engage)** (2–3 wks)
    
    Model-selection audit, build-vs-API decision, compliance review, roadmap. This is our generative AI consulting shape; the outcome is a costed plan, not a prototype.

003B / ADVISORY

## Generative AI consulting, when the question is build-vs-buy, not how-to-build.

A separate shape from the Build engagements. Generative AI consulting is a 2–3 week intervention where the deliverable is a costed decision memo, not a prototype. About a third of our inbound is for this, most often from a CTO or CMO who has an internal team running prompt-only pipelines and needs an outside read before committing to a Flux self-hosting decision or a Brand-LoRA programme.

01

### Model selection

Flux vs Stable Diffusion 3 vs Imagen 4 for image; Sora vs Veo vs Runway for video; ElevenLabs vs Cartesia for audio. We bring our internal benchmark numbers and your eval scenarios into one room and pick.

02

### Build-vs-API economics

When does the GPU bill on self-hosted SDXL beat the per-asset bill on Replicate Flux Pro at your projected volume? The answer depends on idle time, batch size, and cache hit rate, and most internal teams don't have the data to model it correctly.

03

### Compliance & IP review

C2PA provenance, watermarking, EU AI Act high-risk classification (phasing in through 2025–2027), HIPAA-aligned data handling where healthcare data is in the loop, and a clean read on what the model providers actually allow under commercial terms.

04

### Generative AI strategy memo

A costed roadmap that maps generation surfaces against your business model with priority order, budget, and timeline. This is the deliverable a board pack is built from.

Not a prototype, not a workshop, not a discovery sprint that produces nothing. Three weeks in you have the memo, the numbers, the model picks, the compliance assessment, the costed plan. If you build it, we roll the consulting fee into the Production Build. If your team builds it with our patterns, you take the memo and run.

004 / PATTERNS

## Four deployment patterns. We pick on cost, control, and compliance.

The decision between hosted-API, brand-LoRA, hybrid, and fully-self-hosted isn't religious; it's economics plus regulation plus brand control. About 60% of new engagements start as API-first; a quarter graduate to Brand-LoRA in Production Build; a tenth end up Hybrid or Self-hosted because volume or compliance forced it.

   

01

### API-FIRST

The fastest path to a working generation pipeline. Calls Flux Pro, Imagen, or DALL-E behind a thin service. Brand consistency comes from a constrained prompt library and reference images, not a custom model. A human-approval queue gates publish on customer-facing surfaces. Where we start about 60% of new engagements.

Pick when

-   Setup measured in days, not weeks
-   Editorial / blog illustration where brand drift is recoverable
-   Pilot phase to validate output spec before committing to a LoRA
-   Workloads under 10,000 assets/month where hosted economics win

Skip when

-   Regulated data, outputs leave your perimeter
-   Product photography where prompt-only fails brand QA
-   High-volume workloads where the per-asset bill flips against self-hosted
-   Markets needing tight ControlNet conditioning

Stack

Flux Pro · DALL-E 3 · Imagen

02

### BRAND-LORA

When prompt-only generation drifts off-brand, we train a style LoRA on 200–1,000 brand-approved assets. The LoRA captures the visual identity that a prompt can't describe. Paired with IP-Adapter for reference-image conditioning and ControlNet for compositional control. We've shipped this for e-commerce hero generation where the previous prompt-only pipeline failed brand QA at a 40% rate.

Pick when

-   Brand-fit scores reliably 88–94 on our rubric versus 60–75 for prompt-only
-   Weights deploy to your infrastructure; you own the artefact
-   Product photography or hero campaigns at SKU scale
-   Re-training cadence usually quarterly as brand evolves

Skip when

-   One-off editorial illustration, overkill
-   Brand assets under 200 examples, too thin to train against
-   Greenfield brand without locked visual identity
-   Workloads where prompt-only already clears brand QA

Stack

Flux LoRA · SDXL · IP-Adapter · ControlNet

03

### HYBRID

A pragmatic middle path. Hosted models (Flux Pro, Imagen, Sora) handle the heavy generation. A self-hosted model on your GPUs does the brand-finishing step, applying the LoRA, watermarking, safety classifier, and any IP-sensitive transformations. Cuts the provider bill on high-volume workloads while keeping the regulated steps inside your perimeter.

Pick when

-   Break-even versus pure hosted lands around 50,000 assets / month for image
-   Modal or Replicate as the GPU substrate for the in-perimeter step
-   Mixed regulated / non-regulated content in the same pipeline
-   E-commerce, ed-tech, and publishers at scale

Skip when

-   Very low volume, the platform tax doesn't pay back
-   Pure-regulated workloads, go straight to fully-self-hosted
-   Workloads where the hosted models don't expose the LoRA hook we need

Stack

Modal · Replicate · Flux Pro · SDXL

04

### SELF-HOSTED

For regulated data or sustained six-figure-monthly hosted bills, we move the full pipeline onto your infrastructure. Stable Diffusion 3 or Flux base models, custom LoRAs, watermarking, NSFW classifier, all running on A100s or H100s on your cloud. Slower to stand up but the unit economics flip past a threshold and the data never leaves.

Pick when

-   HIPAA-aligned, defence, or EU AI Act high-risk workloads
-   Sustained six-figure-monthly hosted bills make the math flip
-   Need fine-grained ControlNet or custom LoRA hooks the hosted providers don't expose
-   Data residency requires inside-perimeter execution

Skip when

-   Low or spiky volume, GPU idle time eats the savings
-   Greenfield pilots where time-to-first-asset matters
-   Teams without MLOps capacity for the platform-engineering side
-   Workloads where commercial-license hosted models are easier IP-wise

Stack

SD3 · Flux Dev · H100 · vLLM

004B / BRAND

## Brand-controlled generation, the LoRA approach we ship for enterprise.

Generic image models produce generic outputs. That works for editorial illustration. It fails fast on product photography, hero campaigns, or any surface where a customer can tell that the brand is shaped by the model and not the other way around. Brand-controlled generation is how we close that gap, and it's a genuine technical gap in the search results: none of the top-ranking generative AI agency pages cover the methodology at this depth, because most haven't shipped enough LoRAs to have one.

1.  01 Asset curation
    
    200–1,000
    
    Sit with the design lead and pull 200–1,000 brand-approved examples that represent the visual identity: composition, lighting, palette, characters, typography. Volume matters less than consistency; a 300-asset set with tight brand coherence beats a 1,500-asset set with drift. We clean EXIF, strip metadata that might confuse training, and normalise resolution.
    
    If the asset set has too much visual drift across examples, we don't proceed to training. Tightening the curation set by a third is a better outcome than training a LoRA that captures the average instead of the brand.
    
2.  02 Training
    
    12–48 GPU-hr
    
    Usually on Modal or Replicate against a Flux.1 dev base, sometimes SDXL when the ControlNet ecosystem is mission-critical. Training runs 12 to 48 GPU-hours depending on rank and base resolution. We log every run; nothing trains without a tagged manifest of the source set.
    
    If the training loss curve doesn't converge within expected bounds, we adjust rank and learning rate and rerun rather than proceed to eval with a suspect checkpoint. Wasted GPU-hours beat a bad LoRA shipped to eval.
    
3.  03 Eval gate
    
    ≥88 brand-fit
    
    This is the phase where most brand-LoRA programmes silently fail at agencies without a methodology. A fixed rubric, authored with the design lead before training starts, covers palette adherence, composition coherence, typography rendering, and brand affordance ("does this feel like our brand or like a Flux output?"), scored on a 0–100 scale. We don't ship under 88; we re-train rather than tune around it.
    
    If brand-fidelity drifts under 85 on the weekly sample, we re-baseline the prompt library or re-train the LoRA. Confident off-brand output is worse than a missed deadline because the design team loses trust in the pipeline.
    
4.  04 Deployment
    
    Weights on your infra
    
    Weights ship to your infrastructure. Pipeline wiring lands behind your auth. IP-Adapter and ControlNet stages chain in for reference conditioning and compositional control. Re-training cadence is quarterly: brands evolve, model bases improve, and the eval rubric itself tightens as design standards do.
    
    We run the re-training on retainer when the client wants us to keep ownership, or hand it off to an internal ML team when the client wants to build the muscle in-house. Either path is documented in the SOW before deployment starts.
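
The eval gate in step 03 can be sketched as a small scoring routine. This is a minimal illustration, not the rubric itself: the page only states the four rubric dimensions, the 0–100 scale, the 88 ship threshold, and the 85 weekly drift threshold. The equal weighting and the names `RubricScore`, `brand_fit`, `pre_launch_decision`, and `weekly_decision` are assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean

SHIP_THRESHOLD = 88   # from the page: "We don't ship under 88"
DRIFT_THRESHOLD = 85  # from the page: weekly sample under 85 triggers action


@dataclass(frozen=True)
class RubricScore:
    """One human-graded asset; each dimension is on a 0-100 scale."""
    palette: float
    composition: float
    typography: float
    brand_affordance: float

    def overall(self) -> float:
        # Assumption: equal weights. A real rubric may weight dimensions differently.
        return mean([self.palette, self.composition,
                     self.typography, self.brand_affordance])


def brand_fit(scores: list) -> float:
    """Mean overall score across the graded sample."""
    if not scores:
        raise ValueError("need at least one graded asset")
    return mean(s.overall() for s in scores)


def pre_launch_decision(scores: list) -> str:
    """Ship only when the graded sample clears the ship threshold."""
    return "ship" if brand_fit(scores) >= SHIP_THRESHOLD else "retrain"


def weekly_decision(scores: list) -> str:
    """Post-launch check against the lower drift threshold."""
    if brand_fit(scores) >= DRIFT_THRESHOLD:
        return "ok"
    return "rebaseline_or_retrain"
```

Keeping the ship threshold above the drift threshold gives the pipeline some headroom: a checkpoint that barely clears 88 at launch can lose a few points before the weekly check forces a re-train.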
    

004C / VIDEO

## AI video generation services, Sora, Veo, Runway, Kling in production.

Video generation is the modality that moved from research to production fastest in 2026. The market for AI video generation services is still small, mostly because the unit economics weren't there before Sora API and Veo opened up commercial endpoints. Now they are. About a quarter of our generative AI services revenue is video as of 2026, and the curve is still going up.

-   A
    
    ### Pipeline shape
    
    Storyboard first: break the script into shots, lock visual continuity (characters, palette, lighting), and decide which shots are AI vs traditional. Then generate shot by shot through the picked model (Sora API for hero cinematic; Veo for cleanest commercial licensing; Runway for editorial with masking + motion brush; Kling when the aesthetic differs). Each shot is evaluated against its storyboard frame before assembly. **Human review on assembled cut before publish, always.**
    
-   B
    
    ### What we ship, and don't
    
    **Ship:** personalised explainer video (one script, many SKU or audience variants), product video at SKU scale (catalogue-driven), social-content pipelines (short-form variants of a hero piece).
    
    **Don't ship in 2026:** long-form coherent video over 30s, anything live or near-live, hyper-personalised content involving real people without a clean consent chain. These are research problems or ethical hazards; check back in six months.
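
The pipeline shape in A can be sketched as a routing table plus two gates. This is an illustrative skeleton only: the model labels are plain strings rather than real API identifiers, and `ShotKind`, `Shot`, `Cut`, `pick_model`, `can_assemble`, and `can_publish` are hypothetical names. No provider calls are made.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ShotKind(Enum):
    HERO_CINEMATIC = "hero_cinematic"
    COMMERCIAL = "commercial"
    EDITORIAL = "editorial"
    STYLISED = "stylised"


# Routing as described on the page; values are placeholder labels.
MODEL_FOR_KIND = {
    ShotKind.HERO_CINEMATIC: "sora",
    ShotKind.COMMERCIAL: "veo",
    ShotKind.EDITORIAL: "runway",
    ShotKind.STYLISED: "kling",
}


@dataclass
class Shot:
    shot_id: str
    kind: ShotKind
    use_ai: bool = True                   # False means a traditionally produced shot
    passed_storyboard_eval: bool = False  # per-shot eval against the storyboard frame


@dataclass
class Cut:
    shots: List[Shot] = field(default_factory=list)
    human_approved: bool = False


def pick_model(shot: Shot) -> Optional[str]:
    """Return the model label for an AI shot, or None for a traditional shot."""
    return MODEL_FOR_KIND[shot.kind] if shot.use_ai else None


def can_assemble(cut: Cut) -> bool:
    """Every AI-generated shot must pass its storyboard eval before assembly."""
    ai_shots = [s for s in cut.shots if s.use_ai]
    return bool(cut.shots) and all(s.passed_storyboard_eval for s in ai_shots)


def can_publish(cut: Cut) -> bool:
    """Publishing always requires human approval of the assembled cut."""
    return can_assemble(cut) and cut.human_approved
```

Modelling human approval as a field on the cut, rather than as an optional step, makes it hard to wire a publish path that skips the review.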
    

005 / MODELS

## Six model families. We pick per asset type and per data rule.

No house model. We've shipped on every meaningful provider in 2026, and the right call usually depends on three things: brand fidelity ceiling, unit economics at your volume, and where the data has to live. The matrix below covers what we reach for and when we don't.

Flux (Pro / Dev)

Strengths

Best photoreal output in the open ecosystem in 2026. Strong prompt adherence on long detailed briefs. Commercial licensing on Flux Pro via Replicate or BFL API; Flux Dev for self-hosted research. LoRA training is mature on the Flux.1 schnell and dev variants. Strong text-in-image rendering, a recurring failure mode for SDXL.

When We Pick

Photorealistic image generation where prompt nuance matters. Brand campaigns where text needs to render correctly in the image. The default starting point for new image pipelines unless the client is already invested in another stack.

When We Don't

Tight latency budgets under 2s: Flux is heavier than SDXL Turbo. Workloads where the existing ControlNet ecosystem is critical: SDXL still has the deeper ControlNet inventory.

Paiteq Pattern

About 6 in 10 image pipelines we ship in 2026 lead with Flux Pro through the BFL API or Replicate. We pair with a Flux LoRA when brand fidelity needs to hit ≥90 on our rubric.

Photoreal · LoRA-ready · Commercial

Stable Diffusion 3 / SDXL

Strengths

Most mature open-weight image model family: ControlNet, IP-Adapter, depth, pose, and lineart conditioners are all production-ready. Strong community LoRA ecosystem on Hugging Face. Cheapest to self-host at scale. SDXL Turbo for sub-1s latency when the output spec tolerates a quality dip.

When We Pick

High-volume self-hosted workloads where the unit economics matter more than raw quality ceiling. Workflows that need ControlNet, IP-Adapter, or other conditioners we don't get on hosted-only models. Brand-LoRA training where the cost of the LoRA matters.

When We Don't

Text-in-image rendering: SDXL is consistently the weaker model here, and Flux beats it cleanly. Greenfield brand pipelines where the team isn't already invested in the SDXL ecosystem.

Paiteq Pattern

Self-hosted SDXL on Modal or your own H100 fleet is our default for high-volume image workloads, about 1 in 3 production builds in 2026. Always paired with IP-Adapter for reference-image conditioning.

Open-weight · ControlNet · Self-host

DALL-E 3 / Imagen

Strengths

Hosted-only, safety-tuned, and very fast to integrate. DALL-E 3 ships through the GPT-5 vision pipeline so you get the prompt-rewriting step for free. Imagen 4 has the cleanest commercial-licence story on Google Cloud. Both have strong default safety classifiers.

When We Pick

Quick-start image pipelines where time-to-first-asset matters more than long-term unit economics. Clients on Google Cloud who already have Vertex billing set up. Editorial / blog-illustration workloads where prompt-only is enough.

When We Don't

Custom LoRAs (impossible, these are closed-weight). High-volume workloads where the hosted bill flips against self-hosted. Workloads needing ControlNet or fine-grained conditioning.

Paiteq Pattern

We use DALL-E 3 or Imagen as the bootstrap model in Pilot engagements when the client wants to see output in week 1. About a third graduate to Flux + LoRA in Production Build.

Hosted · Safety-tuned · Fast-start

Sora · Veo · Runway · Kling

Strengths

The 2026 video-gen landscape. Sora API for cinematic 5–20s shots with strong physics. Google Veo for product video with the cleanest licensing. Runway for editing workflows, masking, motion brush, image-to-video. Kling for stylised social-content workflows where the aesthetic differs from Sora.

When We Pick

AI video generation services workloads, personalised explainer video, product demo at SKU scale, social-content pipelines. Always with storyboard → shot list → generation → human-review gate; never raw model-output-to-publish.

When We Don't

Long-form video (>30s coherent) is still a research problem in 2026. Live or near-live generation is out, because these are all minutes-per-clip workflows. Highly brand-specific motion is also out, because fine-tunes are not generally available on the video models yet.

Paiteq Pattern

About a quarter of our generative AI services revenue in 2026 is video. Sora as the lead model for hero content, Runway for the editorial pipeline. Always reviewed by a human before publish, because the failure modes are too costly.

Sora · Runway · Storyboard-gated

ElevenLabs · Cartesia

Strengths

ElevenLabs for studio-grade narration and voice cloning with commercial licensing. Cartesia for sub-150ms streaming TTS that works inside conversational apps without breaking turn-taking. Both have multilingual coverage now matching English quality on 20+ languages.

When We Pick

ElevenLabs for production audio where quality is the headline, audiobooks, marketing video voiceovers, brand narration. Cartesia inside any conversational app or low-latency in-product narration where ElevenLabs' latency would hurt UX.

When We Don't

Music generation: we route to Suno or Udio for that. Hyper-realistic clone-anybody workflows: we won't ship without consented voice opt-in regardless of what the API permits.

Paiteq Pattern

Cartesia inside any chatbot or voice-agent build for the latency budget. ElevenLabs in any video-or-audio generation pipeline. We've shipped both with student-voice opt-in and parental gating on ed-tech engagements.

TTS · Cloning · Low-latency

GPT-5 Vision · Claude · Gemini

Strengths

Vision-LLMs are the multimodal AI engine: image understanding paired with text reasoning. GPT-5 Vision for the strongest OCR + chart reading. Claude Sonnet 4.6 for document understanding with the best refusal posture on PII. Gemini 3.0 Pro for million-token-context multimodal work, with long video clips and their audio understood in one call.

When We Pick

The multimodal AI shape: image-or-video input, text reasoning, text-or-action output. OCR-grade extraction from product photos, invoice scans, scientific charts, screenshots. Pairs with the LLM development practice for the text-side architecture.

When We Don't

Image / video output: vision-LLMs are read-only, so for generation we route to Flux, Sora, etc. Single-modality text workloads where the vision capability is dead weight on the bill.

Paiteq Pattern

Vision-LLMs sit upstream of generation in about half of our multimodal pipelines: the LLM reads the input and decides what to generate, and the generator ships the output. See [our LLM development services](/services/llm-development/) for the text-side patterns.

GPT-5 Vision · Claude · Gemini 3

006 / STACK

## Providers we've shipped on.

Pinned to what we have in production in 2026. Not the marketing list, the actual integrations under support.

-   Flux
-   SDXL
-   DALL-E 3
-   Imagen 4
-   Sora
-   Veo
-   Runway
-   Kling
-   ElevenLabs
-   Cartesia
-   Replicate
-   Modal
-   GPT-5 Vision
-   Claude
-   Gemini 3
-   Langfuse

007 / DECISION

## Modality × deployment pattern, what we pick when.

The same decision a generative ai consultant would walk you through in week one, compressed into a grid. Use it to pre-screen the conversation before you brief us.

| Workload | API-first | Brand-LoRA | Hybrid | Self-hosted |
|---|---|---|---|---|
| Image, editorial / blog | Default | Overkill | Wait for scale | Not yet |
| Image, brand campaigns | Drift risk | Default | At >50k/mo | Only if regulated |
| Image, product photography | Weak control | Default | Scale step-up | High-volume |
| Video, short-form social | Sora / Runway | Not viable | Frame work only | Research |
| Voice, narration | ElevenLabs | Voice clone only | Hybrid latency | Edge cases |
| Synthetic data, ML training | Cost prohibitive | Per-edge-case | Common pattern | Default at scale |
| Regulated (HIPAA / EU AI Act) | Often blocked | Partial fit | With perimeter step | Default |

"Default" marks the default pick. The other cells either flag the wrong tool or a fit with caveats; we'd dig in on the call.
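
As a rough pre-screen, the grid and the "Pick when / Skip when" lists can be collapsed into an ordered rule list. This is a deliberate simplification and a sketch only: the thresholds are the figures quoted on this page (about 50,000 assets per month for the Hybrid break-even, 200 approved assets as the minimum LoRA training set), while the `Workload` fields, the rule order, and the `prescreen` name are assumptions for the example. Real engagements weigh more factors than these four.

```python
from dataclasses import dataclass

HYBRID_BREAK_EVEN = 50_000  # assets/month, image break-even quoted for Hybrid
MIN_LORA_ASSETS = 200       # below this the page calls the set too thin to train


@dataclass(frozen=True)
class Workload:
    assets_per_month: int
    regulated: bool                    # e.g. HIPAA-aligned or EU AI Act high-risk
    prompt_only_passes_brand_qa: bool  # does prompt-only output clear brand QA?
    approved_brand_assets: int         # size of the brand-approved training set


def prescreen(w: Workload) -> str:
    """Return a first-guess deployment pattern for a workload."""
    if w.regulated:
        # The grid lists Self-hosted as the default for regulated workloads.
        return "self-hosted"
    if w.assets_per_month >= HYBRID_BREAK_EVEN:
        # Past the quoted break-even, the hosted bill starts to lose.
        return "hybrid"
    if not w.prompt_only_passes_brand_qa and w.approved_brand_assets >= MIN_LORA_ASSETS:
        # Prompt-only drifts off-brand and there is enough data to train on.
        return "brand-lora"
    return "api-first"
```

For example, `prescreen(Workload(5_000, False, False, 400))` returns `"brand-lora"`, while the same workload at 80,000 assets per month returns `"hybrid"`.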

008 / BENCHMARKS

## What the model choice actually costs you.

Two benchmarks every generative ai agency should be willing to put on a service page. Brand-fit scores from our internal rubric on a representative 50-example brand evaluation. Per-100-image costs at the volume tiers we've shipped at. Numbers shift quarterly as providers reprice; the ranking is more stable than the absolute values.

Brand-fit ceiling, image models (0–100, internal rubric)

| Model | Brand-fit ceiling | Note |
|---|---|---|
| Flux Pro | 94 | Photoreal + text-in-image |
| Stable Diffusion 3 | 86 | Open-weight, ControlNet |
| Imagen 4 | 89 | Cleanest commercial licence |
| DALL-E 3 | 81 | Fast integration via GPT-5 |
| SDXL base | 74 | Cheapest self-host |

Cost per 100 images, current provider pricing

| Option | Cost per 100 images | Note |
|---|---|---|
| DALL-E 3 (hosted) | 18¢ | |
| Flux Pro (Replicate) | 12¢ | |
| Imagen 4 (Vertex) | 14¢ | |
| SDXL (self-host, idle) | 28¢ | Effective cost at low volume |
| SDXL (self-host, 500k/mo) | 2¢ | Amortised at high volume |

Self-hosted SDXL flips against hosted at roughly 50,000 assets / month for image. Below that, GPU idle time eats the savings. Above it, the fixed GPU cost is amortised over more assets, so the effective per-asset cost keeps falling while the hosted bill grows linearly with volume.
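
The break-even logic can be sketched with a simple two-part cost model: self-hosting pays a fixed monthly cost plus a small marginal cost per asset, while hosted pays a flat price per asset. This is a minimal sketch under that assumption; it ignores batch size and cache hit rate, and the function names are hypothetical. The inputs in the usage note are hypothetical figures chosen only so the arithmetic lands near the page's ~50,000 figure, not real provider prices.

```python
import math


def effective_cost_per_asset(fixed_monthly: float,
                             marginal_per_asset: float,
                             volume: int) -> float:
    """Self-hosted cost per asset once the fixed cost is spread over `volume`."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return fixed_monthly / volume + marginal_per_asset


def break_even_volume(fixed_monthly: float,
                      marginal_per_asset: float,
                      hosted_per_asset: float) -> float:
    """Monthly volume at which self-hosted and hosted cost the same per asset.

    Solves fixed / V + marginal = hosted for V.
    """
    if hosted_per_asset <= marginal_per_asset:
        # Self-hosting never catches up if its marginal cost is not lower.
        return math.inf
    return fixed_monthly / (hosted_per_asset - marginal_per_asset)
```

With a hypothetical 2,000 per month in fixed GPU cost, 0.002 marginal cost per asset, and 0.042 hosted price per asset, `break_even_volume(2000.0, 0.002, 0.042)` comes out at about 50,000 assets per month. Below that volume `effective_cost_per_asset` sits above the hosted price; above it, below.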

008B / ENTERPRISE

## Enterprise generative ai, what changes versus an SMB build.

A generative AI agency that ships SMB pilots and an enterprise generative AI practice are not the same shape. The model picks overlap. The pipeline architecture overlaps. What changes is everything around it: procurement, security review, data residency, audit trail, the compliance posture, and the cadence at which the design or content lead can grade an eval set. Most enterprise generative AI engagements we run involve 6 to 10 internal stakeholders, not 2; we plan for that at week one.

-   01
    
    ### Security review
    
    Enterprise procurement typically asks about SOC 2 and ISO 27001 posture, a DPIA if European data is in scope, and increasingly an EU AI Act high-risk assessment. We design within your existing control framework and produce the engagement-side evidence — eval logs, data-flow diagrams, sub-processor lists — that your security review needs.
    
-   02
    
    ### Data residency
    
    Many enterprise generative AI solutions must run inside the client's cloud perimeter (not Replicate, not a hosted API) because training data and prompts contain commercially sensitive content. That pushes architecture toward Hybrid or Fully-Self-Hosted on Modal-on-your-cloud or a dedicated GPU fleet.
    
-   03
    
    ### Eval cadence
    
    SMB eval cycles can be daily; enterprise cycles are usually weekly, gated by design-team availability and stakeholder review windows. The eight-week Production Build extends to twelve in regulated industries; that's a feature, not a delay. The extra weeks are eval review and security sign-off cycles that compress the post-launch tail.
    

The enterprise tier ships with more documentation (pipeline diagrams, eval rubrics, runbooks, IP posture memos, security control mappings) because procurement and audit teams need the paperwork. The strategy memo includes a procurement-ready vendor map, a capex-vs-opex budget breakdown, a compliance assessment that names the controls, and a roadmap that survives the EU AI Act enforcement-schedule clarification expected in 2027. We have delivered this for fintech, healthcare, and a publishing-rights-heavy media client; each took the memo into a board pack.

009 / PROCESS

## Eval-first, brand-controlled, eight weeks to a publishable pipeline.

The eval set with the brand rubric lands in week 2, graded with the design or content lead. No generation pipeline ships without one, and the eval set keeps grading after launch, not just before. That's the difference between a generative ai company that talks about quality and one that measures it.

WEEK 1

### Discovery

Output spec (what assets, how many, brand rules, safety posture). Compliance posture, IP, EU AI Act, internal policy. We define the failure modes before the model is picked.

WEEK 2

### Eval set

Graded examples, brand-fit, factual accuracy, format compliance, technical quality. Built with the design or content lead, not invented by us. Drives the model selection.

WEEK 2–4

### Baseline

Multiple models benchmarked against the eval (Flux vs SDXL vs Imagen for image; Sora vs Runway for video; ElevenLabs vs Cartesia for audio). Cost, latency, quality, safety all measured.

WEEK 4–8

### Brand controls

LoRA training where prompt-only fails brand QA. IP-Adapter and ControlNet for compositional control. Constrained prompt library for the editorial surface. Human-approval gates on customer-facing publish.

WEEK 8+

### Deploy

Auth, rate limiting, Langfuse instrumentation, cost ledger, C2PA provenance, watermarking, NSFW classifier. Rollback playbook in the runbook. SOC 2 alignment when the workload touches it.

ONGOING

### Running

Weekly eval review, brand-drift monitoring, monthly cost audit, quarterly LoRA re-training. Ownership transfers to the client's design / content / engineering team.

010 / EVAL

## Four gates. Every pipeline. Every week.

1.  01 Brand-fidelity score
    
    ≥90
    
    Human-graded on a 0–100 rubric per asset type, sampled weekly post-launch. The rubric is built with the design lead and covers colour palette, composition, typography, and brand affordance. LLM-as-judge for sub-scores, but final scores are human-confirmed on a 10% sample. Drift catches LoRA degradation early.
    
    If brand-fidelity drifts under 85 on the weekly sample, we re-baseline the prompt library or re-train the LoRA. Confident off-brand output is worse than a missed deadline because the design team loses trust in the pipeline.
    
2.  02 Safety failure rate
    
    <1%
    
    NSFW classifier on every output, brand-fit scorer, optional human-review gate. Tracked as failures-reaching-publish per 10,000 assets per week. We red-team the pipeline before launch with prompt-injection probes and adversarial inputs, the eval set includes the failure modes, not just the happy path.
    
    Any safety incident reaching a customer surface triggers a rollback to the human-gated pipeline within 24 hours and a root-cause review. We default to human-approval on high-stakes surfaces; it's cheaper than recovering from one published failure.
    
3.  03 Median cost per asset
    
    Modelled at discovery
    
    Per-asset cost (model API spend, GPU minutes, watermarking, classifier passes), tracked weekly in Langfuse. Modelled during the Pilot using the expected output mix and volume. We don't quote averages from marketing decks; we model from the actual eval scenarios.
    
    If median asset cost drifts more than 25% above the baseline for two weeks, we audit the model routing (a quarter of overruns) and the cache hit rate (most of the remaining three quarters). Bills don't come as a surprise because the cost modelling happens in week 2.
    
4.  04 P95 prompt-to-publish latency
    
    Under per-surface SLA
    
    Trigger to publishable asset, including queue waits and human-gate dwell where applicable. SLA varies by surface: sub-4s for in-app image, sub-30s for hero campaigns, minutes for cinematic video. We track p50 / p95 / p99 separately because the tails matter for queue sizing.
    
    P95 SLA breach for 72h triggers a routing review on the heaviest model nodes. Usually the fix is moving routine generations to a faster model (Flux schnell or SDXL Turbo), not replatforming the pipeline.
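The thresholds in the four gates above can be read as a small weekly check. The sketch below is one way to encode them, assuming a hypothetical `WeeklyMetrics` record; the field names and action strings are illustrative, and the numbers in the example week are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeeklyMetrics:
    brand_fidelity: float              # human-confirmed weekly sample, 0-100
    customer_surface_incidents: int    # safety failures that reached a customer
    median_costs: tuple[float, float]  # per-asset median, last two weeks
    baseline_cost: float               # per-asset cost modelled at discovery
    p95_breach_hours: float            # hours p95 latency has been over SLA


def triggered_actions(m: WeeklyMetrics) -> list[str]:
    """Return the follow-up actions the four gates call for this week."""
    actions = []
    if m.brand_fidelity < 85:
        actions.append("re-baseline prompt library or re-train LoRA")
    if m.customer_surface_incidents > 0:
        actions.append("roll back to human-gated pipeline within 24h")
    # Cost gate: both of the last two weeks more than 25% above baseline.
    if all(cost > 1.25 * m.baseline_cost for cost in m.median_costs):
        actions.append("audit model routing and cache hit rate")
    if m.p95_breach_hours >= 72:
        actions.append("routing review on heaviest model nodes")
    return actions


def failures_per_10k(failures: int, assets: int) -> float:
    """Safety failures reaching publish, normalised per 10,000 assets."""
    return 10_000 * failures / assets if assets else 0.0


# Hypothetical week: brand score has drifted and cost is over baseline.
week = WeeklyMetrics(83.5, 0, (0.052, 0.055), 0.040, 10.0)
print(triggered_actions(week))
print(failures_per_10k(3, 12_000))  # 2.5
```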
    

011 / COMPLIANCE

## Provenance, watermarking, and the EU AI Act.

Generative AI inherits the IP risk, the data-protection rules, and now the EU AI Act high-risk obligations. Enterprise generative ai engagements either build the compliance posture in, or they get torn out the first time legal reviews the pipeline. We build it in. C2PA at generation, watermarking at publish, audit trail per asset, BAA-ready where healthcare needs it.

SOC-2-ready practices · Continuous monitoring

-   C2PA provenance
    
    Content credentials embedded at generation
    
    AUDITED · 2026
    
-   Watermarking
    
    SynthID + invisible watermark options
    
    AUDITED · 2026
    
-   EU AI Act
    
    High-risk system obligations · phased 2025–2027
    
    AUDITED · 2026
    
-   SOC 2-ready
    
    Practices, not certified
    
    READY
    
-   HIPAA alignment
    
    BAA available for healthcare engagements
    
    READY
    
-   Commercial IP
    
    Flux Pro · Imagen · Firefly · ElevenLabs licensed
    
    AUDITED · 2026
    
-   Audit trail
    
    Per-asset provenance from prompt to publish
    
    AUDITED · 2026
    

012 / USE CASES

## Where teams have shipped.

Three anonymized engagements. Modality, segment, and outcome are real; brand removed under NDA.

Marketing

DTC retail · catalogue at scale

### Brand-LoRA image generation pipeline

Typical shape: custom brand-LoRA (Flux or SDXL base) trained on 200–1,000 brand-approved hero images. Pipeline takes product SKU and scene brief, generates variants on the matching base model with the brand-LoRA applied, and the design lead approves before publish. C2PA provenance embedded at generation. Augments hero-variant work the in-house photo studio can't economically produce by hand.

Deliverable: trained LoRA + production pipeline + design-lead eval rubric

Education

Ed-tech · regulated learner audience

### AI-narrated lessons with consent-gated voice

Typical shape: self-paced lessons narrated in a chosen voice with parental opt-in and age-gating. Cartesia TTS for sub-150ms in-app streaming. SSML drives pacing per learning profile. ElevenLabs for the marketing-side teacher previews. Audit log on every narration to satisfy the safeguarding policy.
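As an illustration of pacing driven by SSML, the sketch below wraps lesson sentences in a `<prosody>` element whose `rate` depends on the learning profile, with a `<break>` between sentences. The profile names, the `PROFILE_RATE` mapping, and `lesson_to_ssml` are hypothetical; `prosody` and `break` are standard SSML elements, but each TTS provider supports its own subset, so the markup would need checking against the provider's documentation.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from learning profile to an SSML prosody rate.
PROFILE_RATE = {
    "default": "medium",
    "slower-paced": "slow",
    "advanced": "fast",
}


def lesson_to_ssml(sentences, profile="default", pause_ms=400):
    """Build an SSML document with per-profile pacing and sentence pauses."""
    rate = PROFILE_RATE.get(profile, "medium")
    pause = f'<break time="{int(pause_ms)}ms"/>'
    # Escape each sentence so characters like < and & stay valid XML.
    body = pause.join(escape(s) for s in sentences)
    return f'<speak><prosody rate="{rate}">{body}</prosody></speak>'


ssml = lesson_to_ssml(["2 < 3 is true.", "Well done."], profile="slower-paced")
print(ssml)
# <speak><prosody rate="slow">2 &lt; 3 is true.<break time="400ms"/>Well done.</prosody></speak>
```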

Deliverable: narration pipeline + consent ledger + safeguarding runbook

Product

B2B SaaS · release-ops shape

### Release-notes automation with brand voice

Typical shape: PR descriptions go in, an LLM drafts release notes scored against the team's voice rubric, a human edits, and the notes publish. Tone consistency measured weekly via a 40-example eval set. Pairs with our LLM development services on the text generation side. Not consumer-facing, but the principle is the same: eval-graded before publish.

Deliverable: drafting pipeline + voice rubric + weekly eval dashboard

012B / WHY PAITEQ

## Why teams pick us as their generative AI development services partner.

-   01
    
    ### The eval set lands in week two
    
    Not after launch. Not "we'll figure it out". Not a vibes-check at the end. The design lead grades it, signs it off, and the rubric becomes the contract. Most generative AI agency engagements we audit didn't have one, which is why their pipelines drifted off-brand within a quarter.
    
-   02
    
    ### Cost is modelled at discovery
    
    No per-asset average from a marketing deck. We take your projected volume, output mix, cache hit rate estimate, and GPU idle profile if self-hosting is in play, and produce a defensible cost ledger by week two. Surprise bills aren't a surprise when modelling lives in Langfuse from the start.
    
-   03
    
    ### We name what we don't ship
    
    No long-form coherent video. No live generation. No voice clones without consent. No training data with unclear licensing. No pipelines without a human-review gate on customer-facing publish. The list of things an agency won't ship is the most reliable signal of what they will ship well.
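To make the cost-modelling point above concrete, here is a minimal sketch of a per-asset cost ledger built from projected volume, output mix, and an estimated cache hit rate, with an optional fixed monthly line for self-hosted GPU idle time. The `monthly_cost` function, the asset types, and all prices are illustrative assumptions; cache hits are treated as free for simplicity.

```python
def monthly_cost(volume, output_mix, unit_cost, cache_hit_rate, fixed_monthly=0.0):
    """Return (ledger, total) in USD for one month.

    volume:         assets requested per month
    output_mix:     {asset_type: share of volume}, shares must sum to 1
    unit_cost:      {asset_type: USD per freshly generated asset}
    cache_hit_rate: share of requests served from cache (assumed free here)
    fixed_monthly:  fixed spend, e.g. idle GPU time when self-hosting
    """
    if abs(sum(output_mix.values()) - 1.0) > 1e-9:
        raise ValueError("output_mix shares must sum to 1")
    generated = volume * (1.0 - cache_hit_rate)
    ledger = {
        kind: generated * share * unit_cost[kind]
        for kind, share in output_mix.items()
    }
    ledger["fixed"] = fixed_monthly
    return ledger, sum(ledger.values())


# Hypothetical inputs: 20,000 assets/month, half served from cache.
ledger, total = monthly_cost(
    volume=20_000,
    output_mix={"thumbnail": 0.75, "hero": 0.25},
    unit_cost={"thumbnail": 0.01, "hero": 0.08},
    cache_hit_rate=0.5,
)
print({kind: round(usd, 2) for kind, usd in ledger.items()}, round(total, 2))
```

Keeping the ledger per asset type makes a later drift audit easier, because a change in output mix shows up separately from a change in unit price.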
    

013 / TIMELINE

## What the eight-week Production Build looks like.

The standard generative AI development services build: a defined slice ships in eight weeks. Brand-LoRA Training adds 4–6 weeks; Advisory runs in parallel at the front. Pilot is a tighter 2–4 week cut of the same shape.

6 phases

WEEK 1 Discovery

Output spec, brand rules, safety posture, compliance map

Spec sign-off

WEEK 2 Eval set

Graded examples across modalities; rubric authored with design lead

Design-lead grading complete

WEEK 3–4 Baseline

Multi-model benchmark; cost / quality / latency / safety scored

Model picked + costed

WEEK 4–6 Brand controls

LoRA trained (if needed), prompt library locked, human-gate UI wired

Brand-fit ≥ 88 on eval

WEEK 6–8 Pipeline build

Auth, rate limits, provenance, watermark, classifier, runbook

Dry-run scenarios green

WEEK 8+ Launch

Live publish, drift monitoring, weekly eval review

First 30 days of clean traces

014 / ENGAGE

## Four ways to start.

01 Generation Pilot Fixed scope

2–4 weeks

### Pilot one modality, one use case.

In scope

-   One modality, one use case
-   Eval set with brand + quality rubric
-   Working prototype on real data
-   Demo + cost / quality memo

Out of scope

-   Production deploy
-   Brand-LoRA training
-   Compliance posture (separate Advisory)

02 Production Build Fixed scope

8–14 weeks

### Full pipeline, brand controls, observability.

In scope

-   All Pilot deliverables
-   Brand controls (prompt library / LoRA)
-   Human-review gates on publish
-   Provenance + safety + audit trail
-   Four weeks of post-launch iteration

03 Brand-LoRA Training Fixed scope

4–6 weeks

### Custom style LoRA on your assets.

In scope

-   Asset curation and cleaning
-   LoRA training on Flux or SDXL base
-   Eval against design-lead rubric
-   Weights deployed to your infrastructure

04 Generative AI Consulting Fixed scope

2–3 weeks

### Model audit, build-vs-API, roadmap.

In scope

-   Model selection audit
-   Build-vs-API decision framework
-   Compliance and IP review
-   Costed roadmap memo

015 / FAQ

## What buyers ask before signing.

What's the difference between a generative ai development services engagement and an LLM build?

LLMs are text models. Generative AI in our taxonomy means non-text generation (image, video, audio, 3D, multimodal), with text generation handled by our [LLM development services](/services/llm-development/) practice. The distinction matters because the engineering shape is different. Image and video pipelines have eval rubrics built around visual quality and brand fidelity, GPU economics that flip differently at scale, IP / copyright risk that pure text doesn't carry, and a human-review gate that's almost always mandatory on customer-facing surfaces.

About a third of our engagements end up multimodal: vision-LLMs read the input, and a generation model produces the output. In those, the two practices collaborate. The pillar you land on first depends on whether the dominant value is in the read step or the generate step.

Can you train a brand-controlled model on our assets, and where do the weights live?

Yes. Brand-LoRA Training is one of our four engagement shapes and runs 4–6 weeks. We start with asset curation and cleaning (usually 200–1,000 brand-approved examples; more if the brand is broad). Then the LoRA trains on Modal or Replicate against a Flux or SDXL base. The output is evaluated with the design lead before deployment; the brand-fit score must clear 88 on our internal rubric.

Weights deploy to your infrastructure, not ours. You own the artefact. Re-training cadence is usually quarterly as the brand evolves; we can either run that on retainer or hand it off to your team. We will not train on assets you don't have clean licensing for, and we document the IP posture in the SOW.

How do you handle IP, copyright, and provenance for generated content?

Three layers. First, model choice: we default to commercially-licensed models (Flux Pro, Imagen 4, Adobe Firefly where the use-case fits, ElevenLabs and Cartesia for audio). Second, provenance: C2PA content credentials embedded at generation, with optional invisible watermarking via SynthID or a partner. Third, audit trail: every asset has a record from prompt to model to publish surface, kept for the retention period in the SOW.
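A sketch of what a per-asset audit record could look like, assuming a simple append-only JSON-lines log. This is not the C2PA manifest itself; the `AssetRecord` fields, `make_record`, `to_log_line`, and the example values are hypothetical and only illustrate the prompt-to-model-to-publish chain described above.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AssetRecord:
    asset_id: str
    prompt: str
    model: str
    publish_surface: str
    approved_by: str
    asset_sha256: str  # ties the record to the exact bytes that were published
    created_at: str    # ISO 8601, UTC


def make_record(asset_id, prompt, model, publish_surface, approved_by,
                asset_bytes, now=None):
    """Build an immutable audit record for one generated asset."""
    now = now or datetime.now(timezone.utc)
    return AssetRecord(
        asset_id=asset_id,
        prompt=prompt,
        model=model,
        publish_surface=publish_surface,
        approved_by=approved_by,
        asset_sha256=hashlib.sha256(asset_bytes).hexdigest(),
        created_at=now.isoformat(),
    )


def to_log_line(record):
    """Serialise a record as one JSON line with a stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


record = make_record(
    asset_id="asset-0001",
    prompt="hero shot, product on marble, soft daylight",
    model="example-image-model",
    publish_surface="web-catalogue",
    approved_by="design-lead",
    asset_bytes=b"\x89PNG...placeholder bytes...",
    now=datetime(2026, 1, 15, 9, 30, tzinfo=timezone.utc),
)
print(to_log_line(record))
```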

For regulated workloads (EU AI Act high-risk classifications, HIPAA-aligned use-cases), we configure the pipeline to satisfy the obligations and document the assessment in the engagement deliverables. Compliance is a shape we ship, not a checkbox we tick at the end.

Hosted (Replicate / OpenAI / Anthropic) or self-hosted on our cloud?

Hosted to start, almost always. The four engagement patterns answer this: API-first (60% of new pilots), Brand-LoRA, Hybrid, Fully-Self-Hosted. We move to hybrid or self-hosted when volume makes the math flip, regulated data forces it, or you need customisation (specific LoRAs, ControlNet conditioning) the hosted providers don't expose.

The break-even is workload-specific. For image generation, hosted typically beats self-hosted under 50,000 assets per month; above that, the unit economics favour self-hosted SDXL or Flux Dev on a small H100 fleet. For video, hosted dominates in 2026 because the open-weight video models don't yet match Sora or Veo on quality. For voice, Cartesia and ElevenLabs hosted is almost always the right answer; the latency and quality lead is too large.
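The break-even arithmetic behind a figure like this is simple: self-hosting pays back once the per-asset saving covers the fixed monthly cost of the fleet. The sketch below assumes a hypothetical `break_even_assets` helper, and the prices are invented inputs chosen so the result lands near the 50,000 figure quoted above; they are not real provider or GPU pricing.

```python
def break_even_assets(fixed_monthly_usd, hosted_per_asset_usd, self_hosted_per_asset_usd):
    """Monthly volume above which self-hosting is cheaper than a hosted API.

    Returns None when the self-hosted marginal cost is not lower than the
    hosted price, because self-hosting then never pays back.
    """
    saving = hosted_per_asset_usd - self_hosted_per_asset_usd
    if saving <= 0:
        return None
    return fixed_monthly_usd / saving


# Illustrative inputs only.
volume = break_even_assets(
    fixed_monthly_usd=10_000,        # assumed GPU fleet plus ops overhead
    hosted_per_asset_usd=0.25,       # assumed hosted API price per asset
    self_hosted_per_asset_usd=0.05,  # assumed marginal cost per asset
)
print(round(volume))
```

Because the result is very sensitive to the fixed-cost line, GPU idle time is worth modelling explicitly before committing to a fleet.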

How do you measure generation quality at scale, and what does failure look like?

Four metrics, one stepper. Brand-fidelity score (human-graded on a 0–100 rubric, target ≥90, sampled weekly). Safety failure rate (NSFW or off-brand outputs reaching publish, target <1%). Median cost per asset (modelled at discovery, drift threshold 25% / two weeks). P95 prompt-to-publish latency (per-surface SLA). All four live in Langfuse alongside the LLM observability for any vision-LLM upstream.

Failure is concrete. Brand drift means re-baselining the prompt library or re-training the LoRA. A safety incident means rollback to the human-gated pipeline within 24 hours plus a root-cause review. Cost runaway is almost always a model-routing or cache issue, not a volume issue. A latency breach is almost always solved by routing the easy generations to a faster model, not by a replatform.
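For the latency metric, a minimal sketch of computing p50 / p95 / p99 from prompt-to-publish samples and checking p95 against a per-surface SLA, using the standard-library `statistics.quantiles`. The `SLA_SECONDS` table reuses the sub-4s and sub-30s targets from this page; the surface keys, helper names, and sample data are illustrative assumptions.

```python
import statistics

# Per-surface p95 targets in seconds (surface keys are hypothetical).
SLA_SECONDS = {"in_app_image": 4.0, "hero_campaign": 30.0}


def latency_percentiles(samples_s):
    """Return p50, p95 and p99 of prompt-to-publish latencies in seconds."""
    # n=100 yields 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples_s, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def breaches_sla(samples_s, surface):
    """True when the p95 latency exceeds the SLA for the given surface."""
    return latency_percentiles(samples_s)["p95"] > SLA_SECONDS[surface]


# Synthetic samples: 0.05s, 0.10s, ... 5.05s (101 values).
samples = [0.05 * i for i in range(1, 102)]
stats = latency_percentiles(samples)
print({name: round(value, 2) for name, value in stats.items()})
print(breaches_sla(samples, "in_app_image"))   # True: p95 is about 4.8s
print(breaches_sla(samples, "hero_campaign"))  # False
```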

016 / FURTHER READING

## Where this practice connects.

We ship production pipelines on Flux, SDXL, Sora, Runway, ElevenLabs, and Cartesia. If you're evaluating which generation architecture to commit to, the [flow matching vs diffusion sampling cost](/blog/diffusion-vs-flow-models/) breakdown covers the break-even math for 2026. Most pipelines we audit are sitting on the wrong side of the cost cliff; the [generative AI vendor matrix and pipeline costs](/blog/generative-ai-services-buyers-guide/) covers modality-by-modality architecture and per-asset unit economics before you commit.

A multimodal AI company in 2026 means running vision-LLMs on the input side and generation models on the output side: image-in, decision via Claude, asset-out via Flux. That's where this practice meets our [LLM development services](/services/llm-development/) sibling. When the generation pipeline needs to plan, act, and iterate, not just produce a single asset, the right sibling is our [AI agent development company](/services/ai-agent-development/) practice. The retrieval substrate (when prompts need to ground on brand assets and approved imagery) lives in [RAG development services](/services/rag-development/).

For clients in e-commerce specifically, the [AI for ecommerce](/ai-for-ecommerce/) page covers the SKU-scale catalog enrichment and asset-generation patterns we ship there. Adjacent industry routes include generative work for our [logistics software development company](/ai-for-logistics/) practice (freight-doc auto-completion, exception-narrative drafting). The broader engineering context sits on [the Paiteq practice page](/about/), with the full [AI development company](/) story on the homepage.

017 / Related practices

## Adjacent services.

-   [LLM Development](/services/llm-development/): custom LLM apps covering RAG, fine-tuning, evaluation, deployment.
-   [AI Agent Development](/services/ai-agent-development/): autonomous, tool-using AI agents for production workloads.

018 / Start a project

## Ship *brand-grade* generation in 8 weeks.

Pilot in 2–4 weeks. Production Build in 8–14. Brand-LoRA in 4–6. Generative AI consulting in 2–3.

[Talk to engineering](/contact/) [Architecture review](/contact/?topic=arch-review)