AI for Logistics · Logistics Software Development Company

AI for logistics, logistics software development company and AI logistics development company for TMS/WMS-aware orchestration.

Logistics ops in 2026 sit in a three-way squeeze: shipper margin pressure pushing rate per mile down quarter over quarter, ELD-tightened driver capacity that won't loosen because the labor pool isn't there, and cross-border friction (CBP scrutiny, EU AI Act transparency expectations, customs error penalties) that turns a paperwork miss into a multi-week delay. Paiteq is a logistics software development company and an AI logistics development company that builds the AI orchestration layer inside your existing TMS/WMS/ELD/customs stack (McLeod, MercuryGate, Manhattan Active TM, Samsara, Motive, Descartes, CargoWise). That layer wraps AI route optimization, ETA prediction, AI fleet management, claims, customs, AI warehouse management, and AI in shipping visibility, and makes the stack smarter without replacing it. We stay through the first eval-drift cycle and the first peak-volume push, not just the deploy.

Use cases: 7 · routing · ETA · claims · customs · visibility

Engage: MVP · Platform · Enterprise

Stack: McLeod · Samsara · Descartes · Claude · Pinecone

Risk: SOC 2 · FMCSA HOS · C-TPAT · EU AI Act

001 / WHY NOW

Why logistics teams are evaluating AI for logistics right now.

Logistics COOs and VPs of Ops in 2026 face three pressures running in parallel: route-economics drift that no amount of static TMS planning can hold, claims and customs backlogs that absorb headcount the labor market won't replace, and fragmented visibility across project44, FourKites, OEM telematics, and ocean carrier APIs that surfaces problems after the shipper does. Each pressure on its own would be manageable. Together, they're why AI for logistics has moved from R&D experiment to operating-plan agenda since 2024, and why every AI in logistics conversation we walk into now starts with a cost-per-mile question rather than a tech question. The teams shipping logistics AI well aren't replacing McLeod, MercuryGate, Samsara, or Descartes; they're wrapping those primitives with an orchestration layer that makes them smarter, and that's the shape most boutique logistics software development companies now sell. The framing shift in 2026: AI for supply chain stopped being a Deloitte slide and started being shipped code that tool-calls into the operator's stack.

002 / USE CASES

The 7 highest-ROI AI use cases in logistics.

Below are the seven workflows we see logistics teams build first. They share three traits: each has a clear buyer-readable ROI number in logistics units (cost-per-mile %, OTD points, claims AHT, HS-code accuracy %), each is deployable inside an 8–16 week window, and each compounds when you ship two or three together on a shared TMS/ELD integration layer rather than as standalone bets. The cards are dense on purpose: pain, with-AI workflow, named tools, and the ROI metric in the operator's vocabulary. Skim them, then read the two or three that match where your roadmap actually sits today. We've written about the routing-agent architecture pattern in detail in a separate blog covering AI route optimization at the constraint-solver layer.

USE CASE 01

AI route optimization agent with live constraint re-planning

The Pain

Static route plans built nightly by your TMS fall apart by 10am. Traffic, weather, customer schedule slips, driver HOS limits, and inbound dock conflicts force dispatchers into manual re-routing 4–9 times per driver per day. Cost-per-mile creeps 8–14% above plan; OTD slips 6–11 points on volatile days. Most logistics AI vendors sell you a visibility dashboard and call it AI. That's a chart with a buzzword, not an agent that re-routes.

With AI

A routing agent reads the active state (driver positions, HOS remaining via ELD, dock-door availability via WMS, customer time-windows, weather, traffic) and proposes re-routes the dispatcher confirms with one click. The agent does NOT auto-dispatch; the dispatcher stays in the loop, but the work moves from "rebuild the route in OptimoRoute by hand" to "approve the agent's proposed change". The optimizer underneath is whatever you already license; the agent's job is the reasoning layer that picks which constraint to relax when two plans conflict.

4–8 pts

OTD recovery on volatile days

Cost-per-mile back to within 2–4% of plan; dispatcher load drops from ~85 re-routes/day to ~25 confirmation taps

Tools

LangGraph · Claude Sonnet 4.6 · OR-Tools · OptimoRoute · Onfleet · Routific · Descartes · McLeod · MercuryGate · Manhattan Active TM · Samsara · Motive · Geotab

USE CASE 02

ETA prediction with confidence intervals

The Pain

TMS-quoted ETAs are point estimates with no confidence attached. Customers ask "where's my load?" and your CSR reads a number off a screen that's been wrong for 90 minutes. Customer-facing ETAs miss by ±2–6 hours on long-haul, ±20–45 min on last-mile. The honest take: a point-estimate ETA without an uncertainty band is a lie the CSR has to defend on the phone.

With AI

A forecasting model trained on historical lane performance plus live signals (driver location, fuel stops, dwell at origin, weather, border-crossing delay) produces an ETA plus an 80% confidence interval. CSRs and customers see both numbers; high-uncertainty loads get proactive shipper outreach before the call comes in. The base ETA is classical ML; the customer-facing narrative layer is what turns "±2 hrs uncertainty" into "we're tracking a 2-hour delay risk from Chicago weather".
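A minimal sketch of how an 80% band like the one described here could be attached to a point ETA. It assumes you already have a point estimate from some base model and a list of historical arrival errors (actual minus predicted, in minutes) for comparable loads on the lane. The function name `eta_with_interval` and the `residual_minutes` input are illustrative assumptions, and the empirical-percentile approach is one simple option, not necessarily what a production model would use (quantile regression is a common alternative).

```python
from datetime import datetime, timedelta
from statistics import quantiles


def eta_with_interval(point_eta, residual_minutes):
    """Return (low, point, high) bounding a central 80% band around the ETA.

    residual_minutes: historical (actual - predicted) arrival errors in
    minutes for comparable loads. Hypothetical input for this sketch.
    """
    if len(residual_minutes) < 20:
        raise ValueError("need at least 20 residuals for a usable band")
    # n=10 yields nine cut points: cuts[0] is the 10th percentile and
    # cuts[8] is the 90th, which together bound the central 80%.
    cuts = quantiles(residual_minutes, n=10, method="inclusive")
    low = point_eta + timedelta(minutes=cuts[0])
    high = point_eta + timedelta(minutes=cuts[8])
    return low, point_eta, high


if __name__ == "__main__":
    eta = datetime(2026, 1, 5, 12, 0)
    low, point, high = eta_with_interval(eta, list(range(-50, 51)))
    print(f"ETA {point:%H:%M}, 80% band {low:%H:%M} to {high:%H:%M}")
```

A wide band is itself the signal: loads whose band exceeds some agreed width are the ones to queue for proactive shipper outreach.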

30–45%

drop in "where is my load?" calls

ETA accuracy improves from ±2–6 hrs to ±35–80 min on long-haul; CSR handle-time per shipment drops 25–40%

Tools

XGBoost · LightGBM · MLflow · Claude Haiku 4.5 · Snowflake · BigQuery · project44 · FourKites

USE CASE 03

Claims triage and FNOL automation

The Pain

Cargo damage and loss claims take 5–14 days from FNOL to first carrier response. Each claim file is ~25 documents (BOL, POD with damage photos, customer statement, carrier denial letter, recovery quote). Claims clerks spend 60–90 min per claim assembling the file before adjuster review, and roughly half that time is re-keying data that already lives in your TMS.

With AI

An agent reads the inbound claim notification (email, EDI 998, photos), classifies the damage type, pulls the matching BOL and POD from your TMS, drafts the carrier demand letter, and surfaces the file with a recommended liability split for the claims adjuster. The adjuster approves or edits; the agent makes no automatic decisions on liability. The RAG layer over historical claims gives the adjuster precedent matching on similar lanes and similar damage patterns, which is where recovery rate moves.

1–3d

FNOL → first carrier response (from 5–14d)

Clerk time per claim drops 60–90 min → 8–15 min; recovery rate on contested claims up 4–9 points

Tools

Claude Sonnet 4.6 · GPT-4 Vision · Pinecone · Turbopuffer · Carrier411 · TIA-DAT

USE CASE 04

AI warehouse management: slotting and pick-path optimization

The Pain

WMS slotting decisions (which SKU goes in which bin) get rebuilt quarterly by an analyst running a spreadsheet model. Between rebuilds, velocity drifts and pick paths grow 12–22% longer than optimal. On peak days, pickers walk 8–12 miles per shift unnecessarily, and the labor cost compounds across every dock door the facility runs.

With AI

A recommender continuously scores SKU-to-bin assignments against live velocity, seasonal forecasts, and physical zone constraints. It surfaces recommendations such as "move these 80 SKUs this weekend, expected pick-path saving 14%" for the WMS supervisor's review; the agent does NOT auto-rebalance the warehouse. The narrative layer explains WHY a SKU moved up the priority list so the supervisor can sanity-check before approving the wave.
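To make the scoring idea concrete, here is a deliberately simplified sketch: rank SKUs by pick velocity, rank bins by travel distance from the pack station, pair them in order, and report the moves with their estimated saving. Everything here is an assumption for illustration (the function name, the three input dictionaries, one SKU per bin, and distance as the only cost). It ignores the seasonal forecasts and zone constraints mentioned above, and it only produces recommendations for review.

```python
def recommend_moves(velocity, current_bin, bin_distance):
    """Suggest SKU re-slots that put the fastest movers in the closest bins.

    velocity:     {sku: picks per day}
    current_bin:  {sku: bin id}
    bin_distance: {bin id: metres from the pack station}
    Returns (moves, net_saved), where net_saved is metres of one-way travel
    saved per day if every suggested move is applied together.
    """
    if len(bin_distance) < len(velocity):
        raise ValueError("need at least one bin per SKU")
    skus = sorted(velocity, key=velocity.get, reverse=True)  # fastest first
    bins = sorted(bin_distance, key=bin_distance.get)         # closest first
    moves, net_saved = [], 0.0
    for sku, target in zip(skus, bins):
        source = current_bin[sku]
        if source == target:
            continue  # already in its ideal slot
        saved = velocity[sku] * (bin_distance[source] - bin_distance[target])
        moves.append({"sku": sku, "from": source, "to": target, "saved": saved})
        net_saved += saved
    # Largest savings first so the supervisor reviews the big wins at the top.
    moves.sort(key=lambda m: m["saved"], reverse=True)
    return moves, net_saved


if __name__ == "__main__":
    moves, net = recommend_moves(
        velocity={"A": 100, "B": 10},
        current_bin={"A": "far", "B": "near"},
        bin_distance={"near": 5, "far": 30},
    )
    print(moves, net)
```

Note that individual moves can have a negative saving (a slow SKU displaced to a farther bin); the net figure is what the supervisor would weigh against the labour cost of the re-slot.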

10–18%

pick-path length reduction

Picker miles per shift drop 1.5–3 miles; SKU velocity-fit holds within 8% of optimal between quarterly resets

Tools

LightGBM · MLflow · Claude Sonnet 4.6 · Manhattan Active WM · Blue Yonder · Körber · Softeon

USE CASE 05

Customs documentation and HS-code accuracy

The Pain

HS-code classification for cross-border shipments is wrong 6–14% of the time at mid-volume freight forwarders. Each misclassification triggers either a CBP query (1–3 weeks delay), a duty overpayment (lost margin), or an underpayment (penalty plus audit risk). Customs brokers manually re-key data from BOL and commercial invoice into ABI/AES every shipment, and the re-key step is where 60% of the errors enter the system.

With AI

A classifier reads the commercial invoice plus product description plus prior shipment history, proposes an HS code with a confidence score, and pre-fills the customs entry. The broker reviews high-confidence entries in seconds; low-confidence entries surface for manual research with a side-by-side history of similar prior classifications. To meet EU AI Act-era transparency expectations (Art. 50 set the baseline), every AI-determined HS code is logged with a reasoning trail for audit. The broker stays the decision-maker; the agent just removes the typing.
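The confidence-thresholded review and the logged reasoning trail can be sketched roughly as follows. The `HsProposal` fields, the queue names, and the 0.90 threshold are assumptions made for illustration; the real cut-off would come out of your own eval data. The function only routes and records, and never files anything.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

REVIEW_THRESHOLD = 0.90  # assumed cut-off; tune against your own eval set


@dataclass
class HsProposal:
    entry_id: str
    hs_code: str
    confidence: float
    rationale: str  # the reasoning trail shown to the broker


def route_proposal(proposal, threshold=REVIEW_THRESHOLD):
    """Choose a broker queue and build a JSON audit record for the proposal."""
    if proposal.confidence >= threshold:
        queue = "fast_review"      # broker confirms in seconds
    else:
        queue = "manual_research"  # broker researches with prior history
    record = asdict(proposal)
    record.update(
        queue=queue,
        threshold=threshold,
        broker_signoff=None,  # filled in only when a human broker signs
        logged_at=datetime.now(timezone.utc).isoformat(),
    )
    return queue, json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    queue, audit = route_proposal(
        HsProposal("E-1", "8471.30", 0.97, "matches prior laptop entries")
    )
    print(queue)
    print(audit)
```

Both queues end with a broker; the threshold changes how much work the broker does, not who decides.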

1.5–3%

HS-code error rate (from 6–14%)

Broker time per entry compresses 8–14 min → 90 sec on high-confidence cases; duty overpayments recovered, penalty exposure down measurably

Tools

Claude Sonnet 4.6 · Pinecone · Turbopuffer · CargoWise · Descartes OneView · WiseTech

USE CASE 06

Multi-modal supply-chain visibility and anomaly detection

The Pain

Visibility data (project44, FourKites, MacroPoint, OEM telematics, ocean carrier APIs, rail tracers) lands in 6–9 separate systems. Operators only catch a delay when it's already a delay: by the time the shipper calls, the truck has been sitting 4 hours. Dashboards aren't a strategy; they're a record of what already broke.

With AI

An anomaly-detection layer reads the unified visibility feed, scores deviations against expected lane behavior, and surfaces "this rail container is 9 hours behind its 90% interval, likely Chicago yard congestion, recommend reroute via Memphis" exception alerts. Ops triages the top-N exceptions instead of watching dashboards. The unified ingest layer can be your existing visibility provider or a Kafka and Snowflake pipeline; the AI in shipping ops sits on top of whichever you've already paid for.
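As a rough illustration of scoring deviations against expected lane behavior, the sketch below flags shipments whose current delay exceeds the 90th percentile of historical delays on the same lane, then returns the top-N by excess. The function names, the dictionary shapes, and the choice of a simple percentile bound are all assumptions; a production model would likely condition on mode, season, and live signals.

```python
from statistics import quantiles


def hours_beyond_p90(history_delay_hours, current_delay_hours):
    """Hours by which the current delay exceeds the lane's 90th percentile."""
    # cuts[8] is the 90th percentile when the data is split into tenths.
    p90 = quantiles(history_delay_hours, n=10, method="inclusive")[8]
    return max(0.0, current_delay_hours - p90)


def top_exceptions(shipments, lane_history, n=3):
    """Return up to n (shipment id, excess hours) pairs, worst first."""
    scored = []
    for s in shipments:
        excess = hours_beyond_p90(lane_history[s["lane"]], s["delay_hours"])
        if excess > 0:
            scored.append((excess, s["id"]))
    scored.sort(reverse=True)
    return [(sid, round(excess, 1)) for excess, sid in scored[:n]]


if __name__ == "__main__":
    history = {"CHI-LAX": [float(h) for h in range(0, 11)]}
    live = [
        {"id": "A", "lane": "CHI-LAX", "delay_hours": 18.0},
        {"id": "B", "lane": "CHI-LAX", "delay_hours": 5.0},
        {"id": "C", "lane": "CHI-LAX", "delay_hours": 12.0},
    ]
    print(top_exceptions(live, history))
```

The narrative part of the alert (likely cause, suggested reroute) sits on top of a score like this; the score decides which shipments get narrated at all.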

~25 min

time-to-detect on delayed shipments (was ~4 hrs)

Reactive customer escalations down 35–55%; on-time-in-full (OTIF) improves 3–7 points across the lane portfolio

Tools

Claude Sonnet 4.6 · Snowflake · LightGBM · project44 · FourKites · MacroPoint

USE CASE 07

AI fleet management: driver coaching and safety-event triage from ELD and dashcam data

The Pain

Safety managers review dashcam plus ELD events (hard braking, swerving, drowsiness flags, speeding) for hundreds of drivers per week. False-positive rate on triggered events is 40–65%, so coaches dismiss most of them, and miss the real risk events buried in the noise. The drivers who actually need coaching get the same dismissed-event treatment as the drivers who don't.

With AI

A triage layer classifies each event (true safety risk, coachable behavior, sensor false-positive), prioritizes the queue, and drafts the coaching note plus suggested conversation framing. The safety manager reviews the top-priority queue daily instead of all events weekly. The vision model reads the dashcam clip for context the ELD's accelerometer can't capture: a hard-brake event is very different when the dashcam shows a pedestrian dart-out versus a tailgating pattern.

10–18%

false-positive rate (from 40–65%)

Safety-manager time per driver per week drops from ~22 min → ~6 min; preventable-event recurrence down 18–28% within 90 days

Tools

GPT-4 Vision · Claude Sonnet 4.6 · Samsara · Motive (formerly KeepTruckin) · Lytx · Netradyne

A pattern worth flagging across all seven AI in logistics workflows above: the ROI numbers are the median of what we and similarly-shaped boutiques have shipped, not the headline outlier. Don't pick a use case for its ceiling. Pick the two with the cleanest buyer-readable ROI math for your operating model: asset-based carriers with a dispatch overload start with UC-1 and UC-2; 3PLs with a claims backlog start with UC-3 and UC-2; freight forwarders with cross-border exposure start with UC-5 and UC-6. The next section maps each pain to the Paiteq service that does the actual engineering. AI fleet management, AI warehouse management, and AI supply chain visibility each get their own deeper blog treatment; this pillar is the AI for supply chain orchestration view, not the per-workflow deep-dive.

003 / SERVICE MAPPING

How Paiteq services map to logistics needs.

Four common logistics pain shapes come first, followed by the five Paiteq service pillars we'd engage against them. Each service is described by the workflow it handles rather than by its service title; that is deliberate, because what matters to the operator is the workflow, not the title.

Route-economics drift and dispatcher overload

Cost-per-mile drifts 8–14% above plan on volatile days; dispatchers manually re-route 4–9 times per driver per day against TMS plans that go stale by 10am.

ETA accuracy and CSR escalation load

Point-estimate ETAs miss by ±2–6 hrs on long-haul; CSRs absorb the customer call volume and have nothing to defend the number with.

Claims processing and customs accuracy drag

FNOL → first response runs 5–14 days; HS-code error rate sits at 6–14%. Both are document-heavy workflows AI can compress without auto-deciding.

Fragmented visibility across modes

project44, FourKites, OEM telematics, and ocean carrier APIs sit in 6–9 separate systems. Operators see the delay after the shipper does.

Service

AI Agent Development

orchestrating multi-step routing and claims-triage agents

Service

RAG Development

grounded retrieval over BOLs, customs declarations, and lane history

Service

AI Workflow Automation

automating the dock-scheduling and claims-FNOL workflows that sit between your TMS and your back-office

Service

Machine Learning Development

training ETA, demand-forecasting, and anomaly-detection models on your historical lane data

Service

AI Integration

drop-in AI integration into McLeod, MercuryGate, Manhattan Active TM, Samsara, and Descartes stacks

Why the map looks like this

Building logistics AI in 2026 is genuinely a multi-discipline engineering job, closer to platform integration work than to a typical TMS-customisation build. Route-economics drift routes to three services because a working routing agent is partly AI Agent Development (orchestrating multi-step routing and claims-triage agents), partly AI Workflow Automation (automating the dock-scheduling and claims-FNOL workflows that sit between your TMS and your back-office), and partly AI Integration (drop-in AI integration into McLeod, MercuryGate, Manhattan Active TM, Samsara, and Descartes stacks). ETA accuracy routes to ML plus RAG plus integration because a confidence-interval ETA isn't a single LLM call; it's a base forecasting model, a historical-lane retrieval step, a live-signal join (ELD position, weather, dwell), and a customer-facing narrative layer.

Claims and customs workflows route to agent work plus RAG Development (grounded retrieval over BOLs, customs declarations, and lane history) plus workflow automation because the FNOL flow and the HS-code flow are both document-heavy decision pipelines: the AI's job is to read 25 documents and surface the right answer with an audit trail, not to make the liability or classification call. Fragmented visibility routes to Machine Learning Development (training ETA, demand-forecasting, and anomaly-detection models on your historical lane data) plus integration plus agent work because the anomaly model has to ingest the unified feed (project44, FourKites, OEM telematics), score deviations against expected lane behavior, and turn the result into an exception narrative the ops team can act on. The discipline split isn't bureaucracy; it's how the engineering stays high-quality across a 20-week Platform build with dispatch, claims, and IT all watching the same use case land.

004 / RISK

Operational risk and data posture for logistics.

Three risk layers shape every logistics AI engagement we run. SOC 2 Type II plus shipper-data posture is the B2B procurement gate: shippers won't let an AI vendor touch their TMS data without the attestation. FMCSA Hours-of-Service rules (49 CFR Part 395) govern any AI that recommends or influences a dispatch decision. C-TPAT plus customs accuracy under CBP closes the cross-border loop, reinforced by the EU AI Act transparency regime (Art. 50 set the baseline), which has raised the bar for how AI-generated customs declarations need to be documented. The logistics buyer's gate is shipper-data posture plus DOT/FMCSA fleet rules plus cross-border customs accuracy, not regulator-driven privacy in the SaaS or fintech sense.

SOC-2-ready practices · Continuous monitoring

SOC 2 Type II

Shipper-data posture · per-tenant partitioning

AUDITED · 2026

FMCSA HOS

49 CFR Part 395 · ELD read-only ingestion

AUDITED · 2026

C-TPAT + EU AI Act transparency

Customs accuracy · AI reasoning trail logged

AUDITED · 2026

SOC 2 TYPE II

Shipper-data posture

Shippers (CPG brands, retail, manufacturing) require SOC 2 attestation before any AI vendor touches their TMS data, customer addresses, or lane pricing. Volume-discount pricing is competitive intel; a leak tanks the shipper relationship. We design AI features so shipper data never leaves your VPC: vector stores in Pinecone or Turbopuffer are partitioned per shipper-tenant; embeddings never cross tenants; observability logs in Langfuse redact PII (consignee names, addresses, phone numbers) at the logging layer, not as a post-hoc scrub. The engagement signs a DPA and runs a SOC 2 attestation review at kickoff. Most shipper procurement teams we engage with already run a clean SOC 2 environment; our job is to design the AI work so the scope doesn't expand and the attestation holds at next year's audit.
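Redacting at the logging layer, rather than scrubbing afterwards, means the raw value never reaches the log sink. The sketch below shows the idea with Python's standard `logging` module standing in for whatever observability client is in use; it is not Langfuse's API. The `RedactingFilter` class, the North-American-only phone pattern, and the explicit consignee list are assumptions for illustration, and a real deployment would need address handling and broader phone formats.

```python
import logging
import re

# North American phone formats only; an assumption for this sketch.
PHONE_RE = re.compile(
    r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}"
)


class RedactingFilter(logging.Filter):
    """Replace phone numbers and known consignee names before emit."""

    def __init__(self, consignee_names=()):
        super().__init__()
        self._names = [
            re.compile(re.escape(name), re.IGNORECASE)
            for name in consignee_names
            if name
        ]

    def filter(self, record):
        message = record.getMessage()  # msg with args already applied
        message = PHONE_RE.sub("[PHONE]", message)
        for pattern in self._names:
            message = pattern.sub("[CONSIGNEE]", message)
        # Store the redacted text so no handler can see the raw values.
        record.msg, record.args = message, ()
        return True


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(RedactingFilter(["Acme Foods"]))
    log = logging.getLogger("claims")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("Deliver to %s, call %s", "Acme Foods", "+1 (312) 555-0147")
```

Attaching the filter to the handler keeps the redaction in one place, so a new log call elsewhere in the codebase cannot bypass it by accident.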

FMCSA HOS · 49 CFR PART 395

ELD and Hours-of-Service posture

Any AI that recommends or influences a dispatch decision intersects with FMCSA HOS rules (49 CFR Part 395). Recommending a route a driver legally can't run is a compliance exposure plus a driver-pushback problem plus a safety risk all at once. The routing agent's constraint layer enforces live HOS remaining per driver, pulled read-only from your ELD (Samsara, Motive, Geotab): the agent CANNOT propose a route that requires more drive time than the driver legally has available. The dispatcher confirms every recommendation; the agent never auto-dispatches. ELD ingestion stays read-only. We never write back to the ELD record, which preserves the audit trail DOT can subpoena under a roadside or compliance review. Most logistics AI vendors miss this; we make it the architecture gate at week 3.
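A hard constraint of this kind is easiest to reason about as a filter that runs before any candidate route reaches the reasoning layer. The sketch below collapses HOS to a single "drive minutes remaining" number per driver, which is a simplification: Part 395 has several interacting limits (daily driving, the on-duty window, required breaks, and multi-day totals), and the ELD is the source of truth for all of them. The `CandidateRoute` type, the function name, and the 30-minute safety buffer are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateRoute:
    route_id: str
    driver_id: str
    drive_minutes: int  # estimated drive time the route requires


def hos_feasible(candidates, drive_minutes_remaining, buffer_minutes=30):
    """Keep only routes the assigned driver has enough drive time to run.

    drive_minutes_remaining: {driver_id: minutes}, read from the ELD.
    Drivers with no known HOS state are excluded (fail closed).
    """
    feasible = []
    for route in candidates:
        remaining = drive_minutes_remaining.get(route.driver_id)
        if remaining is None:
            continue  # unknown HOS state: never propose
        if route.drive_minutes + buffer_minutes <= remaining:
            feasible.append(route)
    return feasible


if __name__ == "__main__":
    routes = [
        CandidateRoute("R1", "D1", 200),
        CandidateRoute("R2", "D1", 400),
        CandidateRoute("R3", "D9", 10),
    ]
    print(hos_feasible(routes, {"D1": 300}))
```

Failing closed on missing ELD data matters as much as the comparison itself: an outage in the feed should shrink the set of proposals, not widen it.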

C-TPAT + CBP + EU AI ACT ART 50

Customs accuracy and AI transparency posture

Cross-border freight forwarders and 3PLs operate under C-TPAT (Customs-Trade Partnership Against Terrorism): AI-determined classifications, denied-party screening results, and documentation must all support a CBP audit. The EU AI Act transparency regime (Art. 50 set the baseline) has raised the bar for how AI-generated customs declarations and HS-code rationales for EU-bound shipments need to be documented. Every AI-determined HS code, screening result, and customs-form pre-fill is logged with the reasoning trail, the confidence score, and the reviewing broker's signature. High-confidence entries auto-route to the broker queue with the AI rationale visible; low-confidence entries surface for manual research with side-by-side history of similar prior classifications via RAG over CargoWise or Descartes OneView data. The audit trail is what makes the AI feature defensible to CBP and EU customs. Without it, the AI is a liability, not a tool.

005 / ENGAGEMENT

How a logistics AI engagement runs at Paiteq.

Five phases. Every phase has an explicit deliverable, a named owner inside your team, and a gate criterion that has to pass before the next phase starts. The cadence is weekly: a Monday standup with your VP Ops, Dispatch or Claims lead, IT lead, and Safety or Compliance contact. Demo every Thursday. SOC 2 shipper-data posture, FMCSA HOS constraint design, and C-TPAT audit-trail design all track in parallel from week 1, not as a retrofit at security review.

Logistics AI Engagement · 16 weeks (typical Platform tier) · 5 phases

WEEK 1–2 Discovery

Use-case prioritisation, route-economics or claims-AHT ROI scoping, stakeholder map (VP Ops + Dispatch lead + Claims manager + IT)

Single buyer-readable ROI number scoped per use case (cost-per-mile %, OTD pts, claims AHT, or HS-code accuracy %)

WEEK 3–4 Architecture + Risk Scoping

Stack lock against your TMS/ELD/WMS, SOC 2 shipper-data posture review, FMCSA HOS constraint design, C-TPAT audit-trail design

Architecture signed by your ops lead and your safety/compliance contact before any prompt is written

WEEK 5–10 MVP Build

Runnable agent against eval set plus your real lane data, weekly demo, observability via Langfuse, HOS-constraint enforcement wired in

Baseline accuracy hit on eval set; ELD ingestion read-only; AI reasoning trail logging in place

WEEK 11–16 Production + Peak Readiness

Hardening, fallback policies for ELD/visibility-feed outages, rollout, runbook for peak-volume days (produce season, holiday push)

All eval gates green; peak-day load tested at 3× expected volume; dispatcher confirmation flow validated

WEEK 17+ Optimise + Handoff

Cost engineering, prompt iteration, runbook in your repo, eval-drift monitoring, ownership transfer to your team

Two cadence notes for logistics specifically

The dispatch or claims lead shows up week 1, not week 8. Three of the use cases on this page (UC-1 routing agent, UC-2 ETA prediction, UC-3 claims triage) depend on decisions that are genuinely operational, not engineering ones: which constraints relax first when two plans conflict, what the recovery-rate threshold is for an auto-drafted demand letter, when the agent stops and waits for a human. We've found the first-week unblock is almost always getting the operating lead into the architecture conversation before the stack is locked, because changing the constraint hierarchy or the human-in-loop threshold at week 6 costs 2–3× what it costs to design it in at week 1. The second cadence note: peak-volume readiness lands at week 11–16, not after launch. Produce-season pushes, holiday surges, and end-of-quarter customer pulls are real deadlines for asset-based carriers and 3PLs; we pre-test the routing agent and the ETA model at 3× expected load before sign-off so the first peak day isn't the first stress test.

006 / TEAM SHAPE

Team shape for a logistics AI engagement.

Two engagement shapes cover roughly 80% of the logistics AI work we run across 3PLs, asset-based carriers, and freight forwarders: MVP for a single high-clarity use case with the TMS/ELD integration scaffolding sized accordingly, and Platform for the multi-use-case build on shared infra that most operators in the $50M–$500M revenue band actually need. Enterprise tier (4 engineers, 3 ML engineers, 1 PM, 30+ weeks) sits behind these for org-wide AI orchestration across routing, claims, customs, and visibility simultaneously. As a logistics software development company we don't pretend the MVP shape is the right answer for everyone; it's a stepping stone for half our clients and a stop point for the other half.

	MVP shape, one use case	Platform shape, 3–5 use cases on shared infra
Scope	One use case shipped to production (e.g. UC-2 ETA prediction or UC-3 claims triage)	3–5 use cases on shared TMS/ELD integration layer
Team shape	2 eng + 1 ML + 0.5 PM	3 eng + 2 ML + 1 PM
Timeline	8–12 weeks	16–24 weeks
Eval framework	Single eval set, 30–50 lane examples	Shared eval harness across use cases, regression alarms in CI on every model release
Observability	Langfuse traces + cost dashboard	Langfuse + per-use-case cost attribution + ELD/visibility-feed outage monitoring
Stop-and-walk option	Yes, fixed scope, real option to stop after week 8	Phased gates at weeks 4 / 10 / 16; can collapse to single-use-case build mid-flight

Logistics MVP carries heavier integration scaffolding than ecommerce because Samsara, McLeod, and Descartes don't expose the same plug-and-play surface that a Shopify Storefront API does. Platform tier is the median right answer for 3PLs and asset-based carriers in the $50M–$500M revenue band. The Enterprise tier (4 eng + 3 ML + 1 PM, 30+ weeks) only fits when the engagement is genuinely org-wide AI orchestration across routing, claims, customs, and visibility simultaneously. Specific engagement sizing comes out of the audit conversation; Enterprise tier is scoped separately on request.

The cheapest tier isn't the cheapest outcome

If you're shipping more than one AI use case in the next 12 months (and most logistics teams that get to a serious AI strategy will), the MVP tier asks you to rebuild the TMS/ELD integration layer, the eval framework, and the observability stack twice. The second rebuild costs more than the first. Platform tier is the median right answer for 3PLs and asset-based carriers in the $50M–$500M revenue band because the shared infra (eval harness, ELD adapters, RAG layer over historical lanes and claims, model routing, observability via Langfuse) amortises across three to five use cases instead of one. As a logistics software development company we run the MVP tier for two real cases: pre-scale operators testing whether logistics AI pays back at all, and freight forwarders with a single high-clarity workflow (usually HS-code classification) they want to ship in 10 weeks before greenlighting the platform investment. Both are legitimate; neither is most companies.

007 / WORK

What we've shipped for logistics teams (anonymized).

Three anonymised logistics engagements from the broader team's history. Segment and revenue band are real; metrics are real; the numbers were measured 60–90 days post-launch, not at deploy. Brand names removed under standard NDA. Anyone selling you headline outliers without the operating numbers under them is selling case-study theatre.

Dispatch

Asset-based carrier · $180M revenue · NA

Routing agent + HOS-aware re-planning

A 480-truck fleet with dispatchers re-routing 6–8 times per driver per day against a nightly TMS plan that went stale by mid-morning. We shipped a routing agent (LangGraph plus Claude Sonnet 4.6 with OR-Tools underneath, tool-calling against McLeod and Samsara) with hard HOS-remaining constraints pulled from the ELD in read-only mode. Cost-per-mile drift dropped from 11% above plan to 3% on volatile days; OTD recovered 5.4 points; dispatcher confirmation taps replaced ~82% of manual re-routes.

Cost-per-mile drift → 3% / 90d post-launch

Claims

Mid-market 3PL · $90M revenue · NA

Claims-triage agent + RAG over precedent files

Cargo damage claims sat at 9.2 days average FNOL → first carrier response with ~24 documents per file. We shipped a triage agent (Claude Sonnet 4.6 plus GPT-4 Vision on damage photos) with RAG over five years of historical claims via Pinecone, tool-calling into the TMS for BOL/POD pulls. FNOL → first response compressed to 2.1 days; clerk time per claim dropped from 74 min mean to 12 min; recovery rate on contested claims up 6.8 points.

FNOL → first response 9.2d → 2.1d

Customs

Freight forwarder · $220M revenue · EU + NA

HS-code classifier + ABI/AES pre-fill

HS-code error rate sat at 9.4% across ~14,000 cross-border entries per month, with brokers re-keying invoice data into CargoWise on every shipment. We shipped a classifier (Claude Sonnet 4.6 plus RAG over historical declarations on Turbopuffer) with confidence-thresholded human review and an EU AI Act-era reasoning trail logged per entry. Error rate dropped to 2.3%; broker time per high-confidence entry dropped from 11 min to 85 sec; duty-overpayment recovery covered the engagement cost inside 7 months.

HS-code error → 2.3%

The shape across all three engagements

The buyer-readable ROI metric was scoped in week 2, before any code was written: a cost-per-mile drift target on the carrier engagement, an FNOL-response target on the 3PL engagement, and an HS-code error-rate target on the freight-forwarder engagement. The eval set grew during production via sampled traces; it wasn't a static set left over from architecture. Handoff put the runbook in the client's repo, not in a shared doc. We engage as a logistics software development company that stays through the first peak-volume push and the first eval-drift cycle, not one that ships and disappears. Roughly half of the logistics AI engagements we close convert to a lighter-weight Run engagement after the build is in production; half don't, because the client's internal team has picked up ownership. Both outcomes are fine. The Run engagement is real work (prompt iteration, cost engineering, peak-day load testing, regression testing on new model releases), not a retainer hiding as a service.

Where logistics AI connects.

Most logistics engagements ship as AI automation agency work (EDI handling, three-way invoice match, freight-doc OCR + extraction) wired to RPA development services for the legacy-system bridges. Decision-heavy nodes (exception triage, dynamic rerouting) move to AI agent development company practice.

The model layer underneath route optimization and demand forecasting lives in machine learning development services, with operations in MLOps services. RAG development services handles carrier-doc knowledge surfaces. Customer-facing tracking and exception-resolution chat ship as chatbot development services. For strategic framing, see AI consulting services; for founder context, see Navin Sharma and the broader AI development company.

008 / FAQ

Logistics AI buyer FAQ.

Five questions we get on almost every logistics AI first call, answered the way we'd answer them on the call. Specific numbers, named tools, the actual decision rules, not generic vendor-deck answers.

How do you size an AI engagement on top of our TMS / WMS stack?

Three shapes. An MVP build of a single AI use case (ETA prediction, claims triage, or HS-code classification) runs 8–12 weeks with 2 engineers, 1 ML engineer, and 0.5 PM. A Platform build covering 3–5 use cases on a shared TMS/ELD integration layer runs 16–24 weeks with 3 engineers, 2 ML engineers, and 1 PM. Enterprise engagements with org-wide AI orchestration across routing, claims, customs, and visibility run 30+ weeks with 4 engineers, 3 ML engineers, and 1 PM. Logistics MVP needs heavier integration scaffolding than ecommerce because TMS and ELD stacks (McLeod, MercuryGate, Manhattan Active TM, Samsara, Motive) don't have plug-and-play surfaces, and most engagements spend 30–40% of the MVP effort just on the integration layer. Most logistics AI work that ships well sits in the Platform tier because the shared infra (eval harness, model routing, TMS/ELD adapters, observability via Langfuse) amortises across the use cases instead of getting rebuilt every project. Specific sizing comes out of the audit conversation, so start there.

Build vs. buy: when does in-house AI orchestration beat a visibility-platform vendor (project44, FourKites)?

Buy when the AI feature is genuinely commodity (basic shipment tracking, off-the-shelf ETA, generic dashboard reporting) and the visibility vendor's data already covers your lane mix. Build the orchestration layer when AI touches your differentiated dispatch decisioning, your claims recovery, or your customs accuracy. Routing-agent re-planning with your live HOS constraints (UC-1), claims-triage with RAG over your precedent files (UC-3), and HS-code classification tuned to your product mix (UC-5) aren't workloads a visibility vendor will build for you; they sell platform breadth, not your specific decisioning. We've watched a $200M asset-based carrier buy three visibility tools in two years, realise the tools didn't compose into a dispatch decision, and rebuild on a clean orchestration layer that cost less than the third tool's annual license. Our drop-in AI integration into McLeod, MercuryGate, Manhattan Active TM, Samsara, and Descartes stacks sits at the decisioning layer, not at the tracking layer, so your existing visibility license stays where it earns its keep.

How do you handle FMCSA HOS compliance when an AI agent is recommending dispatch decisions?

The routing agent's constraint layer includes live HOS-remaining per driver, pulled read-only from your ELD (Samsara, Motive, Geotab). The agent CANNOT propose a route that requires more drive time than the driver legally has available under 49 CFR Part 395: the hard constraint is enforced before the LLM ever sees the candidate plan, not applied afterwards as a post-hoc filter. The dispatcher confirms every recommendation; the agent never auto-dispatches. ELD ingestion is strictly read-only. We never write back to the ELD record, which preserves the audit trail DOT can subpoena. The opinionated take most logistics AI vendors skip: an AI that recommends a route a driver legally can't run isn't a compliance edge case; it's a product defect. The HOS constraint isn't a feature you bolt on at week 14; it's the architecture decision at week 3.
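As an illustration of "enforced before the LLM ever sees the candidate plan", here is a minimal sketch of a pre-filter over candidate plans. Everything in it is hypothetical: the `DriverHos` and `CandidatePlan` shapes, the field names, and the 30-minute safety buffer are assumptions for the example, not an ELD vendor's API or the production constraint layer. It also simplifies 49 CFR Part 395 to two remaining-time clocks and ignores break and weekly-cycle rules.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DriverHos:
    """Read-only HOS snapshot for one driver (hypothetical shape)."""
    driver_id: str
    drive_minutes_remaining: int        # remaining driving time
    duty_window_minutes_remaining: int  # remaining on-duty window

    @property
    def legal_drive_minutes(self) -> int:
        # The tighter of the two clocks bounds what the driver can still run.
        return max(0, min(self.drive_minutes_remaining,
                          self.duty_window_minutes_remaining))


@dataclass(frozen=True)
class CandidatePlan:
    plan_id: str
    driver_id: str
    est_drive_minutes: int


def feasible_plans(plans: List[CandidatePlan],
                   hos_by_driver: Dict[str, DriverHos],
                   buffer_minutes: int = 30) -> List[CandidatePlan]:
    """Drop every plan the assigned driver cannot legally complete.

    Runs before any prompt is built, so infeasible plans never reach the model.
    Plans for drivers with no HOS snapshot are dropped (fail closed).
    """
    kept = []
    for plan in plans:
        hos = hos_by_driver.get(plan.driver_id)
        if hos is None:
            continue
        if plan.est_drive_minutes + buffer_minutes <= hos.legal_drive_minutes:
            kept.append(plan)
    return kept


if __name__ == "__main__":
    hos = {"d1": DriverHos("d1", 240, 300), "d2": DriverHos("d2", 90, 600)}
    plans = [
        CandidatePlan("p1", "d1", 200),  # 200 + 30 <= 240: kept
        CandidatePlan("p2", "d2", 120),  # exceeds d2's 90 minutes: dropped
        CandidatePlan("p3", "d3", 60),   # no HOS snapshot: dropped
    ]
    print([p.plan_id for p in feasible_plans(plans, hos)])
```

Only the surviving plans would be handed to the model for ranking, and the dispatcher still confirms the final choice. The filter only reads the HOS snapshot and never mutates it, which mirrors the read-only ELD posture described above.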

Which AI use cases have the highest ROI for a mid-market 3PL or asset-based carrier ($50M–$500M revenue band)?

The four highest-ROI starting points we see in 2026 are: ETA prediction (UC-2: a 30–45% drop in "where is my load?" calls, CSR handle time down 25–40%); claims triage (UC-3: FNOL-to-first-response time from 5–14 days to 1–3 days, clerk time from 60–90 min to 8–15 min, recovery rate up 4–9 points); routing-agent re-planning (UC-1: cost-per-mile back to within 2–4% of plan, OTD recovering 4–8 points); and, for cross-border operators, HS-code classification (UC-5: error rate from 6–14% to 1.5–3%, broker time from 8–14 min to 90 sec on high-confidence entries). Pick the two with the cleanest buyer-readable ROI math for your operating model and let the eval data tell you which use case is next. Asset-based carriers usually start with UC-1 and UC-2 because the dispatch leverage shows up in week 12; 3PLs with brokerage volume start with UC-2 and UC-3 because the CSR and claims load is what's burning the team out. Trying to ship all four at once is how logistics AI engagements stall: too many ops approvals run in parallel and the team loses the plot by week 14.
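For the "buyer-readable ROI math", a rough sketch using the claims-triage clerk-time ranges quoted above might look like the following. It assumes those times are per claim, and the monthly claim volume is a placeholder you would replace with your own figure; the `clerk_hours_saved` helper is an illustrative name.

```python
from typing import Tuple


def clerk_hours_saved(claims_per_month: int,
                      before_min: Tuple[int, int] = (60, 90),
                      after_min: Tuple[int, int] = (8, 15)) -> Tuple[float, float]:
    """Return (conservative, optimistic) clerk hours saved per month.

    Conservative pairs the fastest 'before' time with the slowest 'after' time;
    optimistic pairs the slowest 'before' time with the fastest 'after' time.
    """
    conservative = (before_min[0] - after_min[1]) * claims_per_month / 60
    optimistic = (before_min[1] - after_min[0]) * claims_per_month / 60
    return conservative, optimistic


if __name__ == "__main__":
    # 400 claims per month is a placeholder volume, not a figure from this page.
    low, high = clerk_hours_saved(400)
    print(f"Clerk time saved: {low:.0f} to {high:.0f} hours per month")
```

Running the same shape of calculation for each candidate use case, with your own volumes, is one way to decide which two to start with.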

How long until we see cost-per-mile or OTD improvement?

Honest answer: 10–16 weeks from kickoff for the first measurable cost-per-mile or OTD lift on a single use case, and the lift compounds for another 2–3 quarters as eval data tightens the routing-agent's constraint weights. The fastest single-use-case wins we've shipped: ETA prediction at 8 weeks to first measurable accuracy improvement on long-haul lanes; claims-triage at 9 weeks to first FNOL-response delta. The slower wins: routing-agent re-planning (UC-1) and HS-code classification (UC-5), which both need 12–16 weeks before the eval set covers enough lane variability or product variability to trust the agent's outputs without heavy dispatcher or broker review. The pattern we won't promise: cost-per-mile lift inside 6 weeks. Anyone selling that number hasn't actually shipped a routing agent against a live HOS constraint. Training ETA, demand-forecasting, and anomaly-detection models on your historical lane data in week 1 is where the timeline either gets real or stays fictional, and we'd rather scope conservatively and beat the timeline than promise a number that needs a VP-Ops conversation in week 10.

009 / START A LOGISTICS AI ENGAGEMENT

Book a discovery call. We'll name the two AI features that'll move cost-per-mile or OTD and quote a build window.

No deck. Forty-five minutes with an engineering lead, your real operating context on the table, and a follow-up memo within 48 hours scoping the MVP or Platform tier sized to your lane mix and fleet shape.

Talk to engineering See the 7 use cases again

010 / OTHER INDUSTRIES

Adjacent industries we engage.

Logistics sits next to three industries in our book where the AI build patterns rhyme: sometimes the workflow translates directly, sometimes the data posture changes the engineering. Brief signposts; full pillars land as each ships.

INDUSTRY · SAAS

AI for SaaS

Sales agents, RAG copilots, churn prediction, embedded product AI.

INDUSTRY · FINTECH

AI for Fintech

KYC, fraud detection, model-risk governance under SR 11-7.

INDUSTRY · ECOMMERCE

AI for Ecommerce

Catalog enrichment, conversion-side search, recommendations.