Claude Sonnet 4.6
AnthropicLong-context · tool use · default pick
Anthropic developers, Claude consultants, and Claude Code experts who ship production AI: long-context agents, tool-using workflows, Computer Use, Claude RAG, and Claude Code engineering. Model-agnostic. Daily operators. We'll show you the token-cost math before you commit.
Claude development is the practice of building applications against Anthropic Claude API: Sonnet 4.6 for synthesis and Haiku 4.5 for cheap routing and classification. It uses Claude native tool-use schema, forced-JSON outputs via response_format, 200K-token context windows, and prompt caching that can cut input cost up to 90 percent on repeated system prompts. Unlike single-vendor SaaS resale, Claude development covers the full integration: model selection per workflow, deployment on Claude Console or AWS Bedrock with PrivateLink for HIPAA and FFIEC posture, and Constitutional AI guardrails that make refusals predictable rather than ad-hoc. A common router pattern uses Haiku 4.5 to classify intent and Sonnet 4.6 to synthesise the grounded answer, with Langfuse for observability.
From chat to agents to repo-scale code work, these are the patterns we ship most often. Every one of them comes with an eval suite, audit logging, and a token-cost target. Not a demo.
Customer-facing or internal chat built on Claude's tool-use API. Multi-turn dialogue, context memory, structured outputs, escalation paths. Deployed in web, mobile, Slack, Teams, or your own front-end.
Production agents that plan, call tools, observe results, and recover from errors. ReAct, plan-and-execute, hierarchical agent patterns. LangGraph orchestration or custom Python. We pick the simpler one that works.
Full-contract review, multi-document synthesis, repo-scale code analysis, regulatory comparison. Claude's long context unlocks workflows you couldn't ship on GPT-4's 128K, without the chunking pain.
Production Claude RAG: retrieval-augmented agents over Notion, Drive, Confluence, your CRM, or your code. Pinecone, pgvector, or Weaviate retrieval, eval-tested with your real questions before launch. Claude's 200K context lets us cite full source passages instead of chunks.
Production claude computer use for browser and desktop automation: form-filling, legacy-system workflows, complex UI tasks without API hooks. Honest about where Computer Use is and isn't production-ready in 2026.
Internal AI engineering for your team using Claude Code: code generation, repo-scale refactoring, test authoring, on-call triage. Claude code tutorial onboarding for your engineers plus production-grade claude code agents (custom subagents tuned to your codebase). We dogfood this for our own engineering, daily.
Some teams need a Claude consultant to sort the strategy first: which model, which deployment, which workflow. Others know what they want and just need Anthropic developers to build it. We do both, picked by what stage you're at. Either way, fixed-fee at the start.
You're not sure whether Sonnet 4.6 / Haiku 4.5 / Opus 4.7 fits the workflow, whether Computer Use is the right call, or whether Bedrock vs Anthropic-direct is the right deployment. We run a fixed-fee Anthropic consulting audit, deliver a ranked roadmap, and you decide whether to build with us or in-house.
You know what you want shipped. We build one Claude workflow end-to-end against your real systems in 4–6 weeks: agents, RAG, Computer Use, or Claude Code engagement. Fixed-price. Eval suite, monitoring, and a runbook ship with it. Walk-away point if the data doesn't move at the pilot stage.
You have a roadmap of 3–5 Claude workflows. Embedded Anthropic developers ship them on cadence, with monthly cost-of-ownership and drift reporting. Includes Claude Code consulting for your engineering team. Cancel any month.
The honest anthropic vs openai comparison, drawn from shipped client work rather than benchmark leaderboards. Both are production-ready. We pick per workflow, not per vendor.
Generalizations from shipped client work + public benchmark suites (HELM, SWE-bench, GAIA). Specifics vary per workload; we benchmark on your eval before recommending.
The Claude family covers three price/quality bands. The default is Sonnet 4.6 for most workloads. Haiku is ~3× cheaper for narrow tasks; Opus 4.7 is only ~1.7× more expensive than Sonnet (a big drop from the old Opus 4 pricing), which changes when Opus is worth the spend. Here's how we choose.
Prices reflect Anthropic API list pricing as of 2026; Bedrock pricing tracks within ±10%. Latency from typical production traces.
Claude Sonnet 4.6's 200K-token context isn't a benchmark stat. It changes which workflows are even possible. Three we now ship we now ship without chunking, retrieval errors, or per-section state-juggling.
Past 100K tokens, you can put an entire master services agreement + every amendment + the playbook of redlined clauses into a single Claude call. No chunking, no retrieval errors, no "the model missed clause 14(b) because it was on page 12."
Drop a 150-file Python service into the prompt and ask Claude to find every place a deprecated function is called. We've shipped this for an internal devtools client: found 41 call sites, zero false positives.
Compare three regulatory filings against your current policy doc. Or merge five meeting transcripts into a single decision log. Or summarize 200 customer interviews. All in one prompt, with citations back to the source.
Three tactics stacked. Each one independently saves money; together they typically bring effective token cost to 10–15% of the naive baseline — at the same eval-suite quality.
The number-one complaint about Claude in production: "the bill ran away." The fix is rarely "use a worse model." Three claude cost optimization tactics we apply in every pilot, and report on monthly afterwards.
Most workflows have 70% easy decisions and 30% hard ones. We route easy decisions to Haiku ($1/$5 per M) and only escalate to Sonnet when needed. Typical result on the same workflow: 60–80% cost reduction with zero quality drop on the eval suite.
Anthropic's prompt caching cuts cached prefix tokens to 10% of input cost. We restructure prompts so the system prompt + tool definitions + long context are cacheable. On document workflows we see effective input cost drop by 70–85% within a week.
Long-running agents bloat their own context. We add summarization layers that compress old turns into 200-token gists, so a 12-step agent runs in ~8K tokens instead of 40K. Same quality, 5× cheaper, faster latency.
A gpt to claude migration that scares teams least: prompts + tool-use + eval suite. Three weeks of work, milestone-billed, with a walk-away point if the data doesn't move. Most teams see a 30–60% token-cost reduction after the optimization pass.
We rebuild your eval set from existing GPT outputs, audit every system prompt for Claude-style instructions, and map your function-calling schema to Claude's tool-use format.
Most GPT-4 workloads map to Sonnet 4. We benchmark on your eval to confirm; sometimes Haiku is enough, sometimes Opus is required. You see the data, not our opinion.
We run Claude in shadow mode alongside your GPT pipeline. Same inputs, both outputs logged. You see the quality + cost delta on real traffic. Cut over only when the data shows parity or better.
Once live, we run the token-optimization playbook: prompt caching, complexity routing, context compression. Most teams see another 30–60% cost reduction in the first month after cutover.
The same ticket-triage agent across Sonnet, Haiku, and Opus. Pick a model on the left and the model= line swaps; the per-ticket cost stat updates. This is how we choose a model: by running your eval, then looking at the bill.
from anthropic import Anthropic
client = Anthropic()
@tool(description="Search internal product docs")
def search_docs(query: str) -> list[dict]:
return vector_db.search(query, k=5)
@tool(description="Create a Zendesk reply (pending review)")
def reply(ticket_id: int, body: str) -> dict:
return zendesk.update(
ticket_id, body=body, status="pending"
)
def triage(ticket: dict) -> dict:
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
tools=[search_docs, reply],
messages=[{
"role": "user",
"content": format_ticket(ticket),
}],
)
return run_tool_loop(response, ticket)
The right deployment depends on which compliance regime you need to satisfy. We've shipped all three, walked the security teams through each, and provide a DPIA template at the audit stage.
Anthropic offers BAA on Claude Enterprise. We deploy with audit logging, no-train-on-data toggle, and a 30-day retention default you can shorten to zero. We walk your security team through the architecture before code ships.
For workloads that need AWS's compliance posture, we deploy Claude through Bedrock with PrivateLink, KMS encryption, and CloudTrail logging. Your data never leaves your VPC. Same Claude model, AWS-managed control plane.
Anthropic's EU data residency option for Claude API and Bedrock's eu-central-1 region cover the most common EU compliance asks. Data Processing Agreements available. We provide a DPIA template at the audit stage for your privacy team.
Same pricing as our other engagements. Most clients begin with the audit to scope, run a 4–6 week pilot on the highest-ROI workflow, then move to monthly for the next 3–5.
Find the Claude workflows worth shipping before you commit a budget.
One workflow shipped end-to-end on Claude, with eval data. Not a demo.
Embedded squad shipping the next Claude workflow on your roadmap.
Three anonymized capability patterns drawn from real engagements. Named references shared under NDA once we know what you're building.
Inside legal team reviewing 80-page master agreements + amendments manually; 6 hours per contract average; clause deviations slipping through.
Claude Sonnet 4.6 ingests the full contract + amendment chain + the team's clause-deviation playbook in a single prompt. Returns a redline summary with citations to specific clause numbers.
Tier-1 support team drowning in repetitive product questions; help-center docs underused; agents copy-paste-editing the same replies.
Claude Sonnet RAG agent over product docs + historical ticket replies. Drafts reply if confidence > 0.7, escalates with a redacted draft otherwise. Learns from every agent edit.
Mid-size engineering team losing 4–8 hours per on-call rotation triaging stale alerts and tracing through a 150-file legacy service.
Claude Code agentic loop with custom subagents for repo navigation, log query, and PR drafting. Plugged into PagerDuty + the team's incident runbook.
Book a free Claude audit. We'll review your current LLM workload (if any), recommend Sonnet / Haiku / Opus per workflow, project token cost vs your current spend, and give you a 90-day Claude roadmap. No deck, no obligation to build.
Building with Claude often connects to OpenAI, agent frameworks, or full AI integration. These pages go deeper.
GPT-5, GPT-5-mini, Realtime API integration.
Multi-step autonomous agents with LangGraph, CrewAI.
Production chatbots on Claude + GPT, RAG, guardrails, multi-channel.
Plug Claude into Salesforce, Slack, NetSuite, and more.
Sonnet 4.6 for clause analysis + cite-grounded brief research, Bedrock + customer-managed keys for ring 2.
Claude-powered workflow automation. Where a Claude pilot lands inside a full delivery program.
Pre-build discovery audit when the workflow needs scoping before the engineering pilot.
Sonnet 4.6 for predictive-maintenance narrative + shift-handoff summaries.
Claude RAG over internal docs — the productized AI knowledge management service.
Model-agnostic umbrella — when the build spans Claude + GPT + open-weights, app shell, ML classifiers, and ongoing delivery. The parent AI development company pillar.
Claude-powered voice agents — Sonnet 4.6 for reasoning, chained with Whisper or Deepgram STT and ElevenLabs or Cartesia TTS, deployed to Twilio or Vapi. The voice-specific sibling pillar.
Token-cost projection for Claude Opus, Sonnet, and Haiku against the workflow shapes we audit most. Useful before scoping a Claude-only engagement.
Dated retrieval benchmark — Claude Sonnet 4.6 and Haiku 4.5 vs GPT, Gemini, and open-source on a 1,840-document corpus.
Decision support for picking RAG, fine-tuning, hybrid, or prompt engineering — Claude-specific tradeoffs surfaced where relevant.