From your first API call to enterprise-grade agentic workflows — a practical, no-fluff guide to running Anthropic’s Claude models inside the AWS ecosystem.
If you’ve been following the AI tooling space, you already know that Anthropic’s Claude is one of the most capable large language models available today. What you might not know is exactly how to deploy and use Claude inside AWS without standing up your own infrastructure, managing API keys across services, or negotiating enterprise deals with Anthropic directly.
That’s precisely what Amazon Bedrock solves. It bridges the gap between Claude’s raw capabilities and the AWS services your team likely already depends on — S3, Lambda, IAM, CloudWatch, VPCs — letting you embed AI into production systems with the governance, compliance, and security controls you already understand.
This guide walks through every major capability: the models available, how to make your first API call, building agentic workflows, wiring up a RAG pipeline with S3, locking things down with enterprise security, filtering outputs with Guardrails, and running cost-efficient bulk inference jobs.
Who this guide is for

Developers, ML engineers, and architects who want to integrate Claude into AWS-based applications. Some familiarity with AWS concepts (IAM, S3, VPCs) is assumed — no prior AI/ML experience is needed beyond basic prompt engineering.
01 —
What is Amazon Bedrock?
Amazon Bedrock is AWS’s fully managed service for accessing, fine-tuning, and deploying foundation models from multiple AI providers — including Anthropic, AI21 Labs, Cohere, Meta, Mistral, and Stability AI — through a single, unified API surface. Think of it as a model marketplace layered on top of AWS infrastructure.
The key distinction from calling a provider’s API directly is that Bedrock is a native AWS service. This means your inference traffic never leaves the AWS network, model invocations are logged in CloudWatch, access is controlled through IAM policies, and you can attach the same security and compliance tooling you use for every other AWS service.
How Bedrock fits into the AWS ecosystem
Bedrock is deeply integrated with the AWS fabric — it’s not an isolated service. A few examples of how that plays out in practice:
Amazon S3
Store documents for Knowledge Bases, save batch inference outputs, and pull fine-tuning datasets directly from buckets.
AWS Lambda
Invoke Bedrock models from serverless functions — no servers to provision, no idle capacity to pay for.
AWS IAM
Granular permission policies control exactly which models each role, user, or service can invoke.
CloudWatch
Automatic logging of invocation metrics, latency, token consumption, and error rates without extra setup.
Step Functions
Orchestrate multi-step AI workflows with built-in retry logic, branching, and state management.
VPC / PrivateLink
Keep all Bedrock traffic private inside your VPC — never touching the public internet.
Because Bedrock uses the same IAM auth model as every other AWS service, adding Claude to an existing application often requires zero new credentials or secret management — just an IAM role with the right policy attached.
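As an illustration, a minimal identity policy that lets a role invoke only a specific model family might look like the following (the region and the `claude-haiku-4-5-*` ARN pattern are placeholders; adjust them to your deployment):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-haiku-4-5-*"
    }
  ]
}
```

Attaching a policy like this to a Lambda's execution role is often the entire "integration" step: no API keys, no secrets rotation.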
02 —
Claude Models on Bedrock
Anthropic’s model lineup is tiered by capability, context window, and price. Bedrock exposes most of the current Claude family, though availability varies by AWS region. Here’s a current snapshot:
| Model | Best for | Context | Status |
|---|---|---|---|
| claude-opus-4-5 | Complex reasoning, long documents | 200K tokens | GA |
| claude-sonnet-4-5 | Balanced performance / cost | 200K tokens | GA |
| claude-haiku-4-5 | High-volume, low-latency tasks | 200K tokens | GA |
| Claude 3.5 Sonnet | Legacy workloads, code generation | 200K tokens | Legacy |
| Claude 3 Haiku | High-throughput classification | 200K tokens | Deprecating |
Regional availability
| Region | Code | Opus | Sonnet | Haiku |
|---|---|---|---|---|
| US East (N. Virginia) | us-east-1 | ✓ | ✓ | ✓ |
| US West (Oregon) | us-west-2 | ✓ | ✓ | ✓ |
| Europe (Frankfurt) | eu-central-1 | Limited | ✓ | ✓ |
| Europe (Ireland) | eu-west-1 | Limited | ✓ | ✓ |
| Asia Pacific (Tokyo) | ap-northeast-1 | — | ✓ | ✓ |
| Asia Pacific (Singapore) | ap-southeast-1 | — | ✓ | ✓ |
Model availability changes frequently as AWS expands Bedrock’s regional footprint. Always verify current availability in the AWS Bedrock console under Model access before designing a region strategy for production.
03 —
Accessing Claude via AWS SDK & Boto3
Bedrock exposes Claude through a standard API you call using the AWS SDK. For Python, that's Boto3. The client is `bedrock-runtime`, and the primary methods are `invoke_model` for synchronous calls and `invoke_model_with_response_stream` for streaming.
Prerequisites
- Enable the Claude model in your AWS account via the Bedrock console under Model access
- Ensure your IAM role has the `bedrock:InvokeModel` permission
- Install Boto3 with `pip install boto3`
Basic invocation
Python · Boto3 · basic inference

```python
import boto3
import json

client = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-east-1"
)

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": "Summarise the key benefits of Amazon Bedrock in three bullet points."
        }
    ]
})

response = client.invoke_model(
    modelId="anthropic.claude-sonnet-4-5-20251001",
    body=body
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```
Streaming responses
For user-facing applications where latency matters, use streaming to pipe tokens back as they arrive:
Python · Boto3 · streaming

```python
response = client.invoke_model_with_response_stream(
    modelId="anthropic.claude-sonnet-4-5-20251001",
    body=body
)

for event in response["body"]:
    chunk = json.loads(event["chunk"]["bytes"])
    if chunk.get("type") == "content_block_delta":
        print(chunk["delta"]["text"], end="", flush=True)
```
Claude on Bedrock fully supports the `system` parameter — add `"system": "You are a helpful assistant..."` at the top level of your JSON payload alongside `messages`. This is where you define persona, constraints, and tone.
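As a sketch, here is the request body from the basic example with a system prompt added (the persona text is illustrative):

```python
import json

# Same request body as the basic example, with a top-level "system"
# field alongside "messages". The persona text is just an example.
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "system": "You are a concise AWS solutions architect. Answer in plain English.",
    "messages": [
        {"role": "user", "content": "When should I pick Haiku over Sonnet?"}
    ]
})

# Pass `body` to client.invoke_model exactly as before.
parsed = json.loads(body)
print(sorted(parsed.keys()))
```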
04 —
Bedrock Agents
Basic inference only takes you so far. Real-world applications often need Claude to take actions: query a database, call an internal API, look up live pricing, or update a CRM record. Bedrock Agents is AWS’s framework for building these multi-step, tool-using workflows on top of Claude.
Bedrock Agents turns Claude from a text generator into an autonomous actor — one that can reason about which tools to call, interpret the results, and decide what to do next.
How Bedrock Agents work
Foundation model
The underlying model powering the agent’s reasoning. Claude Sonnet for cost/capability balance; Opus for complex multi-step tasks.
Action groups (tools)
Lambda functions defined by an OpenAPI schema. The agent reads the schema to understand what each action does, then decides when and how to call it. Multiple action groups can be attached to one agent.
Knowledge Base (optional)
A RAG data source the agent can query mid-task for factual grounding. Covered in depth in the next section.
Orchestration loop
Bedrock handles the ReAct-style loop automatically: Claude reasons → picks a tool → Bedrock calls the Lambda → result fed back → Claude reasons again → until complete or a turn limit is reached.
Python · invoking a Bedrock Agent

```python
agent_client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_client.invoke_agent(
    agentId="ABCD1234",
    agentAliasId="TSTALIASID",
    sessionId="session-001",
    inputText="What's the status of order #78432, and can I get a refund?"
)

for event in response["completion"]:
    if "chunk" in event:
        print(event["chunk"]["bytes"].decode(), end="")
```
The TSTALIASID alias always points to the working draft — useful for development. For production, create a named alias pinned to a specific agent version, giving you controlled rollbacks without touching application code.
05 —
Knowledge Bases — Native RAG with S3
Retrieval Augmented Generation (RAG) grounds Claude’s responses in your own documents rather than relying solely on its training data. Bedrock Knowledge Bases is AWS’s fully managed RAG implementation — connect it to an S3 bucket, and Bedrock handles chunking, embedding, indexing, and retrieval automatically.
Data source (S3)
Point a Knowledge Base at one or more S3 prefixes containing your documents — PDFs, Word files, HTML, plain text, CSV. Bedrock crawls and processes them on ingestion.
Chunking & embedding
Bedrock splits documents into chunks (configurable size and overlap), then generates vector embeddings using Amazon Titan Embeddings or Cohere Embed.
Vector store
Embeddings are stored in Amazon OpenSearch Serverless, Aurora PostgreSQL with pgvector, Redis Enterprise Cloud, or Pinecone — your choice.
Retrieval at inference time
When a query arrives, Bedrock embeds it, finds the most semantically similar chunks, and injects them into Claude’s context window as grounding material.
Python · Knowledge Base retrieve-and-generate

```python
kb_client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = kb_client.retrieve_and_generate(
    input={"text": "What is our refund policy for digital products?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBID1234ABCD",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-5-20251001"
        }
    }
)

print(response["output"]["text"])
# Citations are in response["citations"] — maps text back to source chunks
```
The citations array in the response tells you exactly which S3 document and chunk each part of Claude’s answer came from — critical for compliance use cases and debugging hallucinations.
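A sketch of walking that array, using a hand-built response in the shape `retrieve_and_generate` returns. The field nesting follows the Bedrock API; the document URI and text values are invented:

```python
# Illustrative response in the shape returned by retrieve_and_generate;
# the nesting follows the Bedrock API, the values are made up.
response = {
    "output": {"text": "Digital products can be refunded within 14 days."},
    "citations": [
        {
            "retrievedReferences": [
                {
                    "content": {"text": "Refunds on digital products: 14 days."},
                    "location": {
                        "type": "S3",
                        "s3Location": {"uri": "s3://policy-docs/refunds.pdf"}
                    }
                }
            ]
        }
    ],
}

# Collect the S3 source of every retrieved chunk that backed the answer.
sources = [
    ref["location"]["s3Location"]["uri"]
    for citation in response["citations"]
    for ref in citation.get("retrievedReferences", [])
]
print(sources)  # → ['s3://policy-docs/refunds.pdf']
```

Logging these URIs alongside each answer gives you a per-response provenance record with almost no extra code.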
06 —
Security & Compliance
For many organisations — especially in healthcare, finance, and government — using a model like Claude isn’t just a performance decision. It’s a compliance decision. Bedrock is designed for exactly this context.
VPC Endpoints (PrivateLink)
Create an interface VPC endpoint for Bedrock so all inference traffic routes through your private VPC — no public internet exposure. Endpoint policies restrict which models and operations the endpoint serves.
Data not used for training
AWS does not use your prompts or completions to train foundation models by default. Your data stays in your AWS account and is not shared with Anthropic or other model providers.
HIPAA eligibility
Bedrock is included in the AWS HIPAA BAA, making it eligible for workloads involving Protected Health Information (PHI). Verify the current eligible services list before building.
Encryption at rest & in transit
All data is encrypted in transit using TLS 1.2+. At rest, use AWS-managed keys or your own KMS Customer Managed Keys (CMK) for Knowledge Bases and fine-tuning data.
IAM permission model
Fine-grained IAM policies control access at the model level. Restrict a Lambda to Haiku only, prevent roles from accessing Opus, or require MFA for model access changes.
AWS CloudTrail
Every model invocation, Knowledge Base query, and Agent action is logged in CloudTrail — a full audit trail of who called what model, with what parameters, and when.
Bedrock supports model invocation logging — an opt-in feature that saves full request and response payloads to S3 or CloudWatch Logs. Invaluable for debugging, auditing, and post-hoc analysis. Enable it per region in the Bedrock console.
Amazon Bedrock is in scope for SOC 1/2/3, ISO 27001/27017/27018, PCI DSS, FedRAMP Moderate, and GDPR data processing agreements. Always verify the current in-scope services on the AWS Compliance Programs page before making architecture decisions.
07 —
Bedrock Guardrails
Even with a safety-focused model like Claude, enterprise deployments often need additional, configurable safeguards. Bedrock Guardrails provides a layer of content policies you configure independently of the model — apply the same guardrail across multiple models and update it without redeploying your application.
Harmful content
Block or flag inputs and outputs containing hate speech, violence, sexual content, or dangerous activity instructions. Configurable sensitivity levels per category.
Sensitive data
Automatically detect and redact PII — names, emails, SSNs, credit card numbers — from both inputs and outputs.
Denied topics
Define custom topic policies in plain English — “do not discuss competitor products” — and Bedrock enforces them automatically.
Hallucination detection
Contextual grounding checks compare Claude’s output against source material and flag or block responses that aren’t well-supported.
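These four policy types map onto the configuration you pass when creating a guardrail. Below is a sketch of such a configuration for Boto3's `create_guardrail`; the parameter names follow the CreateGuardrail API but should be verified against current documentation, and the topic definition and messaging text are illustrative:

```python
# Policy configuration for bedrock.create_guardrail(**guardrail_config).
# Parameter names follow the CreateGuardrail API (verify against current
# Boto3 docs); the values are illustrative.
guardrail_config = {
    "name": "support-bot-guardrail",
    "blockedInputMessaging": "Sorry, I can't help with that request.",
    "blockedOutputsMessaging": "Sorry, I can't share that.",
    # Denied topics, defined in plain English
    "topicPolicyConfig": {
        "topicsConfig": [
            {
                "name": "competitor-products",
                "definition": "Discussion or comparison of competitor products.",
                "type": "DENY",
            }
        ]
    },
    # Harmful-content filters with per-category strength
    "contentPolicyConfig": {
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        ]
    },
    # PII redaction
    "sensitiveInformationPolicyConfig": {
        "piiEntitiesConfig": [
            {"type": "EMAIL", "action": "ANONYMIZE"},
            {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},
        ]
    },
}

# Passing this to boto3.client("bedrock").create_guardrail(**guardrail_config)
# would create the guardrail; here we just inspect the shape.
print(sorted(k for k in guardrail_config if k.endswith("PolicyConfig")))
```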
Python · invoke model with guardrail

```python
response = client.invoke_model(
    modelId="anthropic.claude-sonnet-4-5-20251001",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": user_input}]
    }),
    guardrailIdentifier="gr-abc123",  # your guardrail ID
    guardrailVersion="DRAFT",         # or a version number
    trace="ENABLED"                   # see which policies triggered
)

result = json.loads(response["body"].read())
# result["amazon-bedrock-guardrailResult"] contains policy trace details
```
Because Guardrails are a separate Bedrock resource attached at invocation time, a single guardrail policy applies consistently across Claude Sonnet, Haiku, and any other Bedrock model — without changing application code when you swap models.
08 —
Batch Inference
Real-time inference is right for interactive applications. But for workloads like processing thousands of support tickets overnight, classifying a product catalogue, or running evaluations across a test set, synchronous calls are expensive and unnecessary. This is what Bedrock Batch Inference is built for.
Batch inference is typically up to 50% cheaper per token than on-demand pricing, and it removes the need to manage rate limits, retries, and concurrency logic in your own code.
Prepare input in S3
Create a JSONL file where each line is a complete Bedrock API request body. Upload it to an S3 bucket your Bedrock role can read from.
Create a batch job
Submit a CreateModelInvocationJob API call specifying the model, the S3 input path, and an S3 output path. The job queues immediately.
Bedrock processes asynchronously
AWS processes the batch in the background. Poll job status or configure SNS / EventBridge notifications for completion.
Retrieve outputs from S3
Completed responses appear as a JSONL file in your output bucket. Each line corresponds to an input record, with an added modelOutput field.
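The input file from step 1 can be sketched as follows. Each line pairs a `recordId` with a complete Anthropic request body under `modelInput`, the same shape you would pass to `invoke_model`; the ticket texts are placeholders:

```python
import json

# Build a JSONL batch input: one record per line, each with a recordId
# and a complete request body under "modelInput". Tickets are placeholders.
tickets = {
    "ticket-001": "My download link expired, can you resend it?",
    "ticket-002": "I was charged twice for the same order.",
}

lines = []
for record_id, text in tickets.items():
    lines.append(json.dumps({
        "recordId": record_id,
        "modelInput": {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 256,
            "messages": [
                {"role": "user", "content": f"Classify this support ticket: {text}"}
            ],
        },
    }))

jsonl = "\n".join(lines)
# Upload `jsonl` as requests.jsonl to the S3 input bucket. Each completed
# record comes back in the output file with an added "modelOutput" field.
print(len(lines))  # → 2
```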
Python · create batch inference job

```python
bedrock = boto3.client("bedrock", region_name="us-east-1")

job = bedrock.create_model_invocation_job(
    jobName="product-classification-march",
    modelId="anthropic.claude-haiku-4-5-20251001",
    roleArn="arn:aws:iam::123456789:role/BedrockBatchRole",
    inputDataConfig={
        "s3InputDataConfig": {
            "s3Uri": "s3://my-bucket/batch-input/requests.jsonl"
        }
    },
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": "s3://my-bucket/batch-output/"
        }
    }
)

print(job["jobArn"])
```
For most batch workloads, Claude Haiku is the right default. It’s the fastest and cheapest model in the family, and at batch pricing the economics are compelling for high-volume classification, extraction, and summarisation tasks.
— Wrapping up
Putting it all together
Amazon Bedrock isn’t just a thin wrapper around Anthropic’s API — it’s a full platform for building production AI systems with the governance, security, and ecosystem integrations that enterprise AWS users expect. Claude sits at the centre of that platform as the highest-capability model family available on it.
If you’re just getting started, the path of least resistance is a simple Boto3 integration for a specific internal use case — a document summariser, a support ticket classifier, a code review assistant. From there, you can graduate to Knowledge Bases when you need factual grounding, Agents when you need tool use, Guardrails when you need policy enforcement, and Batch when the volume justifies it.
The components are composable
- An Agent backed by Claude Sonnet, augmented by a Knowledge Base, protected by a Guardrail
- Invocations logged via CloudTrail, all running inside a VPC with a PrivateLink endpoint
- Batch jobs handling overnight classification at half the cost of on-demand calls
- Each piece independently useful — but designed from the ground up to work together
Continue exploring
More on ToolTechSavvy
Practical deep dives on AI tools, cloud platforms, and developer workflows — written for people who build things.
Visit ToolTechSavvy tooltechsavvy.com