If you’ve been following the AI tooling space, you already know that Anthropic’s Claude is one of the most capable large language models available today. What you might not know is exactly how to deploy and use Claude inside AWS without standing up your own infrastructure, managing API keys across services, or negotiating enterprise deals with Anthropic directly.
That’s precisely what Amazon Bedrock solves. It bridges the gap between Claude’s raw capabilities and the AWS services your team likely already depends on — S3, Lambda, IAM, CloudWatch, VPCs — letting you embed AI into production systems with the governance, compliance, and security controls you already understand.
This guide walks through every major capability: the models available, how to make your first API call, building agentic workflows, wiring up a RAG pipeline with S3, locking things down with enterprise security, filtering outputs with Guardrails, and running cost-efficient bulk inference jobs.
This is written for developers, ML engineers, and architects who want to integrate Claude into AWS-based applications. Some familiarity with AWS concepts (IAM, S3, VPCs) is assumed, but you don’t need any prior AI/ML experience beyond basic prompt engineering.
What is Amazon Bedrock?
The managed foundation model platform

Amazon Bedrock is AWS’s fully managed service for accessing, fine-tuning, and deploying foundation models from multiple AI providers — including Anthropic, AI21 Labs, Cohere, Meta, Mistral, and Stability AI — through a single, unified API surface. Think of it as a model marketplace layered on top of AWS infrastructure.
The key distinction from calling a provider’s API directly is that Bedrock is a native AWS service. This means your inference traffic never leaves the AWS network, model invocations are logged in CloudWatch, access is controlled through IAM policies, and you can attach the same security and compliance tooling you use for every other AWS service.
How Bedrock fits into the AWS ecosystem
Bedrock is not an isolated service — it’s deeply integrated with the AWS fabric. A few examples of how these integrations play out in practice:
Amazon S3
Store documents for Knowledge Bases, save batch inference outputs, and pull fine-tuning datasets directly from buckets.
AWS Lambda
Invoke Bedrock models from serverless functions — no servers to provision, no idle capacity to pay for.
AWS IAM
Granular permission policies control exactly which models each role, user, or service can invoke.
CloudWatch
Automatic logging of invocation metrics, latency, token consumption, and error rates without any extra setup.
Step Functions
Orchestrate multi-step AI workflows with built-in retry logic, branching, and state management.
VPC / PrivateLink
Keep all Bedrock traffic private inside your VPC — never touching the public internet.
Because Bedrock uses the same IAM auth model as every other AWS service, adding Claude to an existing application often requires zero new credentials or secret management — just an IAM role with the right policy attached.
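To make that concrete, here is a minimal sketch of the kind of identity policy you might attach to such a role. The ARN pattern and role scoping are illustrative, not canonical; scope the Resource to the exact model IDs you have enabled in your account.

```python
import json

# Minimal sketch: allow a role to invoke only Claude Haiku in us-east-1.
# The model ARN pattern below is illustrative — substitute the exact
# model ID you enabled in the Bedrock console.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-haiku-*"
        }
    ]
}

print(json.dumps(policy, indent=2))
```

Attach this to the Lambda execution role (or whichever principal calls Bedrock) and no separate API keys or secrets are needed.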
Claude Models on Bedrock
Which models are available and where

Anthropic’s model lineup is tiered by capability, context window, and price. Bedrock exposes most of the current Claude family, though model availability varies by AWS region. Here’s a current snapshot:
| Model | Best for | Context | Status |
|---|---|---|---|
| claude-opus-4-5 | Complex reasoning, long documents | 200K tokens | GA |
| claude-sonnet-4-5 | Balanced performance / cost | 200K tokens | GA |
| claude-haiku-4-5 | High-volume, low-latency tasks | 200K tokens | GA |
| Claude 3.5 Sonnet | Legacy workloads, code gen | 200K tokens | Legacy |
| Claude 3 Haiku | High-throughput classification | 200K tokens | Deprecating |
Regional availability
Bedrock is available across multiple AWS regions, and Claude’s availability maps closely to the major commercial regions. At time of writing, Claude is generally available in:
| Region | Code | Opus | Sonnet | Haiku |
|---|---|---|---|---|
| US East (N. Virginia) | us-east-1 | ✓ | ✓ | ✓ |
| US West (Oregon) | us-west-2 | ✓ | ✓ | ✓ |
| Europe (Frankfurt) | eu-central-1 | Limited | ✓ | ✓ |
| Europe (Ireland) | eu-west-1 | Limited | ✓ | ✓ |
| Asia Pacific (Tokyo) | ap-northeast-1 | — | ✓ | ✓ |
| Asia Pacific (Singapore) | ap-southeast-1 | — | ✓ | ✓ |
Model availability changes frequently as AWS expands Bedrock’s regional footprint. Always verify current availability in the AWS Bedrock console under Model access before designing a region strategy for production.
Accessing Claude via AWS SDK & Boto3
Your first API call in five minutes

Bedrock exposes Claude through a standard API that you call using the AWS SDK. For Python, that’s Boto3. The client you’ll use is bedrock-runtime, and the primary method is invoke_model for synchronous calls or invoke_model_with_response_stream for streaming.
Prerequisites
Before your first call: (1) enable the Claude model in your AWS account via the Bedrock console under Model access, (2) ensure your IAM role has the bedrock:InvokeModel permission, and (3) install Boto3 with pip install boto3.
Basic invocation
Python · Boto3 · basic inference

```python
import boto3
import json

client = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-east-1"
)

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": "Summarise the key benefits of Amazon Bedrock in three bullet points."
        }
    ]
})

response = client.invoke_model(
    modelId="anthropic.claude-sonnet-4-5-20251001",
    body=body
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```
Streaming responses
For user-facing applications where latency matters, use streaming to pipe tokens back as they arrive rather than waiting for the complete response:
Python · Boto3 · streaming

```python
response = client.invoke_model_with_response_stream(
    modelId="anthropic.claude-sonnet-4-5-20251001",
    body=body
)

for event in response["body"]:
    chunk = json.loads(event["chunk"]["bytes"])
    if chunk.get("type") == "content_block_delta":
        print(chunk["delta"]["text"], end="", flush=True)
```
Claude on Bedrock fully supports the system parameter in the request body — just add "system": "You are a helpful assistant..." at the top level of your JSON payload alongside messages. This is where you define the persona, constraints, and tone for your application.
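As a sketch, the system prompt slots into the same payload used earlier, at the top level beside messages (the prompt text here is just an example):

```python
import json

# The system prompt sits alongside "messages", not inside it.
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "system": "You are a concise technical assistant. Answer in British English.",
    "messages": [
        {"role": "user", "content": "What does Amazon Bedrock do?"}
    ]
})

payload = json.loads(body)
print(sorted(payload.keys()))
# → ['anthropic_version', 'max_tokens', 'messages', 'system']
```

Pass this body to invoke_model exactly as before; nothing else about the call changes.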
Bedrock Agents
Building agentic workflows with Claude

Basic inference — send prompt, receive response — only takes you so far. Real-world applications often need Claude to take actions: query a database, call an internal API, look up live pricing, or update a CRM record. Bedrock Agents is AWS’s framework for building these multi-step, tool-using workflows on top of Claude.
How Bedrock Agents work
An Agent consists of three core components that you configure once and then invoke repeatedly:
Foundation model
The underlying model powering the agent’s reasoning. You choose a Claude model — typically Claude Sonnet for the cost/capability balance, or Opus for complex multi-step tasks.
Action groups (tools)
Lambda functions defined by an OpenAPI schema. The agent reads the schema to understand what each action does, then decides when and how to call it. You can attach multiple action groups to one agent.
Knowledge Base (optional)
A RAG data source the agent can query mid-task for factual grounding. Covered in depth in the next section.
Orchestration loop
Bedrock handles the ReAct-style loop automatically: Claude reasons → picks a tool → Bedrock calls the Lambda → result is fed back → Claude reasons again → until the task is complete or a turn limit is reached.
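Bedrock runs that loop for you, but it helps to see its shape. Here is a toy Python sketch where everything (the planner, the tool names, the results) is invented for illustration; in a real Agent, the reasoning step is a Claude call and the tools are your Lambda functions:

```python
# Toy sketch of the ReAct-style loop Bedrock Agents run on your behalf.
# In a real agent, reason() is a Claude call and the tools are Lambdas.
def reason(task, observations):
    # Pretend planner: call each tool once, then finish.
    for tool in ("lookup_order", "check_refund_eligibility"):
        if tool not in observations:
            return ("call", tool)
    return ("finish", f"Handled: {task} using {len(observations)} tool results")

tools = {
    "lookup_order": lambda: "order #78432 shipped",
    "check_refund_eligibility": lambda: "eligible within 30 days",
}

observations = {}
for turn in range(5):  # turn limit, like the agent's max iterations
    action, detail = reason("refund request", observations)
    if action == "finish":
        print(detail)
        break
    observations[detail] = tools[detail]()  # feed the tool result back in
```

The value of Bedrock Agents is that this loop, including retries, tool-result injection, and the turn limit, is managed infrastructure rather than code you maintain.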
A practical example: customer support agent
Imagine a customer support bot that needs to look up order status, check inventory, and initiate refunds. You’d define three action groups, each backed by a Lambda that calls your internal systems. The agent handles the natural language understanding, decides which actions to call and in what order, and produces a coherent final response — all without you writing orchestration logic.
Python · invoking a Bedrock Agent

```python
agent_client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_client.invoke_agent(
    agentId="ABCD1234",
    agentAliasId="TSTALIASID",
    sessionId="session-001",
    inputText="What's the status of order #78432, and can I get a refund?"
)

for event in response["completion"]:
    if "chunk" in event:
        print(event["chunk"]["bytes"].decode(), end="")
```
Agents use an alias system for deployment. The TSTALIASID alias always points to the working draft — useful for development. For production, create a named alias pinned to a specific agent version, giving you controlled rollbacks.
Knowledge Bases
Native RAG with S3 — no vector DB to manage

Retrieval Augmented Generation (RAG) is the technique of grounding Claude’s responses in your own documents rather than relying solely on its training data. Bedrock Knowledge Bases is AWS’s fully managed RAG implementation — you connect it to an S3 bucket, and Bedrock handles chunking, embedding, indexing, and retrieval automatically.
How a Knowledge Base is built
Data source (S3)
Point a Knowledge Base at one or more S3 prefixes containing your documents — PDFs, Word files, HTML, plain text, CSV. Bedrock crawls and processes them on ingestion.
Chunking & embedding
Bedrock splits documents into chunks (configurable size and overlap), then generates vector embeddings using your choice of embedding model — Amazon Titan Embeddings or Cohere Embed.
Vector store
Embeddings are stored in a vector database. Bedrock supports Amazon OpenSearch Serverless, Aurora PostgreSQL with pgvector, Redis Enterprise Cloud, and Pinecone.
Retrieval at inference time
When a user query arrives, Bedrock embeds it, finds the most semantically similar chunks, and injects them into Claude’s context window as grounding material.
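The chunking step above is configurable but conceptually simple. A rough sketch of fixed-size chunking with overlap (Bedrock's actual implementation differs; this just shows why overlap matters):

```python
def chunk_text(text, chunk_size=300, overlap=50):
    """Split text into fixed-size chunks; each chunk repeats the last
    `overlap` characters of the previous one, so a sentence cut at a
    boundary still appears whole in at least one chunk."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "Amazon Bedrock handles chunking, embedding, and retrieval. " * 20
chunks = chunk_text(doc)
print(len(chunks), "chunks, first one", len(chunks[0]), "chars")
```

Larger chunks preserve more context per retrieval hit; smaller chunks give more precise matches. Bedrock exposes both size and overlap as Knowledge Base settings, so you can tune them per corpus.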
Querying a Knowledge Base directly
Python · Knowledge Base retrieve-and-generate

```python
kb_client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = kb_client.retrieve_and_generate(
    input={"text": "What is our refund policy for digital products?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBID1234ABCD",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-5-20251001"
        }
    }
)

print(response["output"]["text"])
# Citations are in response["citations"] — each one maps text back to the source chunk
```
One of the most useful features of Bedrock Knowledge Bases is the automatic citation system. The citations array in the response tells you exactly which S3 document and which chunk each part of Claude’s answer came from — critical for compliance use cases and for debugging hallucinations.
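As a sketch of working with those citations, here is a sample response dict and an extraction loop. The field names follow the retrieve_and_generate response shape at time of writing, but the values, bucket, and document are invented; verify the exact structure against the current API reference:

```python
# Invented sample mirroring the citations structure of a
# retrieve_and_generate response. Field names follow the API shape;
# the values are illustrative only.
response = {
    "output": {"text": "Digital products are refundable within 14 days."},
    "citations": [
        {
            "retrievedReferences": [
                {
                    "content": {"text": "Refunds for digital goods: 14 days."},
                    "location": {"s3Location": {"uri": "s3://policies/refunds.pdf"}},
                }
            ]
        }
    ],
}

# Collect every source document that grounded the answer.
sources = sorted({
    ref["location"]["s3Location"]["uri"]
    for citation in response["citations"]
    for ref in citation["retrievedReferences"]
})
print(sources)
```

Logging these source URIs alongside each answer gives you an audit trail from any generated sentence back to the S3 object it came from.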
Security & Compliance
PrivateLink, VPC isolation & HIPAA eligibility

For many organisations — especially in healthcare, finance, and government — the ability to use a model like Claude isn’t just a performance decision. It’s a compliance decision. Bedrock is designed for exactly this context, with multiple layers of isolation and a set of compliance certifications that cover most enterprise requirements.
VPC Endpoints (PrivateLink)
Create an interface VPC endpoint for Bedrock so all inference traffic routes through your private VPC — no public internet exposure. Endpoint policies let you restrict which models and operations the endpoint serves.
Data not used for training
By default, AWS does not use your prompts or completions to train foundation models. Your data stays in your AWS account and is not shared with Anthropic or other model providers.
HIPAA eligibility
Bedrock is included in the AWS HIPAA BAA, making it eligible for workloads that involve Protected Health Information (PHI). Verify the current list of HIPAA-eligible services before building.
Encryption at rest & in transit
All data is encrypted in transit using TLS 1.2+. At rest, you can use AWS-managed keys or your own KMS Customer Managed Keys (CMK) for Knowledge Bases and fine-tuning data.
IAM permission model
Fine-grained IAM policies control access at the model level. You can restrict a Lambda to Haiku only, prevent specific roles from accessing Opus, or require MFA for model access changes.
AWS CloudTrail
Every model invocation, Knowledge Base query, and Agent action is logged in CloudTrail. This gives you a full audit trail — who called what model, with what parameters, and when.
Compliance certifications
Amazon Bedrock is in scope for a number of AWS compliance programmes including SOC 1/2/3, ISO 27001/27017/27018, PCI DSS, FedRAMP Moderate, and GDPR data processing agreements. The specific list changes — always verify the current in-scope services on the AWS Compliance Programs page before making architecture decisions.
Bedrock also supports model invocation logging — an opt-in feature that saves full request and response payloads to an S3 bucket or CloudWatch Logs. Invaluable for debugging, auditing, and post-hoc analysis. Enable it per region in the Bedrock console settings.
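Enabling invocation logging programmatically is a single configuration call. The sketch below builds the configuration only; the bucket, log group, and role names are invented, and the field names follow the PutModelInvocationLoggingConfiguration API at time of writing, so verify them against the current Boto3 documentation before relying on this:

```python
# Sketch: configuration for model invocation logging to S3 and
# CloudWatch Logs. Names are invented; field names per the
# PutModelInvocationLoggingConfiguration API — verify in current docs.
logging_config = {
    "textDataDeliveryEnabled": True,
    "s3Config": {
        "bucketName": "my-bedrock-logs",
        "keyPrefix": "invocations/",
    },
    "cloudWatchConfig": {
        "logGroupName": "/bedrock/invocations",
        "roleArn": "arn:aws:iam::123456789012:role/BedrockLoggingRole",
    },
}

# Applied with the account-level "bedrock" client, e.g.:
# bedrock = boto3.client("bedrock")
# bedrock.put_model_invocation_logging_configuration(loggingConfig=logging_config)
print(sorted(logging_config.keys()))
```

Note this captures full prompts and completions, so the destination bucket should carry the same access controls as the data flowing through your models.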
Bedrock Guardrails
Content filtering and output safety controls

Even with a safety-focused model like Claude, enterprise deployments often need additional, configurable safeguards. Bedrock Guardrails provides a layer of content policies you configure independently of the model — meaning you can apply the same guardrail across multiple models and update it without redeploying your application.
What Guardrails can do
Harmful content
Block or flag inputs and outputs that contain hate speech, violence, sexual content, or instructions for dangerous activities. Configurable sensitivity levels per category.
Sensitive data
Automatically detect and redact PII — names, emails, phone numbers, SSNs, credit card numbers — from both inputs (to prevent prompt injection) and outputs (to prevent leakage).
Denied topics
Define custom topic policies in plain English — “do not discuss competitor products”, “refuse requests about investment advice” — and Bedrock enforces them automatically.
Hallucination detection
Contextual grounding checks compare Claude’s output against the source material (from a Knowledge Base or inline context) and flag or block responses that aren’t well-supported.
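A denied-topic policy, for example, is just structured configuration passed when creating the guardrail. The sketch below builds that structure only; the topic and examples are invented, and the field names follow the create_guardrail API at time of writing, so check the current Boto3 reference before using them:

```python
# Illustrative denied-topic configuration for a Bedrock guardrail.
# Topic name, definition, and examples are invented; field names
# follow the create_guardrail API — verify before production use.
topic_policy = {
    "topicsConfig": [
        {
            "name": "investment-advice",
            "definition": "Requests for personalised investment or financial advice.",
            "examples": ["Which stocks should I buy this quarter?"],
            "type": "DENY",
        }
    ]
}

# Passed at creation time, e.g.:
# bedrock.create_guardrail(name="support-bot-guardrail",
#                          topicPolicyConfig=topic_policy, ...)
print(topic_policy["topicsConfig"][0]["name"])
```

Because the policy lives on the guardrail resource rather than in your prompt, updating the denied-topics list takes effect without touching application code.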
Applying a Guardrail to an invocation
Python · invoke model with guardrail

```python
response = client.invoke_model(
    modelId="anthropic.claude-sonnet-4-5-20251001",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": user_input}]
    }),
    guardrailIdentifier="gr-abc123",  # your guardrail ID
    guardrailVersion="DRAFT",         # or a version number
    trace="ENABLED"                   # see which policies triggered
)

result = json.loads(response["body"].read())
# result["amazon-bedrock-guardrailResult"] contains policy trace details
```
Because Guardrails are a separate Bedrock resource attached at invocation time, you can build a single guardrail policy and apply it consistently across Claude Sonnet, Haiku, and any other Bedrock model you use — without changing application code when you swap models.
Batch Inference
Cost-efficient processing for high-volume workloads

Real-time inference — calling Claude one prompt at a time — is the right approach for interactive applications. But for workloads like processing thousands of support tickets overnight, classifying a product catalogue, or running evaluations across a test set, synchronous calls are expensive and unnecessary. This is what Bedrock Batch Inference is built for.
Batch inference is typically up to 50% cheaper per token than on-demand pricing, and it removes the need to manage rate limits, retries, and concurrency logic in your own code — Bedrock handles all of that.
How batch jobs work
Prepare input in S3
Create a JSONL file where each line is a complete Bedrock API request body. Upload it to an S3 bucket your Bedrock role can read from.
Create a batch job
Submit a CreateModelInvocationJob API call specifying the model, the S3 input path, and an S3 output path. The job queues immediately.
Bedrock processes asynchronously
AWS processes the batch in the background. You can poll job status or configure SNS / EventBridge notifications for completion.
Retrieve outputs from S3
Completed responses appear as a JSONL file in your output bucket. Each line corresponds to an input record, with an added modelOutput field.
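Preparing the input file from the steps above is straightforward. In this sketch, the tickets and record IDs are invented, and the recordId/modelInput envelope follows the batch input format at time of writing; verify the field names against the current Bedrock documentation:

```python
import json

# Build a batch input file: one JSON object per line, each pairing a
# record ID with a complete Bedrock request body. The recordId/modelInput
# envelope follows the batch format at time of writing — verify first.
tickets = ["Refund request for order 78432", "App crashes on login"]

lines = []
for i, ticket in enumerate(tickets):
    lines.append(json.dumps({
        "recordId": f"ticket-{i:05d}",
        "modelInput": {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 256,
            "messages": [{"role": "user",
                          "content": f"Classify this ticket: {ticket}"}],
        },
    }))

jsonl = "\n".join(lines)
print(len(jsonl.splitlines()), "records prepared")
```

Upload the resulting file to the S3 input path, and match each output line back to its input via the record ID.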
Submitting a batch job
Python · create batch inference job

```python
bedrock = boto3.client("bedrock", region_name="us-east-1")

job = bedrock.create_model_invocation_job(
    jobName="product-classification-march",
    modelId="anthropic.claude-haiku-4-5-20251001",
    roleArn="arn:aws:iam::123456789012:role/BedrockBatchRole",
    inputDataConfig={
        "s3InputDataConfig": {
            "s3Uri": "s3://my-bucket/batch-input/requests.jsonl"
        }
    },
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": "s3://my-bucket/batch-output/"
        }
    }
)

print(job["jobArn"])
```
For most batch workloads, Claude Haiku is the right default. It’s the fastest and cheapest model in the family, and at batch pricing the economics are compelling for high-volume classification, extraction, and summarisation tasks.
Putting it all together
What to build next

Amazon Bedrock isn’t just a thin wrapper around Anthropic’s API — it’s a full platform for building production AI systems with the governance, security, and ecosystem integrations that enterprise AWS users expect. Claude sits at the centre of that platform as the highest-capability model family available on it.
If you’re just getting started, the path of least resistance is a simple Boto3 integration for a specific internal use case — a document summariser, a support ticket classifier, a code review assistant. From there, you can graduate to Knowledge Bases when you need factual grounding, Agents when you need tool use, Guardrails when you need policy enforcement, and Batch when the volume justifies it.
The components are composable. A production agentic workflow might use all of them simultaneously: an Agent backed by Claude Sonnet, augmented by a Knowledge Base, protected by a Guardrail, invocations logged via CloudTrail, all running inside a VPC with a PrivateLink endpoint. Each piece is independently useful, but they’re designed to work together.
More deep dives on ToolTechSavvy
Practical guides on AI tools, cloud platforms, and developer workflows — written for people who build things.
Visit ToolTechSavvy → tooltechsavvy.com


