Claude on Amazon Bedrock: Everything You Need to Know

If you’ve been following the AI tooling space, you already know that Anthropic’s Claude is one of the most capable large language models available today. What you might not know is exactly how to deploy and use Claude inside AWS without standing up your own infrastructure, managing API keys across services, or negotiating enterprise deals with Anthropic directly.

That’s precisely what Amazon Bedrock solves. It bridges the gap between Claude’s raw capabilities and the AWS services your team likely already depends on — S3, Lambda, IAM, CloudWatch, VPCs — letting you embed AI into production systems with the governance, compliance, and security controls you already understand.

This guide walks through every major capability: the models available, how to make your first API call, building agentic workflows, wiring up a RAG pipeline with S3, locking things down with enterprise security, filtering outputs with Guardrails, and running cost-efficient bulk inference jobs.

Who this is for

This is written for developers, ML engineers, and architects who want to integrate Claude into AWS-based applications. Some familiarity with AWS concepts (IAM, S3, VPCs) is assumed, but you don’t need any prior AI/ML experience beyond basic prompt engineering.

01

What is Amazon Bedrock?

The managed foundation model platform

Amazon Bedrock is AWS’s fully managed service for accessing, fine-tuning, and deploying foundation models from multiple AI providers — including Anthropic, AI21 Labs, Cohere, Meta, Mistral, and Stability AI — through a single, unified API surface. Think of it as a model marketplace layered on top of AWS infrastructure.

The key distinction from calling a provider’s API directly is that Bedrock is a native AWS service. This means your inference traffic never leaves the AWS network, model invocations are logged in CloudWatch, access is controlled through IAM policies, and you can attach the same security and compliance tooling you use for every other AWS service.

How Bedrock fits into the AWS ecosystem

Bedrock is not an isolated service — it’s deeply integrated with the AWS fabric. A few examples of how these integrations play out in practice:

Storage

Amazon S3

Store documents for Knowledge Bases, save batch inference outputs, and pull fine-tuning datasets directly from buckets.

Compute

AWS Lambda

Invoke Bedrock models from serverless functions — no servers to provision, no idle capacity to pay for.

Identity

AWS IAM

Granular permission policies control exactly which models each role, user, or service can invoke.

Observability

CloudWatch

Automatic logging of invocation metrics, latency, token consumption, and error rates without any extra setup.

Workflows

Step Functions

Orchestrate multi-step AI workflows with built-in retry logic, branching, and state management.

Networking

VPC / PrivateLink

Keep all Bedrock traffic private inside your VPC — never touching the public internet.

Key advantage

Because Bedrock uses the same IAM auth model as every other AWS service, adding Claude to an existing application often requires zero new credentials or secret management — just an IAM role with the right policy attached.
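As a minimal sketch of what that policy looks like, the snippet below builds a least-privilege IAM policy document that allows a role to invoke only one Claude model. The model ARN pattern and region are illustrative assumptions; substitute the models and regions you actually use.

```python
import json

# Sketch of a least-privilege IAM policy: the role may invoke one Claude
# model (sync or streaming) and nothing else. ARN and region are
# illustrative -- substitute your own.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-haiku-4-5-*"
        }
    ]
}

print(json.dumps(policy, indent=2))
```

Attach this to the role your Lambda or ECS task already assumes and no new secrets are needed anywhere.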

02

Claude Models on Bedrock

Which models are available and where

Anthropic’s model lineup is tiered by capability, context window, and price. Bedrock exposes most of the current Claude family, though model availability varies by AWS region. Here’s a current snapshot:

| Model | Best for | Context | Status |
| --- | --- | --- | --- |
| claude-opus-4-5 | Complex reasoning, long documents | 200K tokens | GA |
| claude-sonnet-4-5 | Balanced performance / cost | 200K tokens | GA |
| claude-haiku-4-5 | High-volume, low-latency tasks | 200K tokens | GA |
| Claude 3.5 Sonnet | Legacy workloads, code gen | 200K tokens | Legacy |
| Claude 3 Haiku | High-throughput classification | 200K tokens | Deprecating |

Regional availability

Bedrock is available across multiple AWS regions, and Claude’s availability maps closely to the major commercial regions. At time of writing, Claude is generally available in:

| Region | Code | Notes |
| --- | --- | --- |
| US East (N. Virginia) | us-east-1 | |
| US West (Oregon) | us-west-2 | |
| Europe (Frankfurt) | eu-central-1 | Limited model availability |
| Europe (Ireland) | eu-west-1 | Limited model availability |
| Asia Pacific (Tokyo) | ap-northeast-1 | |
| Asia Pacific (Singapore) | ap-southeast-1 | |
Tip

Model availability changes frequently as AWS expands Bedrock’s regional footprint. Always verify current availability in the AWS Bedrock console under Model access before designing a region strategy for production.

03

Accessing Claude via AWS SDK & Boto3

Your first API call in five minutes

Bedrock exposes Claude through a standard API that you call using the AWS SDK. For Python, that’s Boto3. The client you’ll use is bedrock-runtime, and the primary method is invoke_model for synchronous calls or invoke_model_with_response_stream for streaming.

Prerequisites

Before your first call: (1) enable the Claude model in your AWS account via the Bedrock console under Model access, (2) ensure your IAM role has the bedrock:InvokeModel permission, and (3) install Boto3 with pip install boto3.

Basic invocation

Python · Boto3 · basic inference

import boto3
import json

client = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-east-1"
)

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": "Summarise the key benefits of Amazon Bedrock in three bullet points."
        }
    ]
})

response = client.invoke_model(
    modelId="anthropic.claude-sonnet-4-5-20251001",
    body=body
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])

Streaming responses

For user-facing applications where latency matters, use streaming to pipe tokens back as they arrive rather than waiting for the complete response:

Python · Boto3 · streaming

response = client.invoke_model_with_response_stream(
    modelId="anthropic.claude-sonnet-4-5-20251001",
    body=body
)

for event in response["body"]:
    chunk = json.loads(event["chunk"]["bytes"])
    if chunk.get("type") == "content_block_delta":
        print(chunk["delta"]["text"], end="", flush=True)

Using system prompts

Claude on Bedrock fully supports the system parameter in the request body — just add "system": "You are a helpful assistant..." at the top level of your JSON payload alongside messages. This is where you define the persona, constraints, and tone for your application.
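As a minimal sketch, here is the request body from the basic example with a system prompt added at the top level. The prompt text itself is illustrative:

```python
import json

# Same shape as the basic invocation body, with a top-level "system" field
# sitting alongside "messages". The persona text is a placeholder.
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "system": "You are a concise assistant for an internal AWS engineering team.",
    "messages": [
        {"role": "user", "content": "Explain what a VPC endpoint is in two sentences."}
    ]
})

payload = json.loads(body)
print(payload["system"])
```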

04

Bedrock Agents

Building agentic workflows with Claude

Basic inference — send prompt, receive response — only takes you so far. Real-world applications often need Claude to take actions: query a database, call an internal API, look up live pricing, or update a CRM record. Bedrock Agents is AWS’s framework for building these multi-step, tool-using workflows on top of Claude.

Bedrock Agents turns Claude from a text generator into an autonomous actor — one that can reason about which tools to call, interpret the results, and decide what to do next.

How Bedrock Agents work

An Agent consists of four core components that you configure once and then invoke repeatedly:

01

Foundation model

The underlying model powering the agent’s reasoning. You choose a Claude model — typically Claude Sonnet for the cost/capability balance, or Opus for complex multi-step tasks.

02

Action groups (tools)

Lambda functions defined by an OpenAPI schema. The agent reads the schema to understand what each action does, then decides when and how to call it. You can attach multiple action groups to one agent.

03

Knowledge Base (optional)

A RAG data source the agent can query mid-task for factual grounding. Covered in depth in the next section.

04

Orchestration loop

Bedrock handles the ReAct-style loop automatically: Claude reasons → picks a tool → Bedrock calls the Lambda → result is fed back → Claude reasons again → until the task is complete or a turn limit is reached.
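The Lambda side of an action group can be sketched as below. The event and response field names follow Bedrock Agents' Lambda contract for OpenAPI-defined action groups, but verify the exact shape against the current AWS documentation; `get_order_status` is a hypothetical stand-in for a call into your own systems.

```python
import json

def get_order_status(order_id):
    # Placeholder for a real lookup against your order system.
    return {"orderId": order_id, "status": "shipped"}

def lambda_handler(event, context):
    # Bedrock passes the invoked apiPath, httpMethod, and a list of
    # {"name": ..., "value": ...} parameters extracted by the agent.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    result = get_order_status(params.get("orderId", "unknown"))

    # Response envelope the agent orchestrator expects back from the Lambda.
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "apiPath": event["apiPath"],
            "httpMethod": event["httpMethod"],
            "httpStatusCode": 200,
            "responseBody": {
                "application/json": {"body": json.dumps(result)}
            },
        },
    }
```

The agent reads the OpenAPI schema to decide *when* to call this; the Lambda only has to map parameters to your backend and wrap the result.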

A practical example: customer support agent

Imagine a customer support bot that needs to look up order status, check inventory, and initiate refunds. You’d define three action groups, each backed by a Lambda that calls your internal systems. The agent handles the natural language understanding, decides which actions to call and in what order, and produces a coherent final response — all without you writing orchestration logic.

Python · invoking a Bedrock Agent

agent_client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_client.invoke_agent(
    agentId="ABCD1234",
    agentAliasId="TSTALIASID",
    sessionId="session-001",
    inputText="What's the status of order #78432, and can I get a refund?"
)

for event in response["completion"]:
    if "chunk" in event:
        print(event["chunk"]["bytes"].decode(), end="")

Agent aliases

Agents use an alias system for deployment. The TSTALIASID alias always points to the working draft — useful for development. For production, create a named alias pinned to a specific agent version, giving you controlled rollbacks.

05

Knowledge Bases

Native RAG with S3 — no vector DB to manage

Retrieval Augmented Generation (RAG) is the technique of grounding Claude’s responses in your own documents rather than relying solely on its training data. Bedrock Knowledge Bases is AWS’s fully managed RAG implementation — you connect it to an S3 bucket, and Bedrock handles chunking, embedding, indexing, and retrieval automatically.

How a Knowledge Base is built

01

Data source (S3)

Point a Knowledge Base at one or more S3 prefixes containing your documents — PDFs, Word files, HTML, plain text, CSV. Bedrock crawls and processes them on ingestion.

02

Chunking & embedding

Bedrock splits documents into chunks (configurable size and overlap), then generates vector embeddings using your choice of embedding model — Amazon Titan Embeddings or Cohere Embed.

03

Vector store

Embeddings are stored in a vector database. Bedrock supports Amazon OpenSearch Serverless, Aurora PostgreSQL with pgvector, Redis Enterprise Cloud, and Pinecone.

04

Retrieval at inference time

When a user query arrives, Bedrock embeds it, finds the most semantically similar chunks, and injects them into Claude’s context window as grounding material.
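The chunking settings from step 02 are supplied when you create the data source. A sketch of that configuration block, assuming the fixed-size strategy (field names follow the bedrock-agent CreateDataSource API; verify against the current docs, and treat the numeric values as illustrative):

```python
# vectorIngestionConfiguration block passed when creating a Knowledge Base
# data source with fixed-size chunking. Values are illustrative defaults.
vector_ingestion_configuration = {
    "chunkingConfiguration": {
        "chunkingStrategy": "FIXED_SIZE",
        "fixedSizeChunkingConfiguration": {
            "maxTokens": 300,         # target size of each chunk
            "overlapPercentage": 20   # overlap between adjacent chunks
        }
    }
}
```

Larger chunks preserve more context per retrieval; smaller chunks with overlap tend to give more precise matches for narrow questions.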

Querying a Knowledge Base directly

Python · Knowledge Base retrieve-and-generate

kb_client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = kb_client.retrieve_and_generate(
    input={"text": "What is our refund policy for digital products?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBID1234ABCD",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-5-20251001"
        }
    }
)

print(response["output"]["text"])
# Citations are in response["citations"] — each one maps text back to the source chunk

Built-in citations

One of the most useful features of Bedrock Knowledge Bases is the automatic citation system. The citations array in the response tells you exactly which S3 document and which chunk each part of Claude’s answer came from — critical for compliance use cases and for debugging hallucinations.
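As a rough sketch of walking that array, the snippet below extracts the S3 source URI for each retrieved reference. The nested field names reflect the general shape of a retrieve-and-generate response, but treat them as an assumption and inspect a real response before relying on them; the response fragment itself is fabricated for illustration.

```python
# Hypothetical response fragment in the general shape retrieve_and_generate
# returns; field names are an assumption for illustration.
response = {
    "output": {"text": "Refunds for digital products are available within 14 days."},
    "citations": [
        {
            "retrievedReferences": [
                {
                    "content": {"text": "Digital products may be refunded within 14 days."},
                    "location": {"s3Location": {"uri": "s3://my-bucket/policies/refunds.pdf"}},
                }
            ]
        }
    ],
}

# Collect every S3 document the answer was grounded in.
sources = [
    ref["location"]["s3Location"]["uri"]
    for citation in response["citations"]
    for ref in citation["retrievedReferences"]
]
print(sources)
```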

06

Security & Compliance

PrivateLink, VPC isolation & HIPAA eligibility

For many organisations — especially in healthcare, finance, and government — the ability to use a model like Claude isn’t just a performance decision. It’s a compliance decision. Bedrock is designed for exactly this context, with multiple layers of isolation and a set of compliance certifications that cover most enterprise requirements.

VPC Endpoints (PrivateLink)

Create an interface VPC endpoint for Bedrock so all inference traffic routes through your private VPC — no public internet exposure. Endpoint policies let you restrict which models and operations the endpoint serves.

Data not used for training

By default, AWS does not use your prompts or completions to train foundation models. Your data stays in your AWS account and is not shared with Anthropic or other model providers.

HIPAA eligibility

Bedrock is included in the AWS HIPAA BAA, making it eligible for workloads that involve Protected Health Information (PHI). Verify the current list of HIPAA-eligible services before building.

Encryption at rest & in transit

All data is encrypted in transit using TLS 1.2+. At rest, you can use AWS-managed keys or your own KMS Customer Managed Keys (CMK) for Knowledge Bases and fine-tuning data.

IAM permission model

Fine-grained IAM policies control access at the model level. You can restrict a Lambda to Haiku only, prevent specific roles from accessing Opus, or require MFA for model access changes.

AWS CloudTrail

Every model invocation, Knowledge Base query, and Agent action is logged in CloudTrail. This gives you a full audit trail — who called what model, with what parameters, and when.

Compliance certifications

Amazon Bedrock is in scope for a number of AWS compliance programmes including SOC 1/2/3, ISO 27001/27017/27018, PCI DSS, FedRAMP Moderate, and GDPR data processing agreements. The specific list changes — always verify the current in-scope services on the AWS Compliance Programs page before making architecture decisions.

Model invocation logging

Bedrock also supports model invocation logging — an opt-in feature that saves full request and response payloads to an S3 bucket or CloudWatch Logs. Invaluable for debugging, auditing, and post-hoc analysis. Enable it per region in the Bedrock console settings.

07

Bedrock Guardrails

Content filtering and output safety controls

Even with a safety-focused model like Claude, enterprise deployments often need additional, configurable safeguards. Bedrock Guardrails provides a layer of content policies you configure independently of the model — meaning you can apply the same guardrail across multiple models and update it without redeploying your application.

What Guardrails can do

Content filters

Harmful content

Block or flag inputs and outputs that contain hate speech, violence, sexual content, or instructions for dangerous activities. Configurable sensitivity levels per category.

PII detection

Sensitive data

Automatically detect and redact PII — names, emails, phone numbers, SSNs, credit card numbers — from both inputs (to prevent prompt injection) and outputs (to prevent leakage).

Topic control

Denied topics

Define custom topic policies in plain English — “do not discuss competitor products”, “refuse requests about investment advice” — and Bedrock enforces them automatically.

Grounding

Hallucination detection

Contextual grounding checks compare Claude’s output against the source material (from a Knowledge Base or inline context) and flag or block responses that aren’t well-supported.

Applying a Guardrail to an invocation

Python · invoke model with guardrail

response = client.invoke_model(
    modelId="anthropic.claude-sonnet-4-5-20251001",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": user_input}]
    }),
    guardrailIdentifier="gr-abc123",    # your guardrail ID
    guardrailVersion="DRAFT",             # or a version number
    trace="ENABLED"                      # see which policies triggered
)

result = json.loads(response["body"].read())
# result["amazon-bedrock-guardrailResult"] contains policy trace details

Independent of the model

Because Guardrails are a separate Bedrock resource attached at invocation time, you can build a single guardrail policy and apply it consistently across Claude Sonnet, Haiku, and any other Bedrock model you use — without changing application code when you swap models.

08

Batch Inference

Cost-efficient processing for high-volume workloads

Real-time inference — calling Claude one prompt at a time — is the right approach for interactive applications. But for workloads like processing thousands of support tickets overnight, classifying a product catalogue, or running evaluations across a test set, synchronous calls are expensive and unnecessary. This is what Bedrock Batch Inference is built for.

Batch inference is typically up to 50% cheaper per token than on-demand pricing, and it removes the need to manage rate limits, retries, and concurrency logic in your own code — Bedrock handles all of that.

How batch jobs work

01

Prepare input in S3

Create a JSONL file where each line is a complete Bedrock API request body. Upload it to an S3 bucket your Bedrock role can read from.

02

Create a batch job

Submit a CreateModelInvocationJob API call specifying the model, the S3 input path, and an S3 output path. The job queues immediately.

03

Bedrock processes asynchronously

AWS processes the batch in the background. You can poll job status or configure SNS / EventBridge notifications for completion.

04

Retrieve outputs from S3

Completed responses appear as a JSONL file in your output bucket. Each line corresponds to an input record, with an added modelOutput field.
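The input file from step 01 can be generated with a few lines of Python. Each line pairs a `recordId` with a `modelInput` whose shape matches a normal invoke_model request body; this record layout follows Bedrock's documented batch-input format, but confirm it against the current docs. The product list and prompt are illustrative.

```python
import json

# Build a JSONL batch-input file: one record per line, each carrying an id
# (echoed back in the output) and a request body in the on-demand shape.
products = ["Wireless mouse", "USB-C hub", "Mechanical keyboard"]

with open("requests.jsonl", "w") as f:
    for i, name in enumerate(products):
        record = {
            "recordId": f"rec-{i:05d}",
            "modelInput": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 64,
                "messages": [
                    {"role": "user",
                     "content": f"Classify this product into a category: {name}"}
                ],
            },
        }
        f.write(json.dumps(record) + "\n")
```

Upload the resulting file to the S3 input path, then submit the job as shown next; the `recordId` lets you join each output line back to its input.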

Submitting a batch job

Python · create batch inference job

bedrock = boto3.client("bedrock", region_name="us-east-1")

job = bedrock.create_model_invocation_job(
    jobName="product-classification-march",
    modelId="anthropic.claude-haiku-4-5-20251001",
    roleArn="arn:aws:iam::123456789:role/BedrockBatchRole",
    inputDataConfig={
        "s3InputDataConfig": {
            "s3Uri": "s3://my-bucket/batch-input/requests.jsonl"
        }
    },
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": "s3://my-bucket/batch-output/"
        }
    }
)
print(job["jobArn"])

Model choice for batch

For most batch workloads, Claude Haiku is the right default. It’s the fastest and cheapest model in the family, and at batch pricing the economics are compelling for high-volume classification, extraction, and summarisation tasks.

Putting it all together

What to build next

Amazon Bedrock isn’t just a thin wrapper around Anthropic’s API — it’s a full platform for building production AI systems with the governance, security, and ecosystem integrations that enterprise AWS users expect. Claude sits at the centre of that platform as the highest-capability model family available on it.

If you’re just getting started, the path of least resistance is a simple Boto3 integration for a specific internal use case — a document summariser, a support ticket classifier, a code review assistant. From there, you can graduate to Knowledge Bases when you need factual grounding, Agents when you need tool use, Guardrails when you need policy enforcement, and Batch when the volume justifies it.

The components are composable. A production agentic workflow might use all of them simultaneously: an Agent backed by Claude Sonnet, augmented by a Knowledge Base, protected by a Guardrail, invocations logged via CloudTrail, all running inside a VPC with a PrivateLink endpoint. Each piece is independently useful, but they’re designed to work together.

