Claude on Amazon Bedrock: Everything You Need to Know


From your first API call to enterprise-grade agentic workflows — a practical, no-fluff guide to running Anthropic’s Claude models inside the AWS ecosystem.

If you’ve been following the AI tooling space, you already know that Anthropic’s Claude is one of the most capable large language models available today. What you might not know is exactly how to deploy and use Claude inside AWS without standing up your own infrastructure, managing API keys across services, or negotiating enterprise deals with Anthropic directly.

That’s precisely what Amazon Bedrock solves. It bridges the gap between Claude’s raw capabilities and the AWS services your team likely already depends on — S3, Lambda, IAM, CloudWatch, VPCs — letting you embed AI into production systems with the governance, compliance, and security controls you already understand.

This guide walks through every major capability: the models available, how to make your first API call, building agentic workflows, wiring up a RAG pipeline with S3, locking things down with enterprise security, filtering outputs with Guardrails, and running cost-efficient bulk inference jobs.

Who this is for

Developers, ML engineers, and architects who want to integrate Claude into AWS-based applications. Some familiarity with AWS concepts (IAM, S3, VPCs) is assumed — no prior AI/ML experience needed beyond basic prompt engineering.


01 —

What is Amazon Bedrock?

Amazon Bedrock is AWS’s fully managed service for accessing, fine-tuning, and deploying foundation models from multiple AI providers — including Anthropic, AI21 Labs, Cohere, Meta, Mistral, and Stability AI — through a single, unified API surface. Think of it as a model marketplace layered on top of AWS infrastructure.

The key distinction from calling a provider’s API directly is that Bedrock is a native AWS service. This means your inference traffic never leaves the AWS network, model invocations are logged in CloudWatch, access is controlled through IAM policies, and you can attach the same security and compliance tooling you use for every other AWS service.

How Bedrock fits into the AWS ecosystem

Bedrock is deeply integrated with the AWS fabric — it’s not an isolated service. A few examples of how that plays out in practice:

Storage

Amazon S3

Store documents for Knowledge Bases, save batch inference outputs, and pull fine-tuning datasets directly from buckets.

Compute

AWS Lambda

Invoke Bedrock models from serverless functions — no servers to provision, no idle capacity to pay for.

Identity

AWS IAM

Granular permission policies control exactly which models each role, user, or service can invoke.

Observability

CloudWatch

Automatic logging of invocation metrics, latency, token consumption, and error rates without extra setup.

Workflows

Step Functions

Orchestrate multi-step AI workflows with built-in retry logic, branching, and state management.

Networking

VPC / PrivateLink

Keep all Bedrock traffic private inside your VPC — never touching the public internet.

Key advantage

Because Bedrock uses the same IAM auth model as every other AWS service, adding Claude to an existing application often requires zero new credentials or secret management — just an IAM role with the right policy attached.
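As an illustration, a least-privilege policy might scope `bedrock:InvokeModel` to a single model. This is a sketch only: the account ID and model ARN below are placeholders, and you would attach the policy with your usual IAM or IaC tooling.

```python
import json

# Hypothetical least-privilege policy: the role it is attached to may invoke
# only Claude Haiku in us-east-1. Foundation-model ARNs have no account ID,
# hence the double colon.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-haiku-4-5-20251001",
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Swapping the `Resource` for a list of ARNs is how you allow one role access to Haiku and Sonnet while keeping Opus restricted.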


02 —

Claude Models on Bedrock

Anthropic’s model lineup is tiered by capability, context window, and price. Bedrock exposes most of the current Claude family, though availability varies by AWS region. Here’s a current snapshot:

Model                Best for                             Context       Status
claude-opus-4-5      Complex reasoning, long documents    200K tokens   GA
claude-sonnet-4-5    Balanced performance / cost          200K tokens   GA
claude-haiku-4-5     High-volume, low-latency tasks       200K tokens   GA
Claude 3.5 Sonnet    Legacy workloads, code generation    200K tokens   Legacy
Claude 3 Haiku       High-throughput classification       200K tokens   Deprecating

Regional availability

Region                      Code
US East (N. Virginia)       us-east-1
US West (Oregon)            us-west-2
Europe (Frankfurt)          eu-central-1 (limited model availability)
Europe (Ireland)            eu-west-1 (limited model availability)
Asia Pacific (Tokyo)        ap-northeast-1
Asia Pacific (Singapore)    ap-southeast-1

Check before you build

Model availability changes frequently as AWS expands Bedrock’s regional footprint. Always verify current availability in the AWS Bedrock console under Model access before designing a region strategy for production.


03 —

Accessing Claude via AWS SDK & Boto3

Bedrock exposes Claude through a standard API you call using the AWS SDK. For Python, that’s Boto3. The client is bedrock-runtime, and the primary methods are invoke_model for synchronous calls or invoke_model_with_response_stream for streaming.

Prerequisites

  • Enable the Claude model in your AWS account via the Bedrock console under Model access
  • Ensure your IAM role has the bedrock:InvokeModel permission
  • Install Boto3 with pip install boto3

Basic invocation

Python · Boto3 · basic inference

import boto3
import json

client = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-east-1"
)

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": "Summarise the key benefits of Amazon Bedrock in three bullet points."
        }
    ]
})

response = client.invoke_model(
    modelId="anthropic.claude-sonnet-4-5-20251001",
    body=body
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])

Streaming responses

For user-facing applications where latency matters, use streaming to pipe tokens back as they arrive:

Python · Boto3 · streaming

response = client.invoke_model_with_response_stream(
    modelId="anthropic.claude-sonnet-4-5-20251001",
    body=body
)

for event in response["body"]:
    chunk = json.loads(event["chunk"]["bytes"])
    if chunk.get("type") == "content_block_delta":
        print(chunk["delta"]["text"], end="", flush=True)

System prompts

Claude on Bedrock fully supports the system parameter — add "system": "You are a helpful assistant..." at the top level of your JSON payload alongside messages. This is where you define persona, constraints, and tone.
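For example, the basic invocation payload above with a system prompt added looks like this (the persona text is just an illustration):

```python
import json

# Same payload shape as the basic invocation, plus a top-level "system" field
# sitting alongside "messages".
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "system": "You are a concise technical assistant. Answer in plain English.",
    "messages": [
        {"role": "user", "content": "What does Amazon Bedrock do?"}
    ],
})

# Pass this body to client.invoke_model(...) exactly as before.
payload = json.loads(body)
print(sorted(payload.keys()))
```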


04 —

Bedrock Agents

Basic inference only takes you so far. Real-world applications often need Claude to take actions: query a database, call an internal API, look up live pricing, or update a CRM record. Bedrock Agents is AWS’s framework for building these multi-step, tool-using workflows on top of Claude.

Bedrock Agents turns Claude from a text generator into an autonomous actor — one that can reason about which tools to call, interpret the results, and decide what to do next.

How Bedrock Agents work

01

Foundation model

The underlying model powering the agent’s reasoning. Claude Sonnet for cost/capability balance; Opus for complex multi-step tasks.

02

Action groups (tools)

Lambda functions defined by an OpenAPI schema. The agent reads the schema to understand what each action does, then decides when and how to call it. Multiple action groups can be attached to one agent.

03

Knowledge Base (optional)

A RAG data source the agent can query mid-task for factual grounding. Covered in depth in the next section.

04

Orchestration loop

Bedrock handles the ReAct-style loop automatically: Claude reasons → picks a tool → Bedrock calls the Lambda → result fed back → Claude reasons again → until complete or a turn limit is reached.
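Bedrock runs this loop for you, but seeing its shape helps. Below is a deliberately simplified sketch with a stubbed "model" and one stubbed tool; none of it is Bedrock API code, and the order-status tool is invented for illustration.

```python
# Illustrative only: the reason -> act -> observe loop that Bedrock Agents
# automates. The "model" here is a stub that requests one tool, then finishes.

def stub_model(history):
    """Pretend model: ask for the order-lookup tool, then answer."""
    if not any(turn["role"] == "tool" for turn in history):
        return {"action": "get_order_status", "args": {"order_id": "78432"}}
    return {"action": "finish", "answer": "Order 78432 has shipped."}

def get_order_status(order_id):
    """Stubbed action group: in a real agent this would be a Lambda call."""
    return {"order_id": order_id, "status": "shipped"}

TOOLS = {"get_order_status": get_order_status}

def run_agent(user_input, max_turns=5):
    history = [{"role": "user", "content": user_input}]
    for _ in range(max_turns):                           # turn limit
        step = stub_model(history)                       # 1. model reasons
        if step["action"] == "finish":
            return step["answer"]
        result = TOOLS[step["action"]](**step["args"])   # 2. tool is called
        history.append({"role": "tool", "content": result})  # 3. result fed back
    raise RuntimeError("turn limit reached")

answer = run_agent("What's the status of order #78432?")
print(answer)
```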

Python · invoking a Bedrock Agent

agent_client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_client.invoke_agent(
    agentId="ABCD1234",
    agentAliasId="TSTALIASID",
    sessionId="session-001",
    inputText="What's the status of order #78432, and can I get a refund?"
)

for event in response["completion"]:
    if "chunk" in event:
        print(event["chunk"]["bytes"].decode(), end="")

Agent aliases

The TSTALIASID alias always points to the working draft — useful for development. For production, create a named alias pinned to a specific agent version, giving you controlled rollbacks without touching application code.


05 —

Knowledge Bases — Native RAG with S3

Retrieval Augmented Generation (RAG) grounds Claude’s responses in your own documents rather than relying solely on its training data. Bedrock Knowledge Bases is AWS’s fully managed RAG implementation — connect it to an S3 bucket, and Bedrock handles chunking, embedding, indexing, and retrieval automatically.

01

Data source (S3)

Point a Knowledge Base at one or more S3 prefixes containing your documents — PDFs, Word files, HTML, plain text, CSV. Bedrock crawls and processes them on ingestion.

02

Chunking & embedding

Bedrock splits documents into chunks (configurable size and overlap), then generates vector embeddings using Amazon Titan Embeddings or Cohere Embed.

03

Vector store

Embeddings are stored in Amazon OpenSearch Serverless, Aurora PostgreSQL with pgvector, Redis Enterprise Cloud, or Pinecone — your choice.

04

Retrieval at inference time

When a query arrives, Bedrock embeds it, finds the most semantically similar chunks, and injects them into Claude’s context window as grounding material.

Python · Knowledge Base retrieve-and-generate

kb_client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = kb_client.retrieve_and_generate(
    input={"text": "What is our refund policy for digital products?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBID1234ABCD",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-5-20251001"
        }
    }
)

print(response["output"]["text"])
# Citations are in response["citations"] — maps text back to source chunks

Built-in citations

The citations array in the response tells you exactly which S3 document and chunk each part of Claude’s answer came from — critical for compliance use cases and debugging hallucinations.
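A small helper makes those citations easier to consume. This sketch assumes the citation shape returned by retrieve_and_generate (a list of citations, each holding retrievedReferences with an S3 location); the exact field names are worth verifying against the current Boto3 reference, and the sample data below is invented.

```python
def cited_sources(citations):
    """Collect the S3 URIs backing each cited span of the answer."""
    uris = []
    for citation in citations:
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri:
                uris.append(uri)
    return uris

# Trimmed, hypothetical sample of the response["citations"] shape
sample = [
    {"retrievedReferences": [
        {"location": {"type": "S3",
                      "s3Location": {"uri": "s3://my-bucket/policies/refunds.pdf"}}}
    ]}
]
print(cited_sources(sample))
```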


06 —

Security & Compliance

For many organisations — especially in healthcare, finance, and government — using a model like Claude isn’t just a performance decision. It’s a compliance decision. Bedrock is designed for exactly this context.

VPC Endpoints (PrivateLink)

Create an interface VPC endpoint for Bedrock so all inference traffic routes through your private VPC — no public internet exposure. Endpoint policies restrict which models and operations the endpoint serves.

Data not used for training

AWS does not use your prompts or completions to train foundation models by default. Your data stays in your AWS account and is not shared with Anthropic or other model providers.

HIPAA eligibility

Bedrock is included in the AWS HIPAA BAA, making it eligible for workloads involving Protected Health Information (PHI). Verify the current eligible services list before building.

Encryption at rest & in transit

All data is encrypted in transit using TLS 1.2+. At rest, use AWS-managed keys or your own KMS Customer Managed Keys (CMK) for Knowledge Bases and fine-tuning data.

IAM permission model

Fine-grained IAM policies control access at the model level. Restrict a Lambda to Haiku only, prevent roles from accessing Opus, or require MFA for model access changes.

AWS CloudTrail

Every model invocation, Knowledge Base query, and Agent action is logged in CloudTrail — a full audit trail of who called what model, with what parameters, and when.

Model invocation logging

Bedrock supports model invocation logging — an opt-in feature that saves full request and response payloads to S3 or CloudWatch Logs. Invaluable for debugging, auditing, and post-hoc analysis. Enable it per region in the Bedrock console.
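The console is the simplest way to switch this on, but it can also be scripted. The sketch below only builds the config; the bucket, log group, and role names are placeholders, and the exact shape of the logging config is worth checking against the current Boto3 docs before relying on it.

```python
# Placeholder names: substitute your own bucket, log group, and IAM role.
logging_config = {
    "cloudWatchConfig": {
        "logGroupName": "/bedrock/invocations",
        "roleArn": "arn:aws:iam::123456789:role/BedrockLoggingRole",
    },
    "s3Config": {
        "bucketName": "my-bedrock-logs",
        "keyPrefix": "invocation-logs/",
    },
    "textDataDeliveryEnabled": True,
}

# Applied with the control-plane client ("bedrock", not "bedrock-runtime"):
#   boto3.client("bedrock").put_model_invocation_logging_configuration(
#       loggingConfig=logging_config
#   )
print(sorted(logging_config.keys()))
```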

Amazon Bedrock is in scope for SOC 1/2/3, ISO 27001/27017/27018, PCI DSS, FedRAMP Moderate, and GDPR data processing agreements. Always verify the current in-scope services on the AWS Compliance Programs page before making architecture decisions.


07 —

Bedrock Guardrails

Even with a safety-focused model like Claude, enterprise deployments often need additional, configurable safeguards. Bedrock Guardrails provides a layer of content policies you configure independently of the model — apply the same guardrail across multiple models and update it without redeploying your application.

Content filters

Harmful content

Block or flag inputs and outputs containing hate speech, violence, sexual content, or dangerous activity instructions. Configurable sensitivity levels per category.

PII detection

Sensitive data

Automatically detect and redact PII — names, emails, SSNs, credit card numbers — from both inputs and outputs.

Topic control

Denied topics

Define custom topic policies in plain English — “do not discuss competitor products” — and Bedrock enforces them automatically.

Grounding

Hallucination detection

Contextual grounding checks compare Claude’s output against source material and flag or block responses that aren’t well-supported.

Python · invoke model with guardrail

response = client.invoke_model(
    modelId="anthropic.claude-sonnet-4-5-20251001",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": user_input}]
    }),
    guardrailIdentifier="gr-abc123",    # your guardrail ID
    guardrailVersion="DRAFT",             # or a version number
    trace="ENABLED"                      # see which policies triggered
)

result = json.loads(response["body"].read())
# with trace="ENABLED", the response body also carries guardrail trace
# details showing which policies (if any) intervened

Model-independent

Because Guardrails are a separate Bedrock resource attached at invocation time, a single guardrail policy applies consistently across Claude Sonnet, Haiku, and any other Bedrock model — without changing application code when you swap models.


08 —

Batch Inference

Real-time inference is right for interactive applications. But for workloads like processing thousands of support tickets overnight, classifying a product catalogue, or running evaluations across a test set, synchronous calls are expensive and unnecessary. This is what Bedrock Batch Inference is built for.

Batch inference is typically up to 50% cheaper per token than on-demand pricing, and it removes the need to manage rate limits, retries, and concurrency logic in your own code.

01

Prepare input in S3

Create a JSONL file where each line is a complete Bedrock API request body. Upload it to an S3 bucket your Bedrock role can read from.

02

Create a batch job

Submit a CreateModelInvocationJob API call specifying the model, the S3 input path, and an S3 output path. The job queues immediately.

03

Bedrock processes asynchronously

AWS processes the batch in the background. Poll job status or configure SNS / EventBridge notifications for completion.

04

Retrieve outputs from S3

Completed responses appear as a JSONL file in your output bucket. Each line corresponds to an input record, with an added modelOutput field.
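Step 01 is easy to script. Each JSONL line pairs a recordId (so outputs can be matched back to inputs) with a modelInput holding an ordinary request body; the tickets and prompt below are illustrative, and the record format is worth confirming against the current batch inference docs.

```python
import json

# Illustrative records to classify; in practice these might come from a
# database export or an S3 listing.
tickets = [
    ("t-001", "My download link never arrived."),
    ("t-002", "How do I update my billing address?"),
]

lines = []
for record_id, text in tickets:
    lines.append(json.dumps({
        "recordId": record_id,
        "modelInput": {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 256,
            "messages": [{
                "role": "user",
                "content": f"Classify this support ticket: {text}",
            }],
        },
    }))

jsonl = "\n".join(lines)
# Upload the result to your input bucket, e.g. s3://my-bucket/batch-input/
print(jsonl.count("\n") + 1)  # number of request records
```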

Python · create batch inference job

bedrock = boto3.client("bedrock", region_name="us-east-1")

job = bedrock.create_model_invocation_job(
    jobName="product-classification-march",
    modelId="anthropic.claude-haiku-4-5-20251001",
    roleArn="arn:aws:iam::123456789:role/BedrockBatchRole",
    inputDataConfig={
        "s3InputDataConfig": {
            "s3Uri": "s3://my-bucket/batch-input/requests.jsonl"
        }
    },
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": "s3://my-bucket/batch-output/"
        }
    }
)
print(job["jobArn"])

Model choice for batch

For most batch workloads, Claude Haiku is the right default. It’s the fastest and cheapest model in the family, and at batch pricing the economics are compelling for high-volume classification, extraction, and summarisation tasks.


— Wrapping up

Putting it all together

Amazon Bedrock isn’t just a thin wrapper around Anthropic’s API — it’s a full platform for building production AI systems with the governance, security, and ecosystem integrations that enterprise AWS users expect. Claude sits at the centre of that platform as the highest-capability model family available on it.

If you’re just getting started, the path of least resistance is a simple Boto3 integration for a specific internal use case — a document summariser, a support ticket classifier, a code review assistant. From there, you can graduate to Knowledge Bases when you need factual grounding, Agents when you need tool use, Guardrails when you need policy enforcement, and Batch when the volume justifies it.

The components are composable

  • An Agent backed by Claude Sonnet, augmented by a Knowledge Base, protected by a Guardrail
  • Invocations logged via CloudTrail, all running inside a VPC with a PrivateLink endpoint
  • Batch jobs handling overnight classification at half the cost of on-demand calls
  • Each piece independently useful — but designed from the ground up to work together

Continue exploring

More on ToolTechSavvy

Practical deep dives on AI tools, cloud platforms, and developer workflows — written for people who build things.

Visit ToolTechSavvy: tooltechsavvy.com
