Bedrock

A fully managed platform to build GenAI apps via a single API across many model providers (Anthropic, Meta, Mistral, Cohere, Amazon Nova, etc.), with first-class features for guardrails, RAG (Knowledge Bases), agents, fine-tuning/custom models, and enterprise security (VPC endpoints, KMS). (Amazon Web Services, Inc.)
Author

Benedict Thekkel

Model lineup & modalities (high level)

  • Amazon Nova family (Micro/Lite/Pro + Canvas/Reel + Premier/Sonic variants): text + multimodal (image/video/speech) options, region coverage varies. (AWS Documentation)
  • 3rd-party FMs: Anthropic Claude, Meta Llama, Mistral, Cohere, AI21 (Jamba), Stability, TwelveLabs, Writer (availability varies by region). (Amazon Web Services, Inc.)

Tip: The Models doc lists model IDs, regions, and streaming support—bookmark it. (AWS Documentation)


The core building blocks

Capability | What it solves | When to use
Converse API | One messages API across models; supports tool use/function calling and streaming. | Unify your client/server code paths. (AWS Documentation)
Guardrails | Safety, PII redaction, topic filters; works with Agents and KBs. | Enterprise policy enforcement and factuality checks. (AWS Documentation)
Knowledge Bases (RAG) | Managed ingestion, chunking, embeddings, vector index; prompt augmentation out of the box. | Ship RAG fast; swap indexes later if needed. (AWS Documentation)
Agents (incl. AgentCore) | Orchestrate tools, KBs, multi-step tasks; AgentCore adds gateway, memory, browser, runtime. | Complex workflows, autonomous task execution. (AWS Documentation)
Custom Models | Fine-tuning, continued pre-training, and distillation on managed jobs. | Domain specialization without hosting your own stack. (AWS Documentation)

Pricing (mental model)

  • On-demand: pay per input/output token (or media unit). Best to start. (Amazon Web Services, Inc.)
  • Provisioned Throughput (PTT): dedicated capacity, discounted for steady high volume. (Amazon Web Services, Inc.)
  • Batch inference: discounted for large offline jobs; results are written to S3.
  • Customization costs: extra for fine-tuning/hosting artifacts—budget separately. (DEV Community)

Rule of thumb: start on on-demand → measure → move heavy/steady tenants to PTT.
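To make "measure" concrete, here is a minimal back-of-envelope cost model for the on-demand side. The per-1K-token prices are placeholders, not real Bedrock rates; look up the current per-model pricing page before trusting any number it prints.

```python
# Back-of-envelope on-demand cost model. The default prices below are
# PLACEHOLDERS for illustration -- substitute the current per-model rates
# from the Bedrock pricing page.

def monthly_on_demand_cost(req_per_day, in_tokens, out_tokens,
                           price_in_per_1k=0.003, price_out_per_1k=0.015):
    """Estimate monthly on-demand spend for one workload."""
    per_request = (in_tokens / 1000 * price_in_per_1k
                   + out_tokens / 1000 * price_out_per_1k)
    return req_per_day * per_request * 30

# Example: 50k requests/day, ~800 input and ~300 output tokens each.
cost = monthly_on_demand_cost(50_000, 800, 300)
print(f"~${cost:,.0f}/month on-demand")  # compare against a PTT commitment
```

If that figure lands near (or above) what a Provisioned Throughput commitment costs for the same QPS, that is the signal to switch.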


Security, privacy & networking (the important bits)

  • Private networking: use VPC endpoints/PrivateLink for Bedrock + S3; keep traffic off the public internet. (AWS Documentation)
  • Encryption: use KMS for prompts, logs, and Knowledge Base resources (AWS-owned key by default; bring CMK if needed). (AWS Documentation)
  • Data usage: AWS shared responsibility model applies; you control content/config. (AWS Documentation)
  • Guardrails: plug in at inference time (Converse/stream) or within Agents/KBs. (AWS Documentation)
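A minimal sketch of wiring the private-networking bullet into client code: build `boto3.client` kwargs that pin traffic to a VPC interface endpoint. The VPCE DNS name below is a placeholder; with private DNS enabled on the endpoint you can omit `endpoint_url` entirely and the default service URL already resolves privately.

```python
# Sketch: keep Bedrock runtime traffic on PrivateLink. The VPCE DNS name
# is a PLACEHOLDER -- use the DNS name of your own interface endpoint.
def private_client_kwargs(region, vpce_dns=None):
    kwargs = {"service_name": "bedrock-runtime", "region_name": region}
    if vpce_dns:  # explicit endpoint URL; optional when private DNS is enabled
        kwargs["endpoint_url"] = f"https://{vpce_dns}"
    return kwargs

kwargs = private_client_kwargs(
    "ap-southeast-2",
    "vpce-0123456789abcdef0.bedrock-runtime.ap-southeast-2.vpce.amazonaws.com")
# brt = boto3.client(**kwargs)   # requires: import boto3
```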

Quick starts

1) Call a model (Python, Converse API)

import boto3
brt = boto3.client("bedrock-runtime", region_name="ap-southeast-2")

resp = brt.converse(
  modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
  messages=[{"role":"user","content":[{"text":"Summarise RAG in 3 bullets."}]}],
  inferenceConfig={"maxTokens": 512, "temperature": 0.2}
)
print(resp["output"]["message"]["content"][0]["text"])

Converse works similarly across supported models; add tool schemas for function-calling when you need it. (AWS Documentation)
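A hedged sketch of what such a tool schema looks like in the Converse `toolConfig` parameter. The `get_weather` tool, its description, and its fields are invented for illustration; only the `toolSpec`/`inputSchema` envelope is the Converse shape.

```python
# Sketch: declaring a tool for Converse function-calling.
# The "get_weather" tool and its schema are invented for illustration.
def build_tool_config():
    return {
        "tools": [{
            "toolSpec": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "inputSchema": {"json": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                }},
            }
        }]
    }

# Pass it alongside messages:
# resp = brt.converse(modelId=..., messages=..., toolConfig=build_tool_config())
# If the model decides to call the tool, resp["stopReason"] == "tool_use" and
# the assistant message contains a "toolUse" content block with the arguments.
```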

2) Spin up a Knowledge Base (RAG)

  • Point it at S3, pick an embedding model and vector store (Bedrock-managed options).
  • Bedrock handles ingestion/chunking/embedding; you query via KB APIs or let Agents use it. (AWS Documentation)
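Querying the KB through the RetrieveAndGenerate API can be sketched as below. The KB ID and model ARN are placeholders; substitute your own, and the `bedrock-agent-runtime` call stays commented so the snippet runs without AWS access.

```python
# Sketch: query a Knowledge Base with RetrieveAndGenerate.
# kb_id and model_arn are PLACEHOLDERS -- substitute your own resources.
def build_rag_request(question, kb_id, model_arn):
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

# agent_rt = boto3.client("bedrock-agent-runtime", region_name="ap-southeast-2")
# resp = agent_rt.retrieve_and_generate(**build_rag_request(
#     "What is our refund policy?", "KBID12345",
#     "arn:aws:bedrock:ap-southeast-2::foundation-model/"
#     "anthropic.claude-3-haiku-20240307-v1:0"))
# print(resp["output"]["text"])   # answer grounded in retrieved chunks
```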

3) Add Guardrails (policy + PII + contextual grounding)

  • Create a guardrail policy; attach it in Converse or to your Agent/KB.
  • Use grounding checks to reduce hallucinations against your RAG context. (AWS Documentation)
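Attaching a guardrail in Converse is a matter of adding a `guardrailConfig` to the request. A minimal sketch, assuming a guardrail already exists (the `gr-abc123` ID is a placeholder):

```python
# Sketch: attach an existing guardrail at inference time via Converse.
# The guardrail ID is a PLACEHOLDER created beforehand in console/API.
def with_guardrail(params, guardrail_id, version="DRAFT"):
    """Return a copy of a Converse request dict with a guardrailConfig added."""
    params = dict(params)  # shallow copy; leave the caller's dict untouched
    params["guardrailConfig"] = {
        "guardrailIdentifier": guardrail_id,
        "guardrailVersion": version,
        "trace": "enabled",   # surfaces which policy fired -- useful in dev
    }
    return params

base = {"modelId": "anthropic.claude-3-5-sonnet-20240620-v1:0",
        "messages": [{"role": "user", "content": [{"text": "hi"}]}]}
req = with_guardrail(base, "gr-abc123")   # placeholder guardrail ID
# resp = brt.converse(**req)  # blocked/masked content returns per your policy
```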

Architecture patterns (pick your lane)

A. Pure managed: Client → API (private) → Bedrock (Converse) + Guardrails + Knowledge Base. Good for: fastest path, least ops.

B. Agentic workflows: User → Agent (AgentCore) → Tools (APIs/Lambda), KB, Code Interpreter, Browser → Model. Good for: multi-step tasks, enterprise workflows. (TechRadar)

C. Hybrid RAG: Bedrock for inference + OpenSearch/Aurora pgvector for vectors; swap in/out as needs evolve. (KB is still a great default.) (AWS Documentation)


Tuning & throughput tips

  • Prefer streaming for UX; it’s supported on most chat models. (AWS Documentation)
  • Keep temperature low for factual tasks; raise only for ideation.
  • Cache retrievals and dedupe media in multi-modal workloads.
  • For stable high QPS, move to Provisioned Throughput and right-size capacity. (Amazon Web Services, Inc.)
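For the streaming tip, the Converse streaming variant delivers text incrementally as `contentBlockDelta` events. A small helper that assembles them (shown against a canned event list so it runs without AWS access; the real call is commented):

```python
# Sketch: collect text deltas from a Converse streaming response.
# converse_stream returns an event stream; text arrives in
# "contentBlockDelta" events under delta.text.
def collect_stream_text(events):
    parts = []
    for event in events:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        if "text" in delta:
            parts.append(delta["text"])
    return "".join(parts)

# Against the real API (client creation as in the quick start):
# resp = brt.converse_stream(modelId=..., messages=..., inferenceConfig=...)
# for event in resp["stream"]: ...  # print deltas as they arrive for UX

# Works the same on a canned event list:
fake = [{"contentBlockDelta": {"delta": {"text": "Hel"}}},
        {"contentBlockDelta": {"delta": {"text": "lo"}}},
        {"messageStop": {"stopReason": "end_turn"}}]
print(collect_stream_text(fake))  # Hello
```

In a real UI you would print each delta as it arrives rather than joining at the end; the joined form is just easiest to show here.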

Regions & availability

Model availability (and streaming support) is per model, per region; check the table before you pick IDs for prod (e.g., some Nova variants are in ap-southeast-2 today, others only via cross-region inference). (AWS Documentation)


Common pitfalls (and fixes)

  • “Model not found” in region → switch to a supported region or use the cross-region listing. (AWS Documentation)
  • Hallucinations in RAG → enable contextual grounding guardrails and tighten retrieval metadata filters. (Amazon Web Services, Inc.)
  • Data egress surprises → use VPC endpoints; keep S3 + Bedrock private. (AWS Documentation)
  • Latency spikes at scale → adopt PTT; split tenants by workload. (Amazon Web Services, Inc.)

When Bedrock vs roll-your-own?

  • Choose Bedrock when you want model choice, managed safety/RAG/agents, and private networking without owning infra.
  • Roll your own (EKS + vLLM/TGI) when you need custom runtimes, super-tight $/token, or niche models not offered in Bedrock. (You can still keep Bedrock for managed bits like KB/Guardrails.)


Code

import os

import boto3

# Load environment variables from .env if python-dotenv is installed (optional)
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass

region = os.getenv("AWS_BEDROCK_REGION_NAME", "us-east-1")
aws_access_key_id = os.getenv("AWS_BEDROCK_ACCESS_KEY_ID")
aws_secret_access_key = os.getenv("AWS_BEDROCK_SECRET_ACCESS_KEY")

# Initialize the Bedrock control-plane client
bedrock_client = boto3.client(
    "bedrock",
    region_name=region,
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
)

# List available foundation models
resp = bedrock_client.list_foundation_models()

print("\n🧠 Available AWS Bedrock Models:\n")
for model in resp.get("modelSummaries", []):
    name = model.get("modelName")
    provider = model.get("providerName")
    status = model.get("modelLifecycle", {}).get("status")
    print(f"- {name} ({provider}) — {status}")

Sample output (duplicate names correspond to distinct model IDs/variants):

🧠 Available AWS Bedrock Models:

- Stable Image Remove Background (Stability AI) — ACTIVE
- Stable Image Style Guide (Stability AI) — ACTIVE
- Stable Image Control Sketch (Stability AI) — ACTIVE
- Claude Sonnet 4 (Anthropic) — ACTIVE
- Stable Image Erase Object (Stability AI) — ACTIVE
- Stable Image Control Structure (Stability AI) — ACTIVE
- Stable Image Search and Recolor (Stability AI) — ACTIVE
- gpt-oss-120b (OpenAI) — ACTIVE
- Pegasus v1.2 (TwelveLabs) — ACTIVE
- Stable Image Style Transfer (Stability AI) — ACTIVE
- Embed v4 (Cohere) — ACTIVE
- Claude Sonnet 4.5 (Anthropic) — ACTIVE
- Marengo Embed v2.7 (TwelveLabs) — ACTIVE
- Stable Image Search and Replace (Stability AI) — ACTIVE
- Qwen3-Coder-30B-A3B-Instruct (Qwen) — ACTIVE
- Qwen3 32B (dense) (Qwen) — ACTIVE
- Stable Image Inpaint (Stability AI) — ACTIVE
- gpt-oss-20b (OpenAI) — ACTIVE
- Claude Opus 4.1 (Anthropic) — ACTIVE
- Titan Text Large (Amazon) — ACTIVE
- Titan Image Generator G1 (Amazon) — ACTIVE
- Titan Image Generator G1 (Amazon) — ACTIVE
- Titan Image Generator G1 v2 (Amazon) — ACTIVE
- Nova Premier (Amazon) — ACTIVE
- Nova Premier (Amazon) — ACTIVE
- Nova Premier (Amazon) — ACTIVE
- Nova Premier (Amazon) — ACTIVE
- Nova Premier (Amazon) — ACTIVE
- Nova Pro (Amazon) — ACTIVE
- Nova Pro (Amazon) — ACTIVE
- Nova Pro (Amazon) — ACTIVE
- Nova Lite (Amazon) — ACTIVE
- Nova Lite (Amazon) — ACTIVE
- Nova Lite (Amazon) — ACTIVE
- Nova Canvas (Amazon) — ACTIVE
- Nova Reel (Amazon) — ACTIVE
- Nova Reel (Amazon) — ACTIVE
- Nova Micro (Amazon) — ACTIVE
- Nova Micro (Amazon) — ACTIVE
- Nova Micro (Amazon) — ACTIVE
- Nova Sonic (Amazon) — ACTIVE
- Titan Text Embeddings v2 (Amazon) — ACTIVE
- Titan Text G1 - Lite (Amazon) — ACTIVE
- Titan Text G1 - Lite (Amazon) — ACTIVE
- Titan Text G1 - Express (Amazon) — ACTIVE
- Titan Text G1 - Express (Amazon) — ACTIVE
- Titan Embeddings G1 - Text (Amazon) — ACTIVE
- Titan Embeddings G1 - Text (Amazon) — ACTIVE
- Titan Text Embeddings V2 (Amazon) — ACTIVE
- Titan Text Embeddings V2 (Amazon) — ACTIVE
- Titan Multimodal Embeddings G1 (Amazon) — ACTIVE
- Titan Multimodal Embeddings G1 (Amazon) — ACTIVE
- SDXL 1.0 (Stability AI) — LEGACY
- SDXL 1.0 (Stability AI) — LEGACY
- Jamba 1.5 Large (AI21 Labs) — ACTIVE
- Jamba 1.5 Mini (AI21 Labs) — ACTIVE
- Claude Instant (Anthropic) — LEGACY
- Claude (Anthropic) — LEGACY
- Claude (Anthropic) — LEGACY
- Claude (Anthropic) — LEGACY
- Claude (Anthropic) — LEGACY
- Claude 3 Sonnet (Anthropic) — LEGACY
- Claude 3 Sonnet (Anthropic) — LEGACY
- Claude 3 Sonnet (Anthropic) — LEGACY
- Claude 3 Haiku (Anthropic) — ACTIVE
- Claude 3 Haiku (Anthropic) — ACTIVE
- Claude 3 Haiku (Anthropic) — ACTIVE
- Claude 3 Opus (Anthropic) — ACTIVE
- Claude 3 Opus (Anthropic) — ACTIVE
- Claude 3 Opus (Anthropic) — ACTIVE
- Claude 3 Opus (Anthropic) — ACTIVE
- Claude 3.5 Sonnet (Anthropic) — ACTIVE
- Claude 3.5 Sonnet v2 (Anthropic) — ACTIVE
- Claude 3.7 Sonnet (Anthropic) — ACTIVE
- Claude 3.5 Haiku (Anthropic) — ACTIVE
- Claude Opus 4 (Anthropic) — ACTIVE
- Command R (Cohere) — ACTIVE
- Command R+ (Cohere) — ACTIVE
- Embed English (Cohere) — ACTIVE
- Embed English (Cohere) — ACTIVE
- Embed Multilingual (Cohere) — ACTIVE
- Embed Multilingual (Cohere) — ACTIVE
- Rerank 3.5 (Cohere) — ACTIVE
- DeepSeek-R1 (DeepSeek) — ACTIVE
- Llama 3 8B Instruct (Meta) — ACTIVE
- Llama 3 70B Instruct (Meta) — ACTIVE
- Llama 3.1 8B Instruct (Meta) — ACTIVE
- Llama 3.1 70B Instruct (Meta) — ACTIVE
- Llama 3.2 11B Instruct (Meta) — ACTIVE
- Llama 3.2 90B Instruct (Meta) — ACTIVE
- Llama 3.2 1B Instruct (Meta) — ACTIVE
- Llama 3.2 3B Instruct (Meta) — ACTIVE
- Llama 3.3 70B Instruct (Meta) — ACTIVE
- Llama 4 Scout 17B Instruct (Meta) — ACTIVE
- Llama 4 Maverick 17B Instruct (Meta) — ACTIVE
- Mistral 7B Instruct (Mistral AI) — ACTIVE
- Mixtral 8x7B Instruct (Mistral AI) — ACTIVE
- Mistral Large (24.02) (Mistral AI) — ACTIVE
- Mistral Small (24.02) (Mistral AI) — ACTIVE
- Pixtral Large (25.02) (Mistral AI) — ACTIVE