Bedrock Embedding

Supported Embedding Models

| Provider | LiteLLM Route | AWS Documentation | Cost Tracking |
|---|---|---|---|
| Amazon Titan | bedrock/amazon.titan-* | Amazon Titan Embeddings | |
| Amazon Nova | bedrock/amazon.nova-* | Amazon Nova Embeddings | |
| Cohere | bedrock/cohere.* | Cohere Embeddings | |
| TwelveLabs | bedrock/us.twelvelabs.* | TwelveLabs | |

Async Invoke Support

LiteLLM supports AWS Bedrock's async-invoke feature for embedding models that require asynchronous processing, particularly useful for large media files (video, audio) or when you need to process embeddings in the background.

Supported Models

| Provider | Async Invoke Route | Use Case |
|---|---|---|
| Amazon Nova | bedrock/async_invoke/amazon.nova-2-multimodal-embeddings-v1:0 | Multimodal embeddings with segmentation for long text, video, and audio |
| TwelveLabs Marengo | bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0 | Video, audio, image, and text embeddings |

Required Parameters

When using async-invoke, you must provide:

| Parameter | Description | Required |
|---|---|---|
| output_s3_uri | S3 URI where the embedding results will be stored | ✅ Yes |
| input_type | Type of input: "text", "image", "video", or "audio" | ✅ Yes |
| aws_region_name | AWS region for the request | ✅ Yes |

Usage

Basic Async Invoke

from litellm import embedding

# Text embedding with async-invoke
response = embedding(
    model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=["Hello world from LiteLLM async invoke!"],
    aws_region_name="us-east-1",
    input_type="text",
    output_s3_uri="s3://your-bucket/async-invoke-output/",
)

print(f"Job submitted! Invocation ARN: {response._hidden_params['_invocation_arn']}")

Video/Audio Embedding

from litellm import embedding

# Video embedding (requires async-invoke)
response = embedding(
    model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=["s3://your-bucket/video.mp4"],  # S3 URL for the video
    aws_region_name="us-east-1",
    input_type="video",
    output_s3_uri="s3://your-bucket/async-invoke-output/",
)

print(f"Video embedding job submitted! ARN: {response._hidden_params['_invocation_arn']}")

Image Embedding with Base64

import base64

from litellm import embedding

# Load the image and encode it as a base64 data URI
with open("image.jpg", "rb") as img_file:
    img_data = base64.b64encode(img_file.read()).decode("utf-8")
img_base64 = f"data:image/jpeg;base64,{img_data}"

response = embedding(
    model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=[img_base64],
    aws_region_name="us-east-1",
    input_type="image",
    output_s3_uri="s3://your-bucket/async-invoke-output/",
)

Retrieving Job Information

Getting Job ID and Invocation ARN

The async-invoke response includes the invocation ARN in the hidden parameters:

from litellm import embedding

response = embedding(
    model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=["Hello world"],
    aws_region_name="us-east-1",
    input_type="text",
    output_s3_uri="s3://your-bucket/async-invoke-output/",
)

# Access the invocation ARN
invocation_arn = response._hidden_params["_invocation_arn"]
print(f"Invocation ARN: {invocation_arn}")

# Extract the job ID from the ARN (the part after the last slash)
job_id = invocation_arn.split("/")[-1]
print(f"Job ID: {job_id}")

Checking Job Status

Use LiteLLM's retrieve_batch function to check if your job is still processing:

from litellm import retrieve_batch

def check_async_job_status(invocation_arn, aws_region_name="us-east-1"):
    """Check the status of an async invoke job using the LiteLLM batch API."""
    try:
        response = retrieve_batch(
            batch_id=invocation_arn,  # Pass the invocation ARN here
            custom_llm_provider="bedrock",
            aws_region_name=aws_region_name,
        )
        return response
    except Exception as e:
        print(f"Error checking job status: {e}")
        return None

# Check status
status = check_async_job_status(invocation_arn, "us-east-1")
if status:
    print(f"Job Status: {status.status}")  # "in_progress", "completed", or "failed"
    print(f"Output Location: {status.metadata['output_file_id']}")  # S3 URI where results are stored

Polling Until Complete

Here's a complete example of polling for job completion:

import time

from litellm import retrieve_batch

def wait_for_async_job(invocation_arn, aws_region_name="us-east-1", max_wait=3600):
    """Poll job status until completion or timeout."""
    start_time = time.time()

    while True:
        status = retrieve_batch(
            batch_id=invocation_arn,
            custom_llm_provider="bedrock",
            aws_region_name=aws_region_name,
        )

        if status.status == "completed":
            print("✅ Job completed!")
            return status
        elif status.status == "failed":
            error_msg = status.metadata.get("failure_message", "Unknown error")
            raise Exception(f"❌ Job failed: {error_msg}")
        else:
            elapsed = time.time() - start_time
            if elapsed > max_wait:
                raise TimeoutError(f"Job timed out after {max_wait} seconds")

            print(f"⏳ Job still processing... (elapsed: {elapsed:.0f}s)")
            time.sleep(10)  # Wait 10 seconds before checking again

# Wait for completion
completed_status = wait_for_async_job(invocation_arn)
output_s3_uri = completed_status.metadata["output_file_id"]
print(f"Results available at: {output_s3_uri}")

Note: The actual embedding results are stored in S3. Once the job completes, download them from the S3 location in status.metadata['output_file_id']; the results are in JSON/JSONL format and contain the embedding vectors.
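
As a rough sketch of that download step (assuming boto3 is installed, AWS credentials are configured, and the output URI points at a single JSON object; the exact key layout under the output prefix varies by model):

import json

import boto3  # assumed available; credentials come from the environment

def download_results(result_s3_uri):
    """Fetch and parse an async-invoke result object from S3 (illustrative)."""
    bucket, key = result_s3_uri.removeprefix("s3://").split("/", 1)
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return json.loads(body)

# e.g. using the URI returned by the polling example above
results = download_results(output_s3_uri)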

Error Handling

Common Errors

| Error | Cause | Solution |
|---|---|---|
| ValueError: output_s3_uri cannot be empty | Missing S3 output URI | Provide a valid S3 URI |
| ValueError: Input type 'video' requires async_invoke route | Using video/audio without async-invoke | Use the bedrock/async_invoke/ model prefix |
| ValueError: input_type is required | Missing input_type parameter | Specify the input_type parameter |

Example Error Handling

from litellm import embedding

try:
    response = embedding(
        model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
        input=["Hello world"],
        aws_region_name="us-east-1",
        input_type="text",
        output_s3_uri="s3://your-bucket/output/",  # Required for async-invoke
    )
    print("Job submitted successfully!")

except ValueError as e:
    if "output_s3_uri cannot be empty" in str(e):
        print("Error: Please provide a valid S3 output URI")
    elif "requires async_invoke route" in str(e):
        print("Error: Use an async_invoke model for video/audio inputs")
    else:
        print(f"Error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Best Practices

  1. Use async-invoke for large files: Video and audio files are better processed asynchronously
  2. Use LiteLLM batch API: Use retrieve_batch() instead of direct Bedrock API calls for status checking
  3. Monitor job status: Check job status periodically using the batch API to know when results are ready (see the end-to-end sketch after this list)
  4. Handle errors gracefully: Implement proper error handling for network issues and job failures
  5. Set appropriate timeouts: Consider the processing time for large files
  6. Use S3 for large inputs: For video/audio, use S3 URLs instead of base64 encoding
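
Putting these practices together, a minimal end-to-end flow might look like the sketch below (reusing the wait_for_async_job and download_results helpers defined earlier; the bucket and object names are placeholders):

from litellm import embedding

# Submit a video job with an S3 input rather than base64 (practices 1 and 6)
response = embedding(
    model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=["s3://your-bucket/video.mp4"],
    aws_region_name="us-east-1",
    input_type="video",
    output_s3_uri="s3://your-bucket/async-invoke-output/",
)

# Poll with a generous timeout for a large file (practices 3 and 5),
# then pull the vectors down from S3
status = wait_for_async_job(
    response._hidden_params["_invocation_arn"], max_wait=7200
)
results = download_results(status.metadata["output_file_id"])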

Limitations

  • Async-invoke is supported only for the TwelveLabs Marengo and Amazon Nova models
  • Results are stored in S3 and must be retrieved separately using the output file ID
  • Job status checking requires using LiteLLM's retrieve_batch() function
  • No built-in polling mechanism in LiteLLM (must implement your own status checking loop)

API keys

These can be set as environment variables or passed as params to litellm.embedding().

import os
os.environ["AWS_ACCESS_KEY_ID"] = "" # Access key
os.environ["AWS_SECRET_ACCESS_KEY"] = "" # Secret access key
os.environ["AWS_REGION_NAME"] = "" # us-east-1, us-east-2, us-west-1, us-west-2

Usage

LiteLLM Python SDK

from litellm import embedding

response = embedding(
    model="bedrock/amazon.titan-embed-text-v1",
    input=["good morning from litellm"],
)
print(response)

LiteLLM Proxy Server

1. Setup config.yaml

model_list:
  - model_name: titan-embed-v1
    litellm_params:
      model: bedrock/amazon.titan-embed-text-v1
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1
  - model_name: titan-embed-v2
    litellm_params:
      model: bedrock/amazon.titan-embed-text-v2:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1

2. Start Proxy

litellm --config /path/to/config.yaml

3. Use with OpenAI Python SDK

import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000",
)

response = client.embeddings.create(
    input=["good morning from litellm"],
    model="titan-embed-v1",
)
print(response)

4. Use with LiteLLM Python SDK

import litellm

response = litellm.embedding(
    model="titan-embed-v1",  # model alias from config.yaml
    input=["good morning from litellm"],
    api_base="http://0.0.0.0:4000",
    api_key="anything",
)
print(response)

Supported AWS Bedrock Embedding Models

| Model Name | Usage | Supported Additional OpenAI params |
|---|---|---|
| Amazon Nova Multimodal Embeddings | embedding(model="bedrock/amazon.nova-2-multimodal-embeddings-v1:0", input=input) | Supports multimodal input (text, image, video, audio), multiple purposes, dimensions (256, 384, 1024, 3072) |
| Titan Embeddings V2 | embedding(model="bedrock/amazon.titan-embed-text-v2:0", input=input) | here |
| Titan Embeddings V1 | embedding(model="bedrock/amazon.titan-embed-text-v1", input=input) | here |
| Titan Multimodal Embeddings | embedding(model="bedrock/amazon.titan-embed-image-v1", input=input) | here |
| TwelveLabs Marengo Embed 2.7 | embedding(model="bedrock/us.twelvelabs.marengo-embed-2-7-v1:0", input=input) | Supports multimodal input (text, video, audio, image) |
| Cohere Embeddings - English | embedding(model="bedrock/cohere.embed-english-v3", input=input) | here |
| Cohere Embeddings - Multilingual | embedding(model="bedrock/cohere.embed-multilingual-v3", input=input) | here |
| Cohere Embed v4 | embedding(model="bedrock/cohere.embed-v4:0", input=input) | Supports text and image input, configurable dimensions (256, 512, 1024, 1536), 128k context length |
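
For the models with configurable dimensions, the OpenAI-style dimensions parameter can be passed through. A brief sketch requesting 512-dimensional vectors from Cohere Embed v4 (assuming you have access to the model in your region):

from litellm import embedding

# Request 512-dimensional vectors (Cohere Embed v4 supports 256/512/1024/1536)
response = embedding(
    model="bedrock/cohere.embed-v4:0",
    input=["good morning from litellm"],
    dimensions=512,
)
print(len(response.data[0]["embedding"]))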

Advanced - Drop Unsupported Params

Advanced - Pass model/provider-specific Params