Unlock Insights From Images With Amazon Q Business
The Challenge of Visual Data
Many organizations rely on visual assets like diagrams, charts, and technical illustrations to convey complex information. While text documents are easily indexed by knowledge management systems, the rich data within these images often remains locked away, inaccessible to search tools and AI assistants. This creates a significant knowledge gap, forcing manual interpretation of visual data and preventing automated systems from using this critical information for decision-making. While Amazon Q Business can handle embedded images in documents, the custom document enrichment (CDE) feature dramatically expands this capability to include standalone image files like JPGs and PNGs.
This guide provides a step-by-step walkthrough for implementing the CDE feature within an Amazon Q Business application. We will configure an AWS Lambda function to process various image types, demonstrating how this integration enhances Amazon Q's ability to deliver comprehensive insights from both textual and visual sources.
A Practical Example: Analyzing Educational Data
Imagine you work for a national educational consultancy. Your data, including charts and demographic information for different regions, is stored in an Amazon S3 bucket. The following bar chart shows student distribution by age across several cities. Insights from such visuals are crucial but are traditionally trapped inside image files.
With Amazon Q Business and CDE, you can ask natural language questions about this chart, such as, “Which city has the highest number of students in the 13–15 age range?” or “Compare the student demographics between City 1 and City 4.”
This is achieved by:
- Detecting and processing image files during document ingestion.
- Using Amazon Bedrock with AWS Lambda to interpret the visual information.
- Extracting structured data and insights.
- Making the information searchable through natural language queries.
How the Image Analysis Solution Works
This solution uses the CDE capability of Amazon Q Business to extract information from image files. When the data ingestion process encounters an image file in an S3 bucket, CDE rules trigger a Lambda function. This function calls the Amazon Bedrock API, using multimodal large language models (LLMs) to analyze the image and extract contextual information. The resulting text is then indexed in Amazon Q Business, allowing users to search for insights from images using natural language.
The high-level architecture is shown below. While we use Amazon S3 as the data source, this solution can be adapted for other data sources supported by Amazon Q Business.
The implementation involves three main steps:
- Create an Amazon Q Business application and sync it with an S3 bucket.
- Configure the application's CDE for the S3 data source.
- Write the logic to extract context from the images.
What You Need to Get Started
Before you begin, ensure you have the following prerequisites:
- An AWS account.
- An Amazon Q Business Pro user with admin permissions. See Amazon Q Business pricing for details.
- AWS Identity and Access Management (IAM) permissions to manage roles and policies.
- A supported data source, such as an S3 bucket with your documents.
- Access to an Amazon Bedrock LLM in the required AWS Region.
Step 1: Set Up Your Amazon Q Application
First, create an Amazon Q Business application and connect it to your S3 bucket. For detailed instructions, you can follow the guide on how to discover insights from Amazon S3 with the Amazon Q S3 connector.
The basic steps are:
- Create the application via the AWS Management Console or AWS CLI.
- Create an index for your application.
- Use the built-in Amazon S3 connector to link the application to your S3 bucket.
Step 2: Configure Custom Document Enrichment
CDE allows you to modify and enhance documents during the ingestion process to improve search quality. By integrating with services like Amazon Bedrock, you can even extract context from binary files like images.
To configure CDE for your S3 data source:
- In your Amazon Q application, navigate to Data sources.
- Select your S3 data source.
- In the configuration, find the Custom Document Enrichment section.
- Configure pre-extraction rules to trigger a Lambda function when files with specific extensions (e.g., .png, .jpg) are found.
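As a sketch, a pre-extraction rule of this kind can also be expressed through the documentEnrichmentConfiguration field when creating or updating the data source via the API. The ARNs, bucket name, attribute key, and condition operator below are placeholder assumptions for illustration; verify the exact field names and supported operators against the current Amazon Q Business API schema before use:

```json
{
  "documentEnrichmentConfiguration": {
    "preExtractionHookConfiguration": {
      "lambdaArn": "arn:aws:lambda:us-east-1:111122223333:function:cde-image-analysis",
      "roleArn": "arn:aws:iam::111122223333:role/QBusinessCDERole",
      "s3BucketName": "amzn-s3-demo-bucket",
      "invocationCondition": {
        "key": "_source_uri",
        "operator": "CONTAINS",
        "value": { "stringValue": ".png" }
      }
    }
  }
}
```

You would typically define one such condition per image extension you want to process (for example, .png, .jpg, and .jpeg).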
Step 3: Extracting Insights with Lambda and Bedrock
The Lambda function uses Anthropic’s Claude 3.7 Sonnet model via an Amazon Bedrock API call to extract insights from image files. A critical part of this process is prompt engineering. We recommend experimenting with different prompts to achieve the desired output. You can use Amazon Bedrock's features to optimize a prompt for your specific use case.
Below are snippets from the Python Lambda function that handles the image analysis. We first import the necessary libraries and initialize the Boto3 clients for S3 and Amazon Bedrock runtime.
```python
import boto3
import logging
import json
from typing import List, Dict, Any
from botocore.config import Config

MODEL_ID = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"
MAX_TOKENS = 2000
MAX_RETRIES = 2
FILE_FORMATS = ("jpg", "jpeg", "png")

logger = logging.getLogger()
logger.setLevel(logging.INFO)
s3 = boto3.client('s3')
bedrock = boto3.client('bedrock-runtime', config=Config(read_timeout=3600, region_name='us-east-1'))
```
The prompt sent to the model is broken into a prefix and suffix for readability and to leverage prompt caching to reduce latency and cost.
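As an optional sketch of how this split enables caching, the Bedrock Converse API accepts a cachePoint content block that marks everything before it as cacheable across invocations. The Lambda code in this post sends the prompt without an explicit cache point; the helper below only builds the request payload, and cachePoint support varies by model, so treat this as an assumption to verify for your model:

```python
from typing import Any, Dict, List

def build_messages(prompt_prefix: str, prompt_suffix: str,
                   image_bytes: bytes, file_format: str) -> List[Dict[str, Any]]:
    """Build a Converse API message list with a cache checkpoint after the static prefix."""
    return [{
        "role": "user",
        "content": [
            {"text": prompt_prefix},
            {"cachePoint": {"type": "default"}},  # cache the static prefix across calls
            {"image": {"format": file_format, "source": {"bytes": image_bytes}}},
            {"text": prompt_suffix},
        ],
    }]
```

Because the image bytes differ per object, only the shared prefix sits before the cache point; the per-image content and suffix follow it.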
```python
prompt_prefix = """You are an expert image reader tasked with generating detailed descriptions for various types of images. These images may include technical diagrams, graphs and charts, categorization diagrams, data flow and process flow diagrams, hierarchical and timeline diagrams, infographics, screenshots, and product diagrams/images from user manuals. The description of these images needs to be very detailed so that the user can ask questions based on the image, which can be answered by only looking at the descriptions that you generate. Here is the image you need to analyze:"""

prompt_suffix = """
Please follow these steps to analyze the image and generate a comprehensive description:

1. Image type: Classify the image as one of: technical diagram, graph or chart, categorization diagram, data flow or process flow diagram, hierarchical or timeline diagram, infographic, screenshot, product diagram/image from a user manual, or other.

2. Items: Carefully examine the image and extract all entities, texts, and numbers present. List these elements in <image_items> tags.

3. Detailed Description: Using the information from the previous steps, provide a detailed description of the image. This should include the type of diagram or chart, its main purpose, and how the various elements interact or relate to each other. Capture all the crucial details that can be used to answer any follow-up questions. Write this description in <image_description> tags.

4. Data Estimation (for charts and graphs only): If the image is a chart or graph, capture the data in the image in CSV format so the image could be recreated from the data. Ensure your response captures all relevant details from the chart that might be necessary to answer any follow-up questions. If exact values cannot be inferred, provide an estimated range for each value in <estimated_numbers> tags. If no data is present, respond with "No data found".

Present your analysis in the following format:

<image_type>
[Classify the image type here]
</image_type>

<image_items>
[List all extracted entities, texts, and numbers here]
</image_items>

<image_description>
[Provide a detailed description of the image here]
</image_description>

<estimated_numbers>
[If applicable, provide estimated number ranges for chart elements here]
</estimated_numbers>

Remember to be thorough and precise in your analysis. If you're unsure about any aspect of the image, state your uncertainty clearly in the relevant section.
"""
```
The complete Lambda function code integrates these parts to process an image from S3, send it to Bedrock, and save the text analysis back to S3 for Amazon Q to index.
```python
# Example Lambda function for image processing
import boto3
import logging
import json
from typing import List, Dict, Any
from botocore.config import Config

MODEL_ID = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"
MAX_TOKENS = 2000
MAX_RETRIES = 2
FILE_FORMATS = ("jpg", "jpeg", "png")

logger = logging.getLogger()
logger.setLevel(logging.INFO)
s3 = boto3.client('s3')
bedrock = boto3.client('bedrock-runtime', config=Config(read_timeout=3600, region_name='us-east-1'))

# ... (prompt_prefix and prompt_suffix as shown above) ...

def _llm_input(s3Bucket: str, s3ObjectKey: str, file_format: str) -> List[Dict[str, Any]]:
    # Read the image from S3 and build the multimodal Converse API message
    s3_response = s3.get_object(Bucket=s3Bucket, Key=s3ObjectKey)
    image_content = s3_response['Body'].read()
    message = {
        "role": "user",
        "content": [
            {"text": prompt_prefix},
            {"image": {"format": file_format, "source": {"bytes": image_content}}},
            {"text": prompt_suffix}
        ]
    }
    return [message]

def _invoke_model(messages: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Retry transient failures before giving up
    for attempt in range(MAX_RETRIES):
        try:
            response = bedrock.converse(
                modelId=MODEL_ID,
                messages=messages,
                inferenceConfig={
                    "maxTokens": MAX_TOKENS,
                    "temperature": 0,
                }
            )
            return response
        except Exception as e:
            logger.error("Model invocation attempt %d failed: %s", attempt + 1, e)
    raise Exception(f"Failed to call model after {MAX_RETRIES} attempts")

def generate_image_description(s3Bucket: str, s3ObjectKey: str, file_format: str) -> str:
    messages = _llm_input(s3Bucket, s3ObjectKey, file_format)
    response = _invoke_model(messages)
    return response['output']['message']['content'][0]['text']

def lambda_handler(event, context):
    logger.info("Received event: %s", json.dumps(event))
    s3Bucket = event.get("s3Bucket")
    s3ObjectKey = event.get("s3ObjectKey")
    file_format = s3ObjectKey.lower().split('.')[-1]
    if file_format in FILE_FORMATS:
        # Write the generated description to S3 and point Amazon Q at the text file
        new_key = 'cde_output/' + s3ObjectKey + '.txt'
        afterCDE = generate_image_description(s3Bucket, s3ObjectKey, file_format)
        s3.put_object(Bucket=s3Bucket, Key=new_key, Body=afterCDE)
        return {"version": "v0", "s3ObjectKey": new_key, "metadataUpdates": []}
    # Non-image files are passed through unchanged
    return {"version": "v0", "s3ObjectKey": s3ObjectKey, "metadataUpdates": []}
```
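The handler's routing decision can be sanity-checked locally without any AWS access. The helper below is a hypothetical stand-in that mirrors the key derivation used above, showing which objects are processed and where their generated descriptions land:

```python
FILE_FORMATS = ("jpg", "jpeg", "png")

def output_key(s3_object_key: str):
    """Return the destination key for the generated description,
    or None if the object is not a supported image type."""
    file_format = s3_object_key.lower().split('.')[-1]
    if file_format in FILE_FORMATS:
        return 'cde_output/' + s3_object_key + '.txt'
    return None

print(output_key("charts/students.PNG"))   # cde_output/charts/students.PNG.txt
print(output_key("reports/summary.pdf"))   # None
```

Note that the extension check is case-insensitive while the original key's casing is preserved in the output path.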
Note that in addition to Amazon Q pricing, this solution incurs costs for AWS Lambda and Amazon Bedrock.
Seeing the Results in Action
Once the S3 data is synced, you can query the Amazon Q Business application about the student distribution graph.
Q: Which city has the highest number of students in the 13–15 age range?
Q: Compare the student demographics between City 1 and City 4.
Even though the original graph bars lacked explicit numerical labels, Amazon Q Business successfully extracts the contextual information. This transforms static images into queryable knowledge assets, allowing users to explore deeper insights through natural language.
Best Practices for Configuration
When setting up CDE for Amazon S3, consider the following best practices:
- Use conditional rules to process only specific file types.
- Monitor Lambda execution with Amazon CloudWatch to track performance and errors.
- Set appropriate timeout values for Lambda, especially for large files.
- Use incremental syncing to process only new or modified documents.
- Apply document attributes to track which documents have been processed by CDE.
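For the last point, the metadataUpdates field of the CDE Lambda response can carry a tracking attribute. The attribute name and value shape below are assumptions to illustrate the idea; confirm the exact response contract in the Amazon Q Business CDE Lambda documentation:

```python
def tag_as_processed(s3_object_key: str):
    """Hypothetical helper: build a CDE Lambda response that marks a document
    as processed by attaching a custom document attribute (assumed shape)."""
    return {
        "version": "v0",
        "s3ObjectKey": s3_object_key,
        "metadataUpdates": [
            {"name": "cde_processed", "value": {"stringValue": "true"}}
        ],
    }

resp = tag_as_processed("cde_output/chart.png.txt")
```

Such an attribute can later be used in filters or follow-up enrichment rules to skip documents that were already handled.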
Cleaning Up Your AWS Resources
To avoid ongoing charges, clean up the resources you created:
- Remove users and groups from the Amazon Q Business application.
- Delete the Amazon Q Business application.
- Delete the Lambda function.
- Empty and delete the S3 bucket, following the guide on Deleting a general purpose bucket.
Conclusion: Unlocking Visual Data
This solution shows how combining Amazon Q Business, CDE, and Amazon Bedrock can convert static visualizations into interactive, queryable knowledge assets. By using these services together, organizations can finally bridge the gap between visual information and actionable insights, enabling users to interact with all their data in more powerful and intuitive ways.
To learn more, explore What is Amazon Q Business? and get started with the Amazon Bedrock documentation.