Boost Confluence Search with Amazon Q Business
Unlock hidden insights in Confluence with Amazon Q Business's custom document enrichment. Learn to convert images to searchable text, enhancing information discovery and decision-making. Implement this solution to transform your knowledge management system and boost team productivity.
Joshua
Amazon Employee
Published Nov 9, 2024
Last Modified Nov 22, 2024
Amazon Q Business is a generative AI-powered assistant designed to enhance enterprise operations. It’s a fully managed service that helps provide accurate answers to users’ questions while adhering to the security and access restrictions of the content. You can tailor Amazon Q Business to your specific business needs by connecting to your company’s information and enterprise systems using built-in connectors to a variety of enterprise data sources. It enables users in various roles, such as marketing managers, project managers, and sales representatives, to have tailored conversations, solve business problems, generate content, take action, and more, through a web interface. This service aims to help make employees work smarter, move faster, and drive significant impact by providing immediate and relevant information to help them with their tasks.
As organizations increasingly rely on collaborative platforms like Confluence to store and share critical information, the need for efficient and reliable search capabilities becomes essential. However, valuable content embedded in images within Confluence pages often remains hidden, limiting its searchability and discoverability. While Amazon Q Business supports Confluence as a data connector, it does not natively support the indexing of image files, posing a challenge. This limitation can hinder productivity, delay decision-making, and prevent users from quickly accessing the information they need.
To address this issue, organizations can leverage the power of custom document enrichment (CDE) in Amazon Q Business. By converting the images in Confluence pages into searchable text, businesses can unlock the hidden insights and contextual information that were previously inaccessible. This approach empowers users to find the right information quickly and efficiently, driving increased productivity and better decision-making.
The process of implementing CDE in Amazon Q Business involves several steps, including the seamless integration with the Confluence platform. This integration allows for the automatic extraction and conversion of image content into searchable text, ensuring that all relevant information is indexed and readily available to users.
Document enrichment offers two kinds of methods that you can use for your solution:
- Configure basic operations – Use basic operations to add, update, or delete document attributes from your data.
- Configure Lambda functions – Use a preconfigured Lambda function to perform more customized, advanced document attribute manipulation logic to your data. The configured Lambda functions can be invoked during either before documents being ingested (PreExtraction Lambda function), or after documents being ingested (PostExtraction Lambda function).
When you implement your solution, you can choose to use both document enrichment methods together or just one method.
In this post, I will show you how to index and run real-time queries with image files using Amazon Q Business.
Solution overview
CDE enables you to create, modify, or delete document metadata and content when you ingest your documents into Amazon Q Business. Let’s understand the Amazon Q Business document ingestion workflow in the context of CDE.
The following diagram illustrates the CDE workflow
There are six components associated with CDE Lambda functions:
- Confluence - This is the data source used by the Amazon Q application. The confluence contains enterprise data which will include both images and text files.
- CDE Lambda function - This Lambda function needs to be in the same region as the Amazon Q Business application, and is invoked as part of the ingestion process for each document.
- Bedrock - The invoked lambda function calls the Bedrock API using foundation model’s image-to-text capability so Amazon Q Business can ingest image files.
- CDE S3 bucket - This is an S3 bucket used by the Amazon Q application and the Lambda function to store text files converted from Images files. This bucket needs to be in the same region as the Amazon Q application.
- CDE Lambda function role - This Service role provides the lambda permission to read and write to the CDE S3 bucket, update the datasource of Amazon Q Business, and then call the model using Bedrock API.
- CDE IAM role for Q Business service - This IAM role provides the Amazon Q Business application the permissions to read and write to the CDE S3 bucket and then invoke the lambda function.
In this post, you will learn how to use CDE feature from the following three examples:
- Using basic operation to create a document attribute “_category” before the document being ingested
- Using PreExtraction function to index a image file by using foundation model’s image-to-text capability so Amazon Q Business can ingest image files.
- Using PostExtraction function to index document metadata with the doc, and also log the indexed image document with metadata in the CDE S3 bucket.
Prerequisites
You can follow the step-by-step guide in your AWS account to get a first-hand experience of using CDE. Before getting started, complete the following prerequisites:
- We will be using Amazon Bedrock to access foundation models in this post. In your AWS account, enable the Claude 3 haiku model in Amazon Bedrock.
- In your AWS Account, create an Amazon Q application with confluence as the data source.
- Create an Amazon Simple Storage Service (Amazon S3) bucket to use as a data source to store your text files. Refer to Amazon S3 User Guide for more information.
- Create an Service role for Lambda - This service role provides the lambda the permissions to read and write to the CDE S3 bucket, update the data source of Amazon Q Business, and then call the model using Bedrock API. Refer to Create a role for an AWS service for more information. When creating the service role select Lambda as service and for the Permissions policies, below is the policy to attached to the role
- Create the
preExtraction
Lambda function and select the Lambda service role you created earlier as the execution role. Refer to Lambda function User Guide to create Lambda function. When creating the Lambda function for Runtime, choose Python 3.12. Use the code below for the function code of thepreExtraction
function.
- Click on the Configuration table and select the Environment variables to fill in your Confluence authentication credentials
- Create the
postExtraction
Lambda function and select the Lambda service role you created earlier as the execution role as you did in step 4. Refer to Lambda function User Guide to create Lambda function. When creating the Lambda function for Runtime, choose Python 3.12. Use the code below for the function code of thepostExtraction
function.
In this section, you will configure one basic operation, one PreExtraction and one PostExtraction CDE operations via Amazon Q Business console. In order to save the data sync time required for CDEs, you will add all three types of CDE operations first and test them after the data sync that usually takes 10-15 minutes.
- Download a sample image file: Hydraulic_elevator_motor_label.jpg to a local drive on your computer. It is a image file in JPG format, and contains a hydraulic elevator motor with a label on top.
- Open your confluence account via Atlassian
- In the left navigation pane, choose Confluence.
- In the top, click the Spaces tab and select Create a space
- Also in the left navigation pane, select Create and choose page to create a page.
- In the page you just created upload the image file you downloaded earlier and click on Publish and the top right panel.
- Copy the page number and paste it on line 151 in the
Lambda handler
function of thepreExtraction
Lambda function.
- After file upload is successfully finished, navigate to Amazon Q Business service in the navigation pane.
- Choose Document enrichment in the navigation pane, and choose Add document enrichment.
- For Data Source ID, select the Confluence data source you setup before in the App.
- Enter one basic operation rule as shown in the following figure to add metadata _category to uploaded image file in confluence page. Select _source_uri under Document field name, select Contains under Conditional operator, type in {"stringValue":"Image"} as Condtional value. Select _category under Index field name, type in {"stringValue":"image_file"} as Target value, select Update under Target action.
- Choose Next to configure PreExtraction Lambda function.
- On the Configure Lambda functions page, in the Lambda for pre-extraction section, First, Select _source_uri under Document field name, select Contains under Conditional operator, type in {"stringValue":"Image"} as Condtional value. Next, fill in PreExtraction Lambda function ARN and Amazon CDES3BucketName for CDE as shown from the previous step. For Service permissions, choose Enter custom role ARN and enter the CDELambdaRolename create earlier. This is a role policy to allow Amazon Q to run PreExtractionHookConfiguration.
- For PostExtraction Lmabda function, first Select _source_uri under Document field name, select Contains under Conditional operator, type in {"stringValue":"Image"} as Condtional value. Next, fill in PostExtraction Lambda function ARN and Amazon CDES3BucketName for CDE as sown from the previous step. You can use the same CDELambdaRolename as you used for PreExtraction Lambda.
- Choose Next.
- Review all the information and choose Add document enrichment.
- Click the Data sources on the right navigation pane and choose the S3 data source and select Edit.
- Add “pre-extraction/” to Exclude Patterns, under Filter Patterns, and click Add. Then add “post-extraction/” to Exclude Patterns too.
- Change the sync mode from Full to New, modified or Deleted content sync, and click Update.
- Navigate back to Amazon Q Application console select the Confluence data source, and choose Sync to start data source sync.
- The data source sync can take up to 10–15 minutes to complete.
- Browse back to the data source page and wait for the sync to complete. Once the sync is finished, the run history will show Completed status.
- Do same for S3 data source. Choose sync to start data source sync for S3
After the index sync is finished, you can click on Amazon Q Business App to test the PreExtraction CDE as below:
- When the data source sync is complete, navigate to Amazon Q Business app console, choose Web Experience Settings, and click on Deployed URL link. Enter the query "Is there any image about hydraulic elevator motor?". The following screenshot shows a sample query response.
- Next, you can also query the technical spec extracted from the image as: "Can you provide the technical spec of the hydraulic elevator motor?"
- To review the foundation model generated image summary, extracted text and table information (if present in the image), you can click in the Amazon S3 bucket, there is a txt file under the folder cde_pre_output folder: Hydraulic_elevator_motor_label.txt. This file contains foundation model generated text of the image file.
- The text file should contain similar information as shown below:
The PostExtraction Lambda function ingests both the document's content and metadata into Amazon Q Business. Let's examine the metadata response of the document.
- Start a new conversation, and enter the query to Amazon Q Business app: What is the category of the hydraulic elevator motor image?. The following screenshot shows a sample query response.
- You can also navigate to S3 bucket to review the output from the PostExtraction function. There is a txt file under the folder cde_post_output named as: Hydraulic-elevator-motor-label.txt. You can download and review it. The file in question begins with a comprehensive metadata description, which was generated utilizing a foundation model. Upon ingestion into an S3 data source, both the image file's metadata and the extracted textual content become integral components of the S3 data source. Subsequently, this information can be efficiently retrieved and utilized through Amazon Q Business.
Conclusion:
In this post, you learned how to leverage Amazon Q Business and Amazon Bedrock to index and search image content within Confluence. By converting image files to searchable text, you've unlocked a powerful capability that significantly enhances information discovery and accessibility in your organization.
This solution addresses a critical challenge in knowledge management, allowing users to query and retrieve information from previously unsearchable image content in real-time. The impact of this can be substantial, potentially boosting productivity, improving decision-making, and fostering a more efficient collaborative environment.
About the Authors:
Joshua Amah is a Partner Solutions Architect with Amazon Web Services. He primarily serves partners, providing architectural guidance and best recommendation for new and existing workloads. Outside of work, he enjoys playing soccer, golf and spending time with family and friends. Feel free to follow Joshua on LinkedIn
Nneoma Okoroafor is a Partner Solutions Architect focused on helping partners follow best practices by conducting technical validations. She specializes in assisting AI/ML and generative AI partners, providing guidance to make sure they’re using the latest technologies and techniques to deliver innovative solutions to customers. You can connect with Nneoma on LinkedIn
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.