
Image Text Validation using Amazon Rekognition and Bedrock

This post provides details of a serverless image text validation solution built using AWS AI services and Bedrock.

Published Mar 5, 2024

Introduction

In this article, we address a specific use case: validating pictures of restaurant operating-hours signs taken by delivery drivers to determine whether the restaurant is closed. Given the wide variety of store operating-hour signs, performing this validation automatically can be challenging. The solution uses Amazon Rekognition to detect the text in the pictures and the reasoning abilities of large language models (LLMs) readily available on Amazon Bedrock.

Architecture Overview
Architecture Diagram
The diagram above provides a high-level overview of the end-to-end architecture powering this use case.
  1. A mobile client app uploads the image along with metadata (including restaurant name, driver ID, driver name, and the timestamp when the picture was taken) to an S3 bucket.
  2. The image upload triggers an event, which in turn invokes a Lambda function.
  3. The Lambda function initiates the orchestration flow, first invoking the DetectText API in Amazon Rekognition.
  4. The API response contains the detected text, in this case the restaurant's hours of operation.
  5. Next, the Lambda function invokes an LLM, passing it the text extracted in step 4 and the metadata containing the time the picture was taken.
  6. The LLM processes the information provided in the prompt and responds with whether the restaurant is open or closed, along with its reasoning.
  7. Finally, the LLM response, along with the metadata, is stored in an Amazon DynamoDB table for analysis.
NOTE - The complete code sample is available in the amazon-bedrock-samples git repository. A minimal sketch of the Lambda orchestration is shown below.
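To make the flow concrete, here is a minimal sketch of the Lambda handler covering steps 3 through 7. It is not the repository's implementation: the table name, environment variable, metadata values, and prompt wording are assumptions for illustration.

```python
import json
import os
import boto3

rekognition = boto3.client("rekognition")
bedrock = boto3.client("bedrock-runtime")
# Assumed table name; the real solution may configure this differently.
table = boto3.resource("dynamodb").Table(os.environ.get("TABLE_NAME", "HoursValidation"))


def handler(event, context):
    # The S3 event notification carries the bucket and key of the uploaded picture.
    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]

    # Illustrative metadata; the actual solution receives it with the upload.
    metadata = {"restaurant": "Example Diner", "timestamp": "2024-03-05T21:30:00Z"}

    # Steps 3-4: extract the operating-hours text with Rekognition DetectText.
    detection = rekognition.detect_text(
        Image={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    lines = [
        d["DetectedText"] for d in detection["TextDetections"] if d["Type"] == "LINE"
    ]

    # Steps 5-6: ask Claude v2 on Bedrock whether the restaurant is open.
    prompt = (
        "\n\nHuman: The sign on a restaurant reads:\n"
        + "\n".join(lines)
        + f"\nThe picture was taken at {metadata['timestamp']}."
        + " Is the restaurant open or closed? Explain your reasoning."
        + "\n\nAssistant:"
    )
    response = bedrock.invoke_model(
        modelId="anthropic.claude-v2",
        body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 300}),
    )
    verdict = json.loads(response["body"].read())["completion"]

    # Step 7: persist the verdict with the metadata for later analysis.
    table.put_item(Item={"image_key": key, **metadata, "verdict": verdict})
    return {"statusCode": 200, "body": verdict}
```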

Prompt Engineering Technique

We used the one-shot prompting technique to prompt the LLM to provide accurate responses. One-shot prompting includes a single worked example in the prompt to guide the model's output. The solution uses this technique with Anthropic's Claude v2 model on Amazon Bedrock; an illustrative prompt is shown below.
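The exact prompt used by the solution is in the code sample; the following is only a sketch of what a one-shot prompt for this use case could look like, where the first sign-and-answer pair serves as the single guiding example (the hours, times, and wording are assumptions).

```text
Human: You will be given the text from a restaurant's operating-hours sign and
the time a picture of it was taken. Decide whether the restaurant is open or
closed and explain your reasoning.

Example:
Sign: "Mon-Fri 9 AM - 9 PM, Sat-Sun 10 AM - 6 PM"
Picture taken: Saturday 7:15 PM
Answer: Closed. Weekend hours end at 6 PM, and the picture was taken at 7:15 PM.

Now evaluate:
Sign: "Open daily 11 AM - 10 PM"
Picture taken: Tuesday 9:40 PM
Answer:

Assistant:
```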

What happens when the image is invalid?

If the text in the image is illegible or unclear, the LLM indicates that it cannot determine whether the restaurant is open or closed due to insufficient information. This behavior can help flag potentially fraudulent submissions, such as an image posted by an untrustworthy individual.
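In that case, the stored verdict might read along the lines of the following (illustrative wording, not verbatim model output):

```text
Answer: Unable to determine. The extracted text does not contain readable
operating hours, so there is not enough information to decide whether the
restaurant is open or closed.
```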

Conclusion

In this post, we demonstrated a sample solution for validating text in images, specifically restaurant operating hours. The same architectural approach can be adapted to other scenarios that require extracting and validating text from images using Amazon Rekognition and Bedrock.
 
This post has been co-authored with Tony Howell (anhwell).
 