#TGIFun🎈 Building GenAI apps with managed AI services

Some random thoughts on managed AI services and their place in the GenAI stack... with examples.

João Galego
Amazon Employee
Published Apr 19, 2024
Last Modified May 22, 2024

Overview

In this 2nd episode of #TGIFun🎈, I'd like to share some thoughts on managed AI services and their place in the GenAI application stack.
Now that foundation models (FMs) can write novels, compose full musical scores, edit movies, generate code and much, much more... is there still a place for traditional AI/ML services? This is one of the questions I get asked most often when talking with customers, and my answer is always an emphatic ✅ 𝒀𝒆𝒔!
I believe there is still a lot of room for AI services like Amazon Polly (text-to-speech, TTS) or Amazon Translate (machine translation) and, yes, even good old Amazon Lex (conversational AI). And just to prove it, I'm going to show you three different applications that combine GenAI services like Amazon Bedrock (FMaaS) with traditional AI services from the top of our AI/ML stack.
💡 Did you know? You can try some of these services for *free* on the AWS AI Services Demo website!

Describe for Me 🖼️💬

The 2022 AWS re:Invent conference in Las Vegas brought us a lot of great things in the ML space, but this one may have flown under your radar. Presented at the AWS Builders' Fair, Describe for Me is an “Image to Speech” app created to help the visually impaired understand images through captions, face recognition, and TTS.
As we can see in the diagram below, it uses an AI service combo that includes Amazon Textract (data extraction), Amazon Rekognition (computer vision) and Amazon SageMaker (E2E ML platform) to do the captioning, while Polly reads the result back to the user in a clear, natural-sounding voice.
DescribeForMe reference architecture
The original app is powered by OFA (One For All), a unified seq2seq model available on Hugging Face. We can go one better: replace the entire captioning sequence with a single call to a multimodal model like Claude 3 Sonnet, which is available on Amazon Bedrock, and replicate the whole application in 100 lines of Python code (also available as a gist).
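Here's a minimal sketch of what that single call could look like with Boto3 (the image file name and prompt are illustrative; see the gist for the full script):

```python
import base64
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

# Assumption: the image we want to describe lives next to the script
with open("grande_jatte.jpg", "rb") as f:
    image = base64.b64encode(f.read()).decode()

# One multimodal call to Claude 3 Sonnet replaces the whole
# Textract + Rekognition + SageMaker captioning combo
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": image,
                }},
                {"type": "text", "text": "Describe this image for a visually impaired person."},
            ],
        }],
    }),
)
caption = json.loads(response["body"].read())["content"][0]["text"]
print(caption)
```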
Let's put our script to the test and generate an audio description of Seurat's A Sunday Afternoon on the Island of La Grande Jatte in European Portuguese (pt-PT), spoken by Polly's one and only Inês:
Georges Seurat, A Sunday Afternoon on the Island of La Grande Jatte (1884–1886)
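Getting the pt-PT audio is then a single Polly call. Here's a sketch, assuming `caption` already holds the Portuguese description (for example, by prompting Claude in Portuguese or running the caption through Amazon Translate first):

```python
import boto3

polly = boto3.client("polly")

# Assumption: `caption` holds the Portuguese description from the previous step
speech = polly.synthesize_speech(
    Text=caption,
    VoiceId="Ines",        # Polly's European Portuguese voice
    LanguageCode="pt-PT",
    OutputFormat="mp3",
)
with open("description.mp3", "wb") as f:
    f.write(speech["AudioStream"].read())
```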
If you're curious, here's the final result:
🎯 Try it out for yourself and share the best descriptions in the comments section below 👇

QnA Bot feat. Amazon Lex 🗣️

This one comes from a post published last year in the AWS ML Blog. The solution integrates Amazon Lex (I promised it would make an appearance) with an open-source LLM (FLAN-T5 XL) available through Amazon SageMaker JumpStart.
QnABot architecture
In a nutshell, Amazon Lex handles the basic requests, where the user's intent is clear, while the LLM handles the tough ones (the "don't know" answers) via a Lambda function.
💡 If you want to know more about how Amazon Lex integrates with Lambda functions, read the section on Enabling custom logic with AWS Lambda functions in the Amazon Lex V2 Developer Guide.
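For illustration only, here's roughly what that fallback Lambda could look like. The endpoint name and the payload format (based on the JumpStart FLAN-T5 containers) are assumptions on my part, so check the repo below for the real thing:

```python
import json

import boto3

smr = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    # Lex couldn't match an intent, so route the raw utterance to the LLM
    question = event["inputTranscript"]
    response = smr.invoke_endpoint(
        EndpointName="flan-t5-xl-endpoint",  # assumption: a JumpStart FLAN-T5 XL endpoint
        ContentType="application/json",
        Body=json.dumps({"text_inputs": question, "max_length": 200}),
    )
    answer = json.loads(response["Body"].read())["generated_texts"][0]
    # Hand the answer back to Lex and close the conversation turn
    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": {
                "name": event["sessionState"]["intent"]["name"],
                "state": "Fulfilled",
            },
        },
        "messages": [{"contentType": "PlainText", "content": answer}],
    }
```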
QnA bot sequence diagram
All the code and documentation for this solution are available in this GitHub repository, so go check it out.

MEH! 😒🐑 My Expert Helper

Finally, let's build an application from scratch. We could aim for the whimsical goal of using as many AI services as possible, but let's leave that for another post.
For now, I will settle for a simple conversational app powered by LangChain and Streamlit. This kind of application is so mundane nowadays that I gave it an interjection instead of a name (MEH!) and then asked Claude to make it an acronym for something (talk about lazy). Oh well...
Fun fact: An earlier version was known as YEAH! (Your Excellent Artificial Helper) but it wasn't so well received during private showings.
So here's what the app is supposed to do:
  1. Takes in a user prompt
  2. Translates it to a target language using Amazon Translate
  3. Sends it to Anthropic's Claude on Amazon Bedrock
  4. Translates the response back to the source language
  5. Turns the response into speech via Amazon Polly
If this is not enough, here's a sequence diagram telling you the exact same thing:
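And if a diagram still doesn't cut it, here's a minimal Boto3 sketch of those five steps (the actual app wires this up with LangChain and Streamlit; the model ID, language codes and voice are just examples):

```python
import json

import boto3

translate = boto3.client("translate")
bedrock = boto3.client("bedrock-runtime")
polly = boto3.client("polly")

def meh(prompt, source="pt", target="en"):
    # 1-2. Take in the user prompt and translate it to the target language
    question = translate.translate_text(
        Text=prompt, SourceLanguageCode=source, TargetLanguageCode=target
    )["TranslatedText"]
    # 3. Send it to Claude on Amazon Bedrock
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": [{"role": "user", "content": question}],
        }),
    )
    answer = json.loads(response["body"].read())["content"][0]["text"]
    # 4. Translate the response back to the source language
    answer = translate.translate_text(
        Text=answer, SourceLanguageCode=target, TargetLanguageCode=source
    )["TranslatedText"]
    # 5. Turn the response into speech via Amazon Polly
    speech = polly.synthesize_speech(Text=answer, VoiceId="Ines", OutputFormat="mp3")
    return answer, speech["AudioStream"].read()
```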
The full code is available on GitHub, so feel free to play around.
💡 This app uses Boto3, the AWS SDK for Python, to call AWS services. You must configure AWS credentials and an AWS Region to make requests. For information on how to do this, see the Boto3 documentation (Developer Guide > Credentials).
You can start the app directly with Streamlit (assuming the entry point is app.py):
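```bash
# Assumption: the Streamlit entry point is called app.py
streamlit run app.py
```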
or make a container out of it (meh is just a placeholder image name):
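```bash
# Assumption: a Dockerfile sits at the project root
docker build -t meh .
```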
and then run it (something along these lines should work; adjust the credentials mount to wherever your AWS config lives):
Linux
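```bash
# Host networking works natively on Linux; mount your AWS config for Boto3
docker run --rm --network host -v ~/.aws:/root/.aws meh
```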
Windows (WSL2)
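```bash
# Docker Desktop has no host networking, so publish Streamlit's default port instead
docker run --rm -p 8501:8501 -v ~/.aws:/root/.aws meh
```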
Thank you so much for reading this far and have a nice weekend! 👋
This is the second article in the #TGIFun🎈 series, a personal space where I'll be sharing some small, hobby-oriented projects with a wide variety of applications. As the name suggests, new articles come out on Friday. // PS: If you like this format, don't forget to give it a thumbs up 👍 As always: work hard, have fun, make history!


Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
