Using GenAI to Generate Custom Amazon IVS Live Stream Backgrounds
A step-by-step guide to empowering your Amazon Interactive Video Service (IVS) live stream with generative AI background images.
The duration of the image generation process in the animation above has been shortened to more quickly demonstrate the background replacement effect.
- How to replace the background of a live stream with a custom image using MediaPipe Image Segmenter
- How to live stream yourself with your custom background using Amazon IVS
- How to leverage generative AI to replace the background in your camera feed using Amazon Bedrock
- How to create an AWS Lambda function to invoke the Stable Diffusion model via Amazon Bedrock
| About | |
|---|---|
| ✅ AWS Level | Intermediate - 200 |
| ⏱ Time to complete | 60 minutes |
| 💰 Cost to complete | Free when using the AWS Free Tier |
| 🧩 Prerequisites | AWS Account |
| 💻 Code Sample | GitHub |
| 📢 Feedback | Any feedback, issues, or just a 👍 / 👎 ? |
| ⏰ Last Updated | 2024-01-12 |
- Part 1 - Include dependencies and setup MediaPipe Image Segmenter
- Part 2 - Setup the Amazon IVS Web Broadcast SDK for live streaming
- Part 3 - Setup access to the Stable Diffusion model in Amazon Bedrock
- Part 4 - Create a Lambda and API Gateway to send a prompt to the Stable Diffusion model
- Part 5 - Prompt our way to a new background image with GenAI
- Part 6 - Replace our background with the new image
- Part 7 - Test our live stream
To get started, install the MediaPipe Image Segmenter npm package, `@mediapipe/tasks-vision`. We will also install Webpack so that we can later bundle our JavaScript.

Next, create an `index.html` file. We will also take this opportunity to include the IVS real-time streaming Web Broadcast SDK for live streaming. Replace `<SDK Version>` with the latest version number.

Note: Extraneous HTML attributes have been removed from all HTML code snippets in this article for readability. Refer to the GitHub repo for the full solution.
In `index.html`, add a `<video>` element which will contain your live camera feed and will be used as input to the MediaPipe Image Segmenter. Also create a `<canvas>` element that will be used to render a preview of the feed that will be broadcast. You will also need to create a second `<canvas>` element that will be used to render the custom image provided by Stable Diffusion that will serve as our background. Since the second canvas with the custom image is used only as a source to programmatically copy pixels from it to the final canvas, it is hidden from view.

Then, add a `<script>` tag before the closing `</body>` tag to load a bundled JavaScript file that will contain the code to do the background replacement and publish it to a stage.
Next, create a file named `app.js` and add code to get the element objects for the canvas and video elements that were created in the HTML page. Additionally, get the elements for our background change controls and modal for later usage. To set up live streaming later with Amazon IVS, import `Stage` and `LocalStageStream` from `IVSBroadcastClient`. Additionally, import the `ImageSegmenter` and `FilesetResolver` modules from MediaPipe. The `ImageSegmenter` module will be used to perform the segmentation task. A sketch of this setup follows.
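To make the rest of the walkthrough concrete, here is a minimal sketch of that setup. The element IDs and variable names are assumptions; use the ones from your own `index.html` and the GitHub repo.

```javascript
// app.js — minimal setup sketch (element IDs below are assumptions)
import { ImageSegmenter, FilesetResolver } from "@mediapipe/tasks-vision";

// The Web Broadcast SDK was loaded via a <script> tag, so its classes are
// available on the global IVSBroadcastClient object.
const { Stage, LocalStageStream, SubscribeType } = IVSBroadcastClient;

const video = document.getElementById("webcam");                // camera feed input
const canvas = document.getElementById("canvas");               // preview canvas that gets published
const backgroundCanvas = document.getElementById("background"); // hidden canvas holding the generated image
const joinButton = document.getElementById("join-button");
const leaveButton = document.getElementById("leave-button");

let imageSegmenter;     // MediaPipe ImageSegmenter instance
let segmentationStream; // MediaStream captured from the preview canvas
let stage;              // IVS stage instance, set when we join
```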
Next, create the `ImageSegmenter` using an `async` function, as sketched below. The `ImageSegmenter` will segment the image and return the result as a mask. When creating an instance of an `ImageSegmenter`, we will use the selfie segmentation model. This model is ideal for telling us which pixels in the image are in the foreground vs the background.
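Here is a sketch of what `createImageSegmenter()` might look like with the current `@mediapipe/tasks-vision` API. The WASM and model asset URLs are illustrative; verify them against the MediaPipe documentation for the version you install.

```javascript
const createImageSegmenter = async () => {
  // Load the WASM files needed by the MediaPipe vision tasks.
  const vision = await FilesetResolver.forVisionTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm"
  );

  // Initialize the segmenter with the selfie segmentation model, running on
  // video frames and returning a category mask (foreground vs. background).
  imageSegmenter = await ImageSegmenter.createFromOptions(vision, {
    baseOptions: {
      modelAssetPath:
        "https://storage.googleapis.com/mediapipe-models/image_segmenter/selfie_segmenter/float16/latest/selfie_segmenter.tflite",
      delegate: "GPU",
    },
    runningMode: "VIDEO",
    outputCategoryMask: true,
  });

  return imageSegmenter;
};
```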
Let's walk through the `createImageSegmenter()` function one line at a time. This function creates an image segmentation model using MediaPipe. First, it uses `FilesetResolver` to load the WebAssembly (WASM) module for vision tasks from the MediaPipe NPM package. Using a WASM module allows computationally intensive tasks like image segmentation inference to run directly in the browser without installing additional packages. It then calls `ImageSegmenter.createFromOptions` to initialize a new segmenter model.

The options passed to `ImageSegmenter.createFromOptions` include:

- `baseOptions`, which specifies the model asset path (a TensorFlow Lite model hosted on Google Storage) and sets the delegate to use the GPU.
- `runningMode`, which sets the model to operate on video frames.
- `outputCategoryMask`, which tells the model to output a category mask instead of just bounding boxes. The category mask indicates whether a given pixel in the image is more likely to be in the foreground or the background.
Finally, the function returns a `Promise` that resolves with the initialized `ImageSegmenter` object once loading is complete. This allows us to asynchronously initialize the model without blocking execution of the rest of the JavaScript code.

With segmentation in place, we can set up the Amazon IVS Web Broadcast SDK for live streaming. Real-time streaming with the SDK is built around three core concepts:

- Stage: A virtual space where participants exchange audio or video. The `Stage` class is the main point of interaction between the host application and the SDK.
- StageStrategy: An interface that provides a way for the host application to communicate the desired state of the stage to the SDK.
- Events: You can use an instance of a stage to communicate state changes such as when someone leaves or joins it, among other events.
Next, create an `init()` function that retrieves a `MediaStream` from the user's camera. We will later be calling this function and passing in the URL to an image from Stable Diffusion XL. As a default, let's use an image of a beach.
Within `init()`, also capture a `MediaStream` from our first canvas element and assign it to `segmentationStream`. For now, this `MediaStream` will just contain our camera feed. Later on, we will add logic to replace the background with a custom image.
Now that we have the `MediaStream` we want to publish to an audience, we need to join a stage. Joining a stage enables us to live stream the video feed to the audience or other participants in the stage. If we no longer want to live stream, we can leave the stage. Let's add event listeners that listen for click events when an end user clicks the join or leave stage buttons and implement the appropriate logic, as sketched below.
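A minimal sketch of that wiring, assuming the `joinButton` and `leaveButton` lookups from earlier and the `joinStage` function we implement below:

```javascript
joinButton.addEventListener("click", () => {
  joinStage();
});

leaveButton.addEventListener("click", () => {
  // Leaving the stage stops publishing our audio and video.
  if (stage) {
    stage.leave();
  }
});
```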
Back in `init()`, assign the `MediaStream` from the local camera to our video element. Additionally, invoke a custom callback function every time a camera frame is loaded, which we will name `renderVideoToCanvas`. Later on in this article, we will implement this function and explain it in detail. The `init` function now looks roughly as follows.
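A condensed sketch of `init()`, assuming the variables declared earlier (it omits the default background image handling mentioned above):

```javascript
const init = async () => {
  // Get the local camera feed.
  const cameraStream = await navigator.mediaDevices.getUserMedia({
    video: true,
    audio: false,
  });

  // Show the camera feed in the <video> element and start rendering frames
  // to the preview canvas via our custom callback.
  video.srcObject = cameraStream;
  video.addEventListener("loadeddata", renderVideoToCanvas);

  // Capture a MediaStream from the preview canvas; this is what we publish
  // once the background replacement has been drawn onto it.
  segmentationStream = canvas.captureStream();
};
```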
Next, implement the `joinStage` function. In this function, we're going to get the `MediaStream` from the user's microphone so that we can publish it to the stage. Publishing is the act of sending audio and/or video to the stage so other participants can see or hear the participant that has joined. We also need to implement the `StageStrategy` interface by defining the `shouldSubscribeToParticipant`, `shouldPublishParticipant`, and `stageStreamsToPublish` functions.
Let's start with the `stageStreamsToPublish` function. This function is used to determine what audio and video streams to publish. To do that, it returns an array of `LocalStageStream` instances. Using the `MediaStream` instances from the microphone and the canvas, assigned to `segmentationStream`, we can create instances of `LocalStageStream`. Then, all we need to do in the `stageStreamsToPublish` function is return the instances of `LocalStageStream` we just created in an array. This will enable the audience to hear our audio and see our video.
Next, implement the `shouldPublishParticipant` function by simply returning `true`. This indicates whether you, the local participant, should publish your media streams.

Then, implement the `shouldSubscribeToParticipant` function. This function indicates whether our app should subscribe to a remote participant's audio only, audio and video, or nothing at all when they join the stage. We want both audio and video, so we return `SubscribeType.AUDIO_VIDEO`.
Finally, back in the `joinStage` function, create a new `Stage` object, passing in the participant token and the strategy object we just set up as arguments. The participant token is used to authenticate with the stage as well as to identify which stage we are joining. You can get a participant token by creating a stage in the console and subsequently creating a participant token within that stage using either the console or the AWS SDK. Later on, we call the `join` method on the stage object to join the stage. A condensed sketch of `joinStage` follows.
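Putting those pieces together, `joinStage` might look roughly like this. The participant token is a placeholder and error handling is omitted:

```javascript
const joinStage = async () => {
  // Capture the microphone so the audience can hear us.
  const micStream = await navigator.mediaDevices.getUserMedia({
    audio: true,
    video: false,
  });

  // Wrap the microphone track and the canvas video track so they can be published.
  const audioStream = new LocalStageStream(micStream.getAudioTracks()[0]);
  const videoStream = new LocalStageStream(segmentationStream.getVideoTracks()[0]);

  // StageStrategy: what to publish and what to subscribe to.
  const strategy = {
    stageStreamsToPublish() {
      return [audioStream, videoStream];
    },
    shouldPublishParticipant() {
      return true;
    },
    shouldSubscribeToParticipant() {
      return SubscribeType.AUDIO_VIDEO;
    },
  };

  // Create the stage with a participant token and our strategy, then join it.
  const participantToken = "<participant token>"; // placeholder
  stage = new Stage(participantToken, strategy);
  await stage.join();
};
```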
This completes the live streaming setup. Next, as outlined in Parts 3 and 4, enable access to the Stable Diffusion XL model in Amazon Bedrock and create a Lambda function, fronted by API Gateway, that sends a text prompt to the model and returns the generated image.

Note that the version of `Boto3` that comes with Lambda does not include the Bedrock runtime. If you get the error `"Unknown service: 'bedrock-runtime'"` when you invoke this Lambda, follow these instructions to create a layer that uses the latest version of `Boto3`, and then add this new layer to the Lambda function's configuration. Once you do that, the Lambda returns a JSON response containing the generated image as a base64 encoded string.
On the frontend, take the base64 encoded image returned in that response and render it to the hidden `<canvas>` element we created earlier. To do this, pass the base64 encoded string to the `initBackgroundCanvas` function, which we will explain next.
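As an illustration of this step, the browser-side request might look like the following. The `generateBackground` helper name, endpoint URL, request body, and response field name are assumptions about the Lambda and API Gateway from Part 4; adapt them to your own API contract.

```javascript
const generateBackground = async (prompt) => {
  // Hypothetical API Gateway endpoint that fronts the Lambda from Part 4.
  const response = await fetch("https://<api-id>.execute-api.<region>.amazonaws.com/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  const data = await response.json();

  // Assumed response field holding the base64 encoded image.
  initBackgroundCanvas(data.base64Image);
};
```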
The `initBackgroundCanvas` function, also added to `app.js`, is defined roughly as follows. This function simply renders our generated image directly to the canvas. We do this so that we can use the Canvas API to copy the individual pixels from the `<canvas>` element when doing the background replacement.
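A sketch of `initBackgroundCanvas`, assuming the `backgroundCanvas` reference from earlier and a PNG returned by the model; the image is scaled to the canvas, which is assumed to match the video dimensions:

```javascript
const initBackgroundCanvas = (base64Image) => {
  const backgroundCtx = backgroundCanvas.getContext("2d");
  const img = new Image();
  img.onload = () => {
    // Draw the generated image onto the hidden canvas so its pixels can be
    // read later with getImageData().
    backgroundCtx.drawImage(img, 0, 0, backgroundCanvas.width, backgroundCanvas.height);
  };
  img.src = "data:image/png;base64," + base64Image;
};
```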
Next, implement the `renderVideoToCanvas` function we mentioned earlier. This function renders the video feed to the second canvas element in our HTML. We need to render the video feed to a canvas so we can extract the foreground pixels from it using the Canvas 2D API. While doing this, we also pass a video frame to our instance of `ImageSegmenter`, using the `segmentForVideo` method to segment the foreground from the background in the video frame. When the `segmentForVideo` method returns, it invokes our custom callback function, `replaceBackground`, to do the background replacement.
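A sketch of `renderVideoToCanvas`, assuming the `video`, `canvas`, and `imageSegmenter` variables from earlier and a canvas sized to the video; the per-frame loop is continued from `replaceBackground` below:

```javascript
let lastVideoTime = -1;

const renderVideoToCanvas = () => {
  // Skip duplicate frames and wait until the segmenter has finished loading.
  if (video.currentTime === lastVideoTime || imageSegmenter === undefined) {
    window.requestAnimationFrame(renderVideoToCanvas);
    return;
  }
  lastVideoTime = video.currentTime;

  // Draw the current camera frame to the preview canvas.
  const canvasCtx = canvas.getContext("2d");
  canvasCtx.drawImage(video, 0, 0, video.videoWidth, video.videoHeight);

  // Segment this frame; segmentForVideo invokes replaceBackground with the result.
  imageSegmenter.segmentForVideo(video, performance.now(), replaceBackground);
};
```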
Finally, implement the `replaceBackground` function, which merges the custom background image with the foreground from the camera feed. The function first retrieves the underlying pixel data of the custom background image and the video feed from the two canvas elements created earlier. It then iterates through the mask provided by `ImageSegmenter`, which indicates which pixels are in the foreground. As it iterates through the mask, it selectively copies the pixels that contain the user's camera feed to the corresponding background pixel data. Once that is done, it converts the final pixel data, with the foreground copied onto the background, and draws it to a canvas.
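A sketch of `replaceBackground` following the steps above. The mask handling assumes the selfie segmentation category mask; if the output looks inverted, flip the comparison.

```javascript
const replaceBackground = (result) => {
  const width = video.videoWidth;
  const height = video.videoHeight;
  const canvasCtx = canvas.getContext("2d");
  const backgroundCtx = backgroundCanvas.getContext("2d");

  // Pixel data of the current camera frame and of the generated background.
  const frameData = canvasCtx.getImageData(0, 0, width, height).data;
  const backgroundData = backgroundCtx.getImageData(0, 0, width, height).data;

  // One mask value per pixel, indicating foreground vs. background.
  const mask = result.categoryMask.getAsFloat32Array();

  for (let i = 0; i < mask.length; i++) {
    const offset = i * 4; // RGBA stride
    if (Math.round(mask[i] * 255.0) < 255) {
      // Foreground pixel: copy the camera pixel over the background image data.
      backgroundData[offset] = frameData[offset];
      backgroundData[offset + 1] = frameData[offset + 1];
      backgroundData[offset + 2] = frameData[offset + 2];
      backgroundData[offset + 3] = frameData[offset + 3];
    }
  }

  // Draw the merged result to the preview canvas that we broadcast.
  const merged = new ImageData(new Uint8ClampedArray(backgroundData), width, height);
  canvasCtx.putImageData(merged, 0, 0);

  // Schedule processing of the next camera frame.
  window.requestAnimationFrame(renderVideoToCanvas);
};
```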
To bundle the JavaScript, configure `webpack.config.js` roughly as follows.
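A minimal `webpack.config.js` sketch; the entry point and output filename are assumptions and must match the bundled script referenced from `index.html`.

```javascript
const path = require("path");

module.exports = {
  entry: "./app.js",
  output: {
    filename: "bundle.js",
    path: path.resolve(__dirname, "dist"),
  },
  mode: "development",
};
```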
Then, update your `package.json` file to run Webpack as your JavaScript bundler when you run the build script, specifying `webpack.config.js` as your Webpack config file. This lets us use the `import` statement within `app.js` without any issues.
Finally, build the project, serve `index.html` from a local web server, and open `localhost:8000` to see the result. You should see a local camera feed with a new background image. Alternatively, we can also use `http-server` as our HTTP server.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.