How I automated social media image creation with Gen AI (kind of)


Published May 29, 2024
Since I am a super duper adult nowadays, I often find myself scrolling through LinkedIn during the day. One interesting trend I've noticed is the growing use of AI to create images for posts. It's a smart approach, indeed. However, I also noticed that there isn't a way to automatically generate an image while writing a post, which gave me the idea that this could be something fun to build. And yes, the steps to do this manually are not very complicated, but it is always nice to automate stuff.
I decided that I wanted to do a POC for this, and the only thing left was to come up with a plan for how to do it... And then actually do it.

From idea to plan. What would I need to create this?

There are obviously multiple ways to do this. So I had to figure out which way I wanted to do it.

Frontend

First of all, I needed a frontend to visualize the application and provide a user-friendly interface. Since I'm already familiar with React, I decided to use it to build a simple and effective frontend, allowing me to focus more on the AWS integration, which is the core of this project. Here's how I structured the frontend:
  • React with TypeScript: For type safety and better code quality.
  • React Query and Axios: For handling API requests.
  • Ant Design (Antd): A component library. It's always fun to test something new, and Antd seemed pretty nice.

AWS integration

For the AWS part of the project, where I wanted to put my main focus, I wanted to create a robust and maintainable application. After some consideration, I decided to use the following setup:
  • AWS SAM: Used to define the infrastructure as code.
  • API Gateway: Serves as the entry point for the backend, managing and routing incoming requests. It is integrated with EventBridge to decouple the application components.
  • EventBridge: Picks up the events from API Gateway and forwards them to the right part of the application (the step function).
  • Step Functions: Picks up events from EventBridge and orchestrates the workflow that processes the request from the frontend, making it clear what is happening at each stage.
  • Lambda (written in Go): Acts as the individual steps within the Step Functions workflow, handling specific tasks such as preparing prompts and generating images.
  • Comprehend: Analyzes the sentiment of the post when preparing the prompt for image generation, providing contextual understanding that should ultimately enhance the quality of the generated image.
  • Bedrock and DALL-E: Used for creating images based on the analyzed social media posts, integrated within the Lambda functions.
I started off thinking I would use DALL-E to create my images, since that seemed to be what people were mostly using when generating images. During development I realized that it would be cool to compare DALL-E with AWS Bedrock, so I added some functionality to choose between DALL-E and Bedrock when generating an image. Because why do one thing when you can complicate it and do two...
What my plan ended up being, visualised:
[Diagram of the planned architecture]

Prerequisites and setup

  • Frontend: Node and TypeScript
    • Component library: Ant Design (Antd), installed with npm i antd
    • Packages used: React Query and Axios, installed with npm i react-query axios
  • DALL-E: A developer account at OpenAI to be able to generate and retrieve an API key.

How I built it - Disclaimer

Since this is a POC and not a production-ready application, I have not implemented anything from a security perspective. This is just for testing purposes and will never be deployed to any production environment. If it were to be used in production, authentication would need to be added, for example.

How I built it: Backend

To start things off I initialized a new SAM project using sam init in a freshly created folder. For the options I chose to start from a template, specifically the Hello World example template. For the language I chose Go (provided.al2023).
Now that I had the project initialized, with the template.yaml file inside of it, the first step was to add the required infrastructure.
To start off on a blank canvas I deleted everything inside of it and replaced it with the following:
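Roughly, the empty skeleton looked something like this (the description and timeout are illustrative):

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: POC - generate social media images with Gen AI

Globals:
  Function:
    Timeout: 60

Resources:
  # Filled in piece by piece in the sections below
```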

How I built it: Backend - EventBridge

The first step was to define an event bus for EventBridge, which would handle events sent from the API Gateway. Additionally, I needed to grant EventBridge permission to start the execution of my step function, which is defined later. An EventBridge rule was also required to specify where to route the events.
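In template.yaml that ended up looking roughly like this (the bus name, the rule's source filter, and the step function reference are illustrative):

```yaml
ImageRequestBus:
  Type: AWS::Events::EventBus
  Properties:
    Name: image-request-bus

# Rule that routes matching events from the bus to the step function
ImageRequestRule:
  Type: AWS::Events::Rule
  Properties:
    EventBusName: !Ref ImageRequestBus
    EventPattern:
      source:
        - image.generator
    Targets:
      - Arn: !GetAtt ImageStateMachine.Arn
        Id: ImageStateMachineTarget
        RoleArn: !GetAtt EventBridgeToStepFunctionRole.Arn

# Role that lets EventBridge start the step function execution
EventBridgeToStepFunctionRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal:
            Service: events.amazonaws.com
          Action: sts:AssumeRole
    Policies:
      - PolicyName: StartStateMachine
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action: states:StartExecution
              Resource: !GetAtt ImageStateMachine.Arn
```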

How I built it: Backend - API

To have an entry point into AWS from my frontend I needed to add an API. I spent some time researching how to integrate EventBridge with API Gateway and found this OpenAPI definition. With that example in mind I created a new file called api.yaml where I stored the definition of my API. The API had to have two routes:
  • One for kicking off the process of creating an image.
  • One that we could use to poll our backend for the generated image once it had been created.
In the template.yaml file I added the following to define my API and the roles it needed.
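A sketch of those additions (role and stage names are illustrative):

```yaml
ImageApi:
  Type: AWS::Serverless::Api
  Properties:
    StageName: dev
    DefinitionBody:
      # Pull the OpenAPI definition in from the separate file
      Fn::Transform:
        Name: AWS::Include
        Parameters:
          Location: api.yaml

# Role that allows API Gateway to put events on the event bus
ApiGatewayToEventBridgeRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal:
            Service: apigateway.amazonaws.com
          Action: sts:AssumeRole
    Policies:
      - PolicyName: PutEvents
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action: events:PutEvents
              Resource: !GetAtt ImageRequestBus.Arn
```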
The definition of the API looked like this:
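Condensed, it followed the well-known pattern for an API Gateway to EventBridge service integration (route names, source, and detail-type are illustrative):

```yaml
openapi: 3.0.1
info:
  title: image-generation-api
  version: '1.0'
paths:
  /generate:
    post:
      responses:
        '200':
          description: Event accepted
      x-amazon-apigateway-integration:
        type: aws
        httpMethod: POST
        uri:
          Fn::Sub: arn:aws:apigateway:${AWS::Region}:events:action/PutEvents
        credentials:
          Fn::GetAtt: [ApiGatewayToEventBridgeRole, Arn]
        requestParameters:
          integration.request.header.X-Amz-Target: "'AWSEvents.PutEvents'"
          integration.request.header.Content-Type: "'application/x-amz-json-1.1'"
        # Wrap the incoming request body in a PutEvents entry
        requestTemplates:
          application/json:
            Fn::Sub: |-
              {
                "Entries": [{
                  "Source": "image.generator",
                  "DetailType": "generate-image",
                  "Detail": "$util.escapeJavaScript($input.json('$'))",
                  "EventBusName": "${ImageRequestBus}"
                }]
              }
        responses:
          default:
            statusCode: '200'
  /image:
    get:
      responses:
        '200':
          description: URL of the generated image, if it exists
      x-amazon-apigateway-integration:
        type: aws_proxy
        httpMethod: POST
        uri:
          Fn::Sub: arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${GetImageFunction.Arn}/invocations
```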

How I built it: Backend - S3

When the images had been created I needed somewhere to store them. For that I defined an S3 bucket and its bucket policy as follows:
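Something along these lines (the public-read policy is POC-only, as mentioned in the disclaimer):

```yaml
ImageBucket:
  Type: AWS::S3::Bucket
  Properties:
    PublicAccessBlockConfiguration:
      BlockPublicAcls: false
      BlockPublicPolicy: false
      IgnorePublicAcls: false
      RestrictPublicBuckets: false

# Public read so the frontend can fetch images directly. POC only!
ImageBucketPolicy:
  Type: AWS::S3::BucketPolicy
  Properties:
    Bucket: !Ref ImageBucket
    PolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal: '*'
          Action: s3:GetObject
          Resource: !Sub ${ImageBucket.Arn}/*
```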

How I built it: Backend - Step function

To structure my Lambda functions and ensure they executed in order and when they were supposed to, I created a step function with two possible routes: one route for creating a DALL-E image and the other for creating a Bedrock image. I defined the step function and its role like this:
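A condensed sketch of the state machine (the useBedrock flag and function names are illustrative; the Bedrock branch ends on its own because that function uploads to S3 itself):

```yaml
ImageStateMachine:
  Type: AWS::Serverless::StateMachine
  Properties:
    Definition:
      StartAt: PreparePrompt
      States:
        PreparePrompt:
          Type: Task
          Resource: !GetAtt PreparePromptFunction.Arn
          InputPath: $.detail  # the EventBridge event wraps the request in "detail"
          Next: ChooseModel
        ChooseModel:
          Type: Choice
          Choices:
            - Variable: $.useBedrock
              BooleanEquals: true
              Next: GenerateImageBedrock
          Default: GenerateImageDallE
        GenerateImageBedrock:
          Type: Task
          Resource: !GetAtt GenerateImageBedrockFunction.Arn
          End: true
        GenerateImageDallE:
          Type: Task
          Resource: !GetAtt GenerateImageDallEFunction.Arn
          Next: UploadImage
        UploadImage:
          Type: Task
          Resource: !GetAtt UploadImageFunction.Arn
          End: true
    Policies:
      - LambdaInvokePolicy:
          FunctionName: !Ref PreparePromptFunction
      - LambdaInvokePolicy:
          FunctionName: !Ref GenerateImageDallEFunction
      - LambdaInvokePolicy:
          FunctionName: !Ref GenerateImageBedrockFunction
      - LambdaInvokePolicy:
          FunctionName: !Ref UploadImageFunction
```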

How I built it: Backend - Lambda function SAM definitions

When the step function was in place it was time to define the Lambda functions that were triggered by it (a sketch of a couple of these definitions follows after the list).
  • The prompt preparation function needed to detect sentiment and key phrases in a text and therefore needed permissions for AWS Comprehend.
  • The DALL-E image generation function did not need any special permissions, since it sends a request to OpenAI, which is not an AWS service. It did, however, need an environment variable for the API key used by the Lambda to authorize the request to DALL-E.
  • The Bedrock image generation function needed to be able to invoke Bedrock and therefore needed permissions for AWS Bedrock.
  • The image upload function did not need any special permissions since the S3 bucket was already public.
  • The last function needed was the one used to get the image, integrated with API Gateway. No special permissions were needed since the S3 bucket was public, but one environment variable was needed to know which bucket to read from.
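As an example, the first two of those definitions could look roughly like this (handler paths and the API-key parameter are illustrative):

```yaml
PreparePromptFunction:
  Type: AWS::Serverless::Function
  Metadata:
    BuildMethod: go1.x  # lets sam build compile the Go binary for provided.al2023
  Properties:
    CodeUri: functions/prepare-prompt/
    Handler: bootstrap
    Runtime: provided.al2023
    Policies:
      - Statement:
          - Effect: Allow
            Action:
              - comprehend:DetectSentiment
              - comprehend:DetectKeyPhrases
            Resource: '*'

GenerateImageDallEFunction:
  Type: AWS::Serverless::Function
  Metadata:
    BuildMethod: go1.x
  Properties:
    CodeUri: functions/generate-image-dalle/
    Handler: bootstrap
    Runtime: provided.al2023
    Environment:
      Variables:
        OPENAI_API_KEY: !Ref OpenAiApiKey  # a CloudFormation parameter
```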

How I built it: Backend - Lambda functions GO implementation

When all the infrastructure was defined I needed to implement the logic in the Lambda functions. To see the complete folder structure of each Lambda, have a look at my GitHub repository. There you can also see examples of the files needed to build the project using sam build.
To explain some of the logic I have provided comments in the functions.
Prepare prompt function
In this Lambda function, I process posts (text) sent by the frontend, using Comprehend to analyze the sentiment. When triggered with a post and an S3 key, the following should happen (a condensed sketch follows the list):
  • A Comprehend client is initialized to analyze the sentiment of the input text.
  • Comprehend detects the sentiment of the text.
  • The top sentiment is picked and converted to the corresponding emotion.
  • The key phrases of the text are extracted and converted to a string.
  • The prompt is prepared and sent to the next function.
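Condensed, with error handling trimmed (the prompt template and field names are illustrative):

```go
package main

import (
	"context"
	"fmt"
	"strings"

	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/comprehend"
	"github.com/aws/aws-sdk-go-v2/service/comprehend/types"
)

type Input struct {
	Post       string `json:"post"`
	S3Key      string `json:"s3Key"`
	UseBedrock bool   `json:"useBedrock"`
}

type Output struct {
	Prompt     string `json:"prompt"`
	S3Key      string `json:"s3Key"`
	UseBedrock bool   `json:"useBedrock"`
}

func handler(ctx context.Context, in Input) (Output, error) {
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return Output{}, err
	}
	client := comprehend.NewFromConfig(cfg)

	// Detect the dominant sentiment of the post
	sent, err := client.DetectSentiment(ctx, &comprehend.DetectSentimentInput{
		Text:         aws.String(in.Post),
		LanguageCode: types.LanguageCodeEn,
	})
	if err != nil {
		return Output{}, err
	}

	// Extract key phrases and join them into a comma-separated string
	phrases, err := client.DetectKeyPhrases(ctx, &comprehend.DetectKeyPhrasesInput{
		Text:         aws.String(in.Post),
		LanguageCode: types.LanguageCodeEn,
	})
	if err != nil {
		return Output{}, err
	}
	var parts []string
	for _, p := range phrases.KeyPhrases {
		parts = append(parts, aws.ToString(p.Text))
	}

	// Combine sentiment and key phrases into the image prompt
	prompt := fmt.Sprintf("An image conveying a %s mood about: %s",
		strings.ToLower(string(sent.Sentiment)), strings.Join(parts, ", "))

	return Output{Prompt: prompt, S3Key: in.S3Key, UseBedrock: in.UseBedrock}, nil
}

func main() {
	lambda.Start(handler)
}
```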
Generate image DALL-E function
In this Lambda function, I process prompts sent by the previous function and generate images using the DALL-E API. When triggered with a prompt and an S3 key, the following should happen (sketched below):
  • Retrieve the OpenAI API key from the environment variables.
  • Create a request body for the DALL-E API using the provided prompt.
  • Send a request to the DALL-E API to generate an image.
  • Read and parse the response from the DALL-E API.
  • Extract the image URL from the response.
  • Send the image URL and S3 key to the next function.
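A trimmed sketch of that flow (the request body follows OpenAI's images API; the struct field names are illustrative):

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"os"

	"github.com/aws/aws-lambda-go/lambda"
)

type Input struct {
	Prompt string `json:"prompt"`
	S3Key  string `json:"s3Key"`
}

type Output struct {
	ImageURL string `json:"imageUrl"`
	S3Key    string `json:"s3Key"`
}

// Matches the relevant part of the OpenAI images API response
type dalleResponse struct {
	Data []struct {
		URL string `json:"url"`
	} `json:"data"`
}

func handler(ctx context.Context, in Input) (Output, error) {
	apiKey := os.Getenv("OPENAI_API_KEY")

	// Build the request body for the image generation endpoint
	body, _ := json.Marshal(map[string]any{
		"model":  "dall-e-3",
		"prompt": in.Prompt,
		"n":      1,
		"size":   "1024x1024",
	})

	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		"https://api.openai.com/v1/images/generations", bytes.NewReader(body))
	if err != nil {
		return Output{}, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return Output{}, err
	}
	defer resp.Body.Close()

	var parsed dalleResponse
	if err := json.NewDecoder(resp.Body).Decode(&parsed); err != nil {
		return Output{}, err
	}
	if len(parsed.Data) == 0 {
		return Output{}, fmt.Errorf("no image returned from DALL-E")
	}

	// Hand the hosted image URL over to the upload function
	return Output{ImageURL: parsed.Data[0].URL, S3Key: in.S3Key}, nil
}

func main() {
	lambda.Start(handler)
}
```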
Generate image Bedrock function
In this Lambda function, I process text prompts to generate images using Amazon's Titan model and store the images in an S3 bucket. When triggered with a text prompt and an S3 key, the following should happen (sketched below):
  • Read the S3 bucket name from an environment variable and load the AWS configuration.
  • Prepare the payload that wraps the text prompt in the parameters required for the image generation task, specifying the image dimensions etc.
  • Invoke the Titan image generation model with the prepared payload.
  • The Titan model processes the prompt and returns a base64-encoded image.
  • Decode the base64-encoded image into a byte array.
  • Upload the image to S3.
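Sketched in Go (the payload shape follows the Titan image generator model; the env variable and input field names are illustrative):

```go
package main

import (
	"bytes"
	"context"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"os"

	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/bedrockruntime"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

type Input struct {
	Prompt string `json:"prompt"`
	S3Key  string `json:"s3Key"`
}

// Request and response shapes for the Titan image generator model
type titanRequest struct {
	TaskType          string `json:"taskType"`
	TextToImageParams struct {
		Text string `json:"text"`
	} `json:"textToImageParams"`
	ImageGenerationConfig struct {
		NumberOfImages int `json:"numberOfImages"`
		Height         int `json:"height"`
		Width          int `json:"width"`
	} `json:"imageGenerationConfig"`
}

type titanResponse struct {
	Images []string `json:"images"` // base64-encoded images
}

func handler(ctx context.Context, in Input) error {
	bucket := os.Getenv("BUCKET_NAME")
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return err
	}

	// Wrap the prompt in the payload Titan expects
	var payload titanRequest
	payload.TaskType = "TEXT_IMAGE"
	payload.TextToImageParams.Text = in.Prompt
	payload.ImageGenerationConfig.NumberOfImages = 1
	payload.ImageGenerationConfig.Height = 1024
	payload.ImageGenerationConfig.Width = 1024
	body, _ := json.Marshal(payload)

	// Invoke the Titan image generation model
	client := bedrockruntime.NewFromConfig(cfg)
	resp, err := client.InvokeModel(ctx, &bedrockruntime.InvokeModelInput{
		ModelId:     aws.String("amazon.titan-image-generator-v1"),
		ContentType: aws.String("application/json"),
		Body:        body,
	})
	if err != nil {
		return err
	}

	var parsed titanResponse
	if err := json.Unmarshal(resp.Body, &parsed); err != nil {
		return err
	}
	if len(parsed.Images) == 0 {
		return fmt.Errorf("no image returned from Titan")
	}

	// Decode the base64 image and upload it straight to S3
	img, err := base64.StdEncoding.DecodeString(parsed.Images[0])
	if err != nil {
		return err
	}
	_, err = s3.NewFromConfig(cfg).PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(in.S3Key),
		Body:   bytes.NewReader(img),
	})
	return err
}

func main() {
	lambda.Start(handler)
}
```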
Upload image function
In this Lambda function, I process URLs sent by the previous function to download images and upload them to an S3 bucket. When triggered with an image URL and an S3 key, the following should happen (sketched below):
  • Retrieve the bucket name and region from environment variables.
  • Download the image from the provided URL.
  • Read the image data from the HTTP response.
  • Initialize an S3 client.
  • Upload the image data to the specified S3 bucket using the provided S3 key.
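A compact sketch (the env variable and field names are illustrative):

```go
package main

import (
	"bytes"
	"context"
	"io"
	"net/http"
	"os"

	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

type Input struct {
	ImageURL string `json:"imageUrl"`
	S3Key    string `json:"s3Key"`
}

func handler(ctx context.Context, in Input) error {
	bucket := os.Getenv("BUCKET_NAME")

	// Download the generated image from the URL DALL-E returned
	resp, err := http.Get(in.ImageURL)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}

	// Upload the raw image bytes to the bucket under the given key
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return err
	}
	_, err = s3.NewFromConfig(cfg).PutObject(ctx, &s3.PutObjectInput{
		Bucket:      aws.String(bucket),
		Key:         aws.String(in.S3Key),
		Body:        bytes.NewReader(data),
		ContentType: aws.String("image/png"),
	})
	return err
}

func main() {
	lambda.Start(handler)
}
```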
Get image function
In this Lambda function, I handle API Gateway requests to check whether an object exists in an S3 bucket and return its URL. When triggered with a request, the following should happen (sketched below):
  • Retrieve the bucket name from the environment variables.
  • Extract the s3Key query parameter from the request.
  • Initialize an S3 client.
  • Check if the object exists in the S3 bucket using the s3Key.
  • If the object exists, return a response with the object's URL.
  • If the object does not exist or there is an error, return an error message.
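Sketched (the response body shape is illustrative; HeadObject is used as the existence check):

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"os"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func handler(ctx context.Context, req events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
	bucket := os.Getenv("BUCKET_NAME")
	s3Key := req.QueryStringParameters["s3Key"]

	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return events.APIGatewayProxyResponse{
			StatusCode: http.StatusInternalServerError,
			Body:       `{"error": "config error"}`,
		}, nil
	}
	client := s3.NewFromConfig(cfg)

	// HeadObject fails if the image has not been uploaded yet
	_, err = client.HeadObject(ctx, &s3.HeadObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(s3Key),
	})
	if err != nil {
		return events.APIGatewayProxyResponse{
			StatusCode: http.StatusNotFound,
			Body:       `{"error": "image not ready"}`,
		}, nil
	}

	// The bucket is public in this POC, so the object URL can be returned directly
	url := fmt.Sprintf("https://%s.s3.amazonaws.com/%s", bucket, s3Key)
	return events.APIGatewayProxyResponse{
		StatusCode: http.StatusOK,
		Body:       fmt.Sprintf(`{"imageUrl": %q}`, url),
	}, nil
}

func main() {
	lambda.Start(handler)
}
```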

How I built it: Frontend

For the frontend I wanted to keep it simple, so I used a pre-built component library and some great "out of the box" tools for the requests.
To create the frontend I initialized a new React project with typescript using npx create-react-app my-app --template typescript in a new folder.
I created a folder that I named components, in which I added another folder named mainContent containing a file named index.tsx. Since the focus of this project was not on the frontend, I decided to have all my content in the same component. This component looked like this:
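A condensed sketch of the component (the full version is in the repository; state names, labels, and the key handling are illustrative):

```tsx
import { useState } from 'react';
import { useMutation, useQuery } from 'react-query';
import { Button, Input, Layout, Spin, Switch } from 'antd';
import { generateImage, getImage } from '../../utils/api';

const MainContent = () => {
  const [post, setPost] = useState('');
  const [useBedrock, setUseBedrock] = useState(false);
  const [s3Key, setS3Key] = useState<string | null>(null);

  // Kick off image generation via the backend
  const generate = useMutation(() => generateImage(post, useBedrock), {
    onSuccess: (key: string) => setS3Key(key),
  });

  // Poll the backend every few seconds until the image exists
  const { data: imageUrl } = useQuery(
    ['image', s3Key],
    () => getImage(s3Key as string),
    { enabled: !!s3Key, refetchInterval: 3000, retry: false }
  );

  return (
    <Layout>
      <Layout.Header>Generate social media images</Layout.Header>
      <Layout.Content style={{ padding: 24 }}>
        <Input.TextArea
          rows={6}
          value={post}
          onChange={(e) => setPost(e.target.value)}
          placeholder="Write your post here..."
        />
        <Switch
          checked={useBedrock}
          onChange={setUseBedrock}
          checkedChildren="Bedrock"
          unCheckedChildren="DALL-E"
        />
        <Button type="primary" onClick={() => generate.mutate()}>
          Generate
        </Button>
        {s3Key && !imageUrl && <Spin />}
        {imageUrl && <img src={imageUrl} alt="Generated" width={512} />}
      </Layout.Content>
    </Layout>
  );
};

export default MainContent;
```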
In this component I handled almost everything. The only thing I outsourced was the functions for the requests. Those were added to a file which I named api.ts inside a folder called utils. The functions for the requests looked like this:
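Roughly like this (the base URL placeholder and the key scheme are illustrative):

```ts
import axios from 'axios';

const API_BASE_URL = 'https://<api-id>.execute-api.<region>.amazonaws.com/dev';

// Start the generation flow; returns the S3 key the image will be stored under
export const generateImage = async (post: string, useBedrock: boolean): Promise<string> => {
  const s3Key = `${Date.now()}.png`;
  await axios.post(`${API_BASE_URL}/generate`, { post, s3Key, useBedrock });
  return s3Key;
};

// Poll for the finished image; throws (404) until it exists
export const getImage = async (s3Key: string): Promise<string> => {
  const response = await axios.get(`${API_BASE_URL}/image`, { params: { s3Key } });
  return response.data.imageUrl;
};
```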
In this React component the following happens:
  • Display a layout with a header and a content section.
  • Allow the user to write a post in a text area.
  • Provide a switch for the user to choose whether to use Bedrock or DALL-E.
  • When the "Generate" button is clicked, send the post data to the backend service to generate an image.
  • Continuously poll the backend for the generated image and display it once available.
  • While waiting for the image, show a loader until the image is retrieved.
To see the full frontend implementation, have a look at my repository.
The frontend turned out to look like this:
[Screenshot of the finished frontend]

Result and summary

Did it work? Yes it did!
With DALL-E
[Example image generated with DALL-E]
With Bedrock
[Example image generated with Bedrock]
From the tests I have been doing in this POC, I have to give the win to DALL-E. It simply understands the prompts better and currently creates better images. While my familiarity with DALL-E might have contributed to these results, as I am less experienced with Bedrock, the difference in performance was still notable. Testing them both was fun, and I found the comparison between the two worthwhile.
Some improvements can of course be made. For example, I could have skipped the whole polling part and instead created a WebSocket API. For this POC I decided not to do that. But who is to say that won't happen in a future blog post...
Another future improvement would be to revisit the prepare prompt function and make it a bit more advanced. One thing could be to add the possibility for the user to choose what style of image they want generated: should it be realistic or illustrated?
All in all, I found this project to be fun and I got to play around with Gen AI a bit which was super cool :)
A final note is that these AI tools are expensive, so be aware of costs!

Deep Dive

If you want to have a closer look at what was used during this project, here are some links:
