logo
Menu
How I automated social media image creation with Gen AI (kind of)

How I automated social media image creation with Gen AI (kind of)

Automate social media image creation with Gen AI

Published May 29, 2024
Since I am super duper adult nowadays, I often find myself scrolling through LinkedIn during the day. One interesting trend I've noticed is the growing use of AI to create images for posts. It's a smart approach, indeed. However, I also noticed that there isn't a way to automatically generate an image while writing a post. Which gave me the idea that this could be something fun to build. And yes the steps to manually do this is not very complicated but it is always nice to automate stuff.
I decided that I wanted to do a POC for this and the only thing left to do was to come up with a plan on how to do it... And then actually do it.

From idea to plan. What would I need to create this?

There are obviously multiple ways to do this. So I had to figure out which way I wanted to do it.

Frontend

First of all, I needed a frontend to visualize the application and provide a user-friendly interface. Since I'm already familiar with React, I decided to use it to build a simple and effective frontend, allowing me to focus more on the AWS integration, which is the core of this project. Here's how I structured the frontend:
  • React with TypeScript: For type safety and better code quality.
    • React Query and Axios: For handling API requests.
    • Ant Design (Antd): A component library, It's always fun to test something new, and Antd seemed pretty nice.

AWS integration

For the AWS part of the project, where I wanted to put my main focus, I wanted to create a robust and maintainable application. After some consideration, I decided to use the following setup:
  • AWS SAM: Decided to use SAM to define the infrastructure as code.
  • Api-Gateway: Serves as the entry point for the backend, managing and routing incoming requests. It is integrated with eventBridge to decouple the application components.
  • EventBridge: Picks up the events from API Gateway and forwards them to the right part of the application (the step function).
  • Step Functions: Picks up events from eventBridge and orchestrates the workflow to process the request from the frontend, to make it clear what is happening.
  • Lambda (Written in GO): Acts as individual steps within the step functions workflow, handling specific tasks such as preparing prompts and generating images.
  • Comprehend: Analyzes the sentiment of the post when preparing the prompt for image generation. To provide contextual understanding which ultimately should enhance the quality of the generated image.
  • Bedrock and DALL-E: Used for creating images based on the analyzed social media posts, integrated within the Lambda functions to generate the images.
I started off thinking I would use DALL-E to create my images, since that seemed to be what people where using mostly when generating images. During the development I realized that it would be cool to compare DALL-E with AWS Bedrock so I added some functionality to choose between DALL-E and Bedrock when generating an image. Because why do one thing when you can complicate it and do two...
What my plan ended up to be, visualised:
diagram
Diagram

Prerequisites and setup

  • Frontend: Node and TypeScript
    • Component Library: Ant Design (Antd) npm i antd
    • Packages used: React Query, Axios npm i react-query axios
  • DALL-E: A developer account at openAI to be able to generate and retrieve a API-key.

How I built it - Disclaimer

Since this is a POC and not a production ready application that I will build. I have not implemented anything from a security perspective. This is just for testing purposes and will never be deployed to any production environment. If it where to be used in production, authentication would need to be added for example.

How I built it: Backend

To start things off I initialized a new sam project using sam init in a freshly created folder. For the options I choose to start from a template. Specifically the Hello World example template. For language I choose go (provided.al2023).
Now when I had the project initialized with the template.yaml file inside of it the first step was to add the required infrastructure.
To start of on a blank canvas I deleted everything inside of it and replaced it with the following:
1
2
3
4
5
6
AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31

Resources:

Outputs:

How I built it: Backend - EventBridge

The first step was to define an eventBus for eventBridge, which would handle events sent from the API Gateway. Additionally, I needed to grant eventBridge permission to initiate the execution of my step function, which is defined later. An eventBridge rule was also required to specify where to route the events.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
PostToImageEventBus:
Type: AWS::Events::EventBus
Properties:
Name: PostToImageEventBus

StateMachineEventBridgeRule:
Type: AWS::Events::Rule
Properties:
EventBusName: !Ref PostToImageEventBus
EventPattern:
source:
- "api-gateway"
detail-type:
- "PreparePrompt"
Targets:
- Arn: !GetAtt PostToImageStateMachine.Arn
Id: "PostToImageStateMachineTarget"
RoleArn: !GetAtt EventBridgeExecutionRole.Arn
State: "ENABLED"

EventBridgeExecutionRole:
Type: "AWS::IAM::Role"
Properties:
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Principal:
Service: "events.amazonaws.com"
Action:
- "sts:AssumeRole"
Policies:
- PolicyName: "EventBridgeStepFunctionsExecutionPolicy"
PolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Action:
- "states:StartExecution"
Resource: !GetAtt PostToImageStateMachine.Arn

How I built it: Backend - API

To have a entrypoint into AWS from my frontend I needed to add a API. I spent some time researching on how to integrate eventBridge with Api Gateway and found this openAPI definition. With that example in mind I created a new file called api.yaml where I stored the definition of my API. The API had to have two routes.
  • One for kicking of the process of creating a image
  • One that we could use to poll our backend for the generated image once it had been created.
In the template.yaml file I added the following to define my API and the required roles needed.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
PostToImageApi:
Type: AWS::Serverless::HttpApi
Properties:
DefinitionBody:
"Fn::Transform":
Name: "AWS::Include"
Parameters:
Location: "./api.yaml"

# Permission to put events to eventBridge
HttpApiEvenbridgeRole:
Type: "AWS::IAM::Role"
Properties:
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: "Allow"
Principal:
Service: "apigateway.amazonaws.com"
Action:
- "sts:AssumeRole"
Policies:
- PolicyName: ApiDirectWriteEventBridge
PolicyDocument:
Version: "2012-10-17"
Statement:
Action:
- events:PutEvents
Effect: Allow
Resource:
- !GetAtt PostToImageEventBus.Arn

# Permission to invoke the GetImage Lambda
HttpApiLambdaRole:
Type: "AWS::IAM::Role"
Properties:
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: "Allow"
Principal:
Service: "apigateway.amazonaws.com"
Action:
- "sts:AssumeRole"
Policies:
- PolicyName: "LambdaExecutionPolicy"
PolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Action:
- "lambda:InvokeFunction"
Resource: !GetAtt GetImageFunction.Arn
The definition of the API looked like this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
openapi: "3.0.1"
info:
title: "HTTP API"
paths:
/get-image:
get:
parameters:
- name: s3Key
in: query
description: "Path to image in S3 bucket"
required: true
schema:
type: string
responses:
200:
description: "Successful response"
content:
application/json:
schema:
type: object
properties:
url:
type: string
description: "URL of the retrieved image"
x-amazon-apigateway-integration:
type: "aws_proxy"
httpMethod: "POST"
uri:
Fn::Sub: arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${GetImageFunction.Arn}/invocations
credentials:
Fn::GetAtt: [HttpApiLambdaRole, Arn]
payloadFormatVersion: "1.0"
passthroughBehavior: "when_no_match"
/generate-image:
post:
responses:
default:
description: "API to EventBridge"
x-amazon-apigateway-integration:
integrationSubtype: "EventBridge-PutEvents"
credentials:
Fn::GetAtt: [HttpApiEvenbridgeRole, Arn]
requestParameters:
Detail: "$request.body"
DetailType: PreparePrompt
Source: api-gateway
EventBusName:
Fn::GetAtt: [PostToImageEventBus, Name]
payloadFormatVersion: "1.0"
type: "aws_proxy"
connectionType: "INTERNET"
x-amazon-apigateway-importexport-version: "1.0"
x-amazon-apigateway-cors:
allowOrigins:
- "*"
allowHeaders:
- "*"
allowMethods:
- "PUT"
- "POST"
- "DELETE"
- "HEAD"
- "GET"

How I built it: Backend - S3

When the images had been created I needed somewhere to store them. For that I defined a s3 bucket and its bucket policy as following:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
ImageUploadBucket:
Type: AWS::S3::Bucket
Properties:
BucketName: !Sub "${AWS::StackName}-image-upload-bucket"
PublicAccessBlockConfiguration:
BlockPublicAcls: false
BlockPublicPolicy: false
IgnorePublicAcls: false
RestrictPublicBuckets: false
CorsConfiguration:
CorsRules:
- AllowedHeaders:
- "*"
AllowedMethods:
- PUT
- POST
- DELETE
- HEAD
- GET
AllowedOrigins:
- "*"
ExposedHeaders: []

# This makes the bucket available for everyone to do PUT and GET
ImageUploadBucketPolicy:
Type: AWS::S3::BucketPolicy
Properties:
Bucket: !Ref ImageUploadBucket
PolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Principal: "*"
Action:
- s3:PutObject
- s3:GetObject
Resource: !Sub "arn:aws:s3:::${AWS::StackName}-image-upload-bucket/*"

How I built it: Backend - Step function

To structure my lambda functions and ensure they executed in order and when they where supposed to I created a step function with two possible routes. One route for creating a DALL-E image and the other route to create a Bedrock Image. I defined the step function and role like this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
PostToImageStateMachine:
Type: "AWS::Serverless::StateMachine"
Properties:
Definition:
StartAt: PromptPreparation
States:
PromptPreparation:
Type: Task
Resource: !GetAtt PromptPreparationAPIFunction.Arn
Next: CheckBedRock
# Choice to see which route to take
CheckBedRock:
Type: Choice
Choices:
- Variable: "$.bedrock"
BooleanEquals: true
Next: BedrockImageGeneration
Default: DalleImageGeneration
BedrockImageGeneration:
Type: Task
Resource: !GetAtt BedrockImageGenerationFunction.Arn
End: true
DalleImageGeneration:
Type: Task
Resource: !GetAtt DalleImageGenerationFunction.Arn
Next: ImageUpload
ImageUpload:
Type: Task
Resource: !GetAtt ImageUploadFunction.Arn
End: true
Role: !GetAtt PostToImageStateMachineRole.Arn

# Gives permission to the step function to invoke the required lambda functions
# And permission to handle the actual step function
PostToImageStateMachineRole:
Type: "AWS::IAM::Role"
Properties:
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Principal:
Service:
- states.amazonaws.com
Action:
- "sts:AssumeRole"
Policies:
- PolicyName: "StateMachineExecutionPolicy"
PolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Action:
- "lambda:InvokeFunction"
Resource:
- !GetAtt PromptPreparationAPIFunction.Arn
- !GetAtt DalleImageGenerationFunction.Arn
- !GetAtt ImageUploadFunction.Arn
- !GetAtt BedrockImageGenerationFunction.Arn
- Effect: Allow
Action:
- "states:StartExecution"
- "states:DescribeExecution"
- "states:StopExecution"
Resource: "*"

How I built it: Backend - Lambda function SAM definitions

When the step function was in place it was time to define the lambda functions that was trigger by it.
  • The prompt preparation function needed permission to detect sentiment and key phrases from a text and therefore also needed permission AWS comprehend.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
PromptPreparationAPIFunction:
Type: AWS::Serverless::Function
Properties:
Handler: bootstrap
Runtime: provided.al2023
CodeUri: prepare-prompt/
Timeout: 30
Policies:
- AWSLambdaBasicExecutionRole
- Version: "2012-10-17"
Statement:
- Effect: "Allow"
Action:
- comprehend:DetectSentiment
- comprehend:DetectKeyPhrases
Resource: "*"
  • The DALL-E image generation function did not need any special permissions since it sent a request to openAI that is not an AWS service. Although it needed an environment variable for the API key that will be used by the lambda to authorize the request to DALL-E.
1
2
3
4
5
6
7
8
9
10
11
12
DalleImageGenerationFunction:
Type: AWS::Serverless::Function
Properties:
Handler: bootstrap
Runtime: provided.al2023
CodeUri: generate-image-dalle/
Timeout: 30
Environment:
Variables:
OPENAI_API_KEY: !Ref OpenAIApiKey
Policies:
- AWSLambdaBasicExecutionRole
  • The Bedrock image generation function needed to be able to invoke bedrock and therefore needed permissions to AWS Bedrock.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
BedrockImageGenerationFunction:
Type: AWS::Serverless::Function
Properties:
Handler: bootstrap
Runtime: provided.al2023
CodeUri: generate-image-bedrock/
Timeout: 30
Environment:
Variables:
BUCKET_NAME: !Ref ImageUploadBucket
Policies:
- AWSLambdaBasicExecutionRole
- Version: "2012-10-17"
Statement:
- Effect: Allow
Action:
- "bedrock:InvokeModel"
Resource: "*"
  • The image upload function did not need any special permissions since the S3 bucket was already public.
1
2
3
4
5
6
7
8
9
10
11
12
13
ImageUploadFunction:
Type: AWS::Serverless::Function
Properties:
Handler: bootstrap
Runtime: provided.al2023
CodeUri: upload-image/
Timeout: 30
Environment:
Variables:
BUCKET_NAME: !Ref ImageUploadBucket
REGION: !Ref AWS::Region
Policies:
- AWSLambdaBasicExecutionRole
  • The last function that was needed was the function used to get the image integrated with Api Gateway. No special permissions needed since the S3 bucket was public. One environment variable was needed to know which bucket to upload to.
1
2
3
4
5
6
7
8
9
10
11
12
GetImageFunction:
Type: AWS::Serverless::Function
Properties:
Handler: bootstrap
Runtime: provided.al2023
CodeUri: get-image/
Timeout: 30
Environment:
Variables:
BUCKET_NAME: !Ref ImageUploadBucket
Policies:
- AWSLambdaBasicExecutionRole

How I built it: Backend - Lambda functions GO implementation

When all the infrastructure was defined I needed to implement the logic in the lambda functions. To see the complete folder structure of each lambda, have a look in my github repository Repository. Here you can also see examples of the files needed to build the project using sam build.
To explain some of the logic I have provided comments in the functions.
Prepare prompt function
In this Lambda function, I process posts (text) sent by the frontend using Comprehend to analyze the sentiment. When triggered with a post and an S3 key, the following should happen:
  • Comprehend client is initialized to analyze input text sentiment.
  • Comprehend detects the sentiment of the text.
  • The top sentiment is decided and converted to the responding emotion.
  • The key phrases of the text it extracted and converted to a string.
  • The prompt is prepared and sent to the next function.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
package main

import (
"context"
"fmt"
"log"
"strings"

"github.com/aws/aws-lambda-go/lambda"
"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/session"
"github.com/aws/aws-sdk-go/service/comprehend"
)

type InputEvent struct {
Detail struct {
Text string `json:"text"`
S3Key string `json:"s3Key"`
Bedrock bool `json:"bedrock"`
} `json:"detail"`
}

type OutputEvent struct {
Prompt string `json:"prompt"`
S3Key string `json:"s3Key"`
Bedrock bool `json:"bedrock"`
}

// function to get the top sentiment and convert it for the prompt
func getTopSentiment(sentimentScores *comprehend.SentimentScore) string {
sentiments := map[string]float64{
"Positive": *sentimentScores.Positive,
"Negative": *sentimentScores.Negative,
"Neutral": *sentimentScores.Neutral,
"Mixed": *sentimentScores.Mixed,
}

var topSentiment string
var maxScore float64

for sentiment, score := range sentiments {
if score > maxScore {
maxScore = score
topSentiment = sentiment
}
}

switch topSentiment {
case "Positive":
topSentiment = "happy"
case "Negative":
topSentiment = "sad"
case "Neutral":
fallthrough
case "Mixed":
topSentiment = "neutral"
}

return topSentiment
}

func handler(ctx context.Context, event InputEvent) (OutputEvent, error) {
text := event.Detail.Text
// creating a new session for comprehend and forcing eu west 1 as regions since its not available in the eu north 1 region
svc := comprehend.New(session.Must(session.NewSession(&aws.Config{
Region: aws.String("eu-west-1"),
})))

// Preparing the parameters to be sent to comprehend
sentimentParams := &comprehend.DetectSentimentInput{
Text: aws.String(text),
LanguageCode: aws.String("en"),
}

keyPhraseParams := &comprehend.DetectKeyPhrasesInput{
Text: aws.String(text),
LanguageCode: aws.String("en"),
}

// Detecting the sentiment of the provided text
sentimentResult, err := svc.DetectSentiment(sentimentParams)
if err != nil {
return OutputEvent{}, fmt.Errorf("failed to detect sentiment: %w", err)
}

keyPhraseResult, err := svc.DetectKeyPhrases(keyPhraseParams)
if err != nil {
return OutputEvent{}, fmt.Errorf("failed to process text: %w", err)
}

var keyPhrases []string
// Getting the top sentiment
topSentiment := getTopSentiment(sentimentResult.SentimentScore)
// start the phrases array with the emotion
keyPhrases = append(keyPhrases, topSentiment)
// append all phrases to string array
for _, phrase := range keyPhraseResult.KeyPhrases {
keyPhrases = append(keyPhrases, *phrase.Text)
}
// create a string from the key words
summary := strings.Join(keyPhrases, " ")

// Creating the prompt to be used for image generation
prompt := fmt.Sprintf("Generate a image based on the following key words: %s", summary)

// Creating the output and returning to the next function
outputEvent := OutputEvent{Prompt: prompt, S3Key: event.Detail.S3Key, Bedrock: event.Detail.Bedrock}

return outputEvent, nil
}

func main() {
lambda.Start(handler)
}
Generate image DALL-E function
In this Lambda function, I process prompts sent by the previous function and generate images using the DALL-E API. When triggered with a prompt and an S3 key, the following should happen:
  • Retrieve the OpenAI API key from the environment variables.
  • Create a request body for the DALL-E API using the provided prompt.
  • Send an request to the DALL-E API to generate an image.
  • Read and parse the response from the DALL-E API.
  • Extract the image URL from the response.
  • Send the image URL and S3 key to the next function.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
package main

import (
"bytes"
"context"
"encoding/json"
"fmt"
"io"
"log"
"net/http"
"os"

"github.com/aws/aws-lambda-go/lambda"
)

type InputEvent struct {
Prompt string `json:"prompt"`
S3Key string `json:"s3Key"`
}

type OutputEvent struct {
Url string `json:"url"`
S3Key string `json:"s3Key"`
}

// Type for how the request to DALL-E should be structured
type DalleRequest struct {
Model string `json:"model"`
Prompt string `json:"prompt"`
Size string `json:"size"`
N int `json:"n"`
}

// Type for how the request response is structured
type ResponseObject struct {
Data []struct {
Url string `json:"url"`
} `json:"data"`
}

func handler(ctx context.Context, event InputEvent) (OutputEvent, error) {
// Grabbing our api-key from the environment variable
openaiApiKey := os.Getenv("OPENAI_API_KEY")
if openaiApiKey == "" {
log.Println("Error: OPENAI_API_KEY environment variable is not set")
return OutputEvent{}, fmt.Errorf("missing OPENAI_API_KEY")
}

// Preparing the DALL-E request
requestBody := DalleRequest{
Model: "dall-e-3",
Prompt: event.Prompt,
Size: "1024x1024",
N: 1,
}

// preparing the payload for the request
payload, err := json.Marshal(requestBody)
if err != nil {
log.Println("Error marshalling data:", err)
return OutputEvent{}, err
}

// creating request
client := &http.Client{}
req, err := http.NewRequest("POST", "https://api.openai.com/v1/images/generations", bytes.NewBuffer(payload))
if err != nil {
log.Println("Error creating request:", err)
return OutputEvent{}, err
}

// Setting headers for the request
req.Header.Set("Content-Type", "application/json")
req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", openaiApiKey))

// sending request to openAI
resp, err := client.Do(req)
if err != nil {
log.Println("Error sending request:", err)
return OutputEvent{}, err
}
defer resp.Body.Close()

// Reading all data from the response
body, err := io.ReadAll(resp.Body)
if err != nil {
log.Println("Error reading response body:", err)
return OutputEvent{}, err
}
// If the response is not of status OK an error is returned
if resp.StatusCode != http.StatusOK {
log.Printf("Non-OK HTTP status: %s\nResponse body: %s\n", resp.Status, string(body))
return OutputEvent{}, fmt.Errorf("non-OK HTTP status: %s", resp.Status)
}

// Un-marshalling the response
var respData ResponseObject
err = json.Unmarshal(body, &respData)
if err != nil {
log.Println("Error unmarshalling response:", err)
return OutputEvent{}, err
}

if len(respData.Data) == 0 {
log.Println("No data received in the response")
return OutputEvent{}, fmt.Errorf("no data in the response")
}

// Extracting the URL for the image
imageURL := respData.Data[0].Url

// Creating the output and returning to the next function
outputEvent := OutputEvent{Url: imageURL, S3Key: event.S3Key}

return outputEvent, nil
}

func main() {
lambda.Start(handler)
}
Generate image Bedrock function
In this Lambda function, I process text prompts to generate images using Amazon's Titan model and store these images in an S3 bucket. When triggered with a text prompt and an S3 key, the following should happen:
  • Reading the S3 bucket name from an environment variable and loading the AWS configuration.
  • Preparing the payload that wraps the text prompt into parameters required for the image generation task and specifying the image dimensions etc.
  • Invoking the Titan image generation model with the prepared payload.
  • The Titan model processes the prompt and returns a base64 encoded image
  • Decode the base64 encoded image into a byte array.
  • Upload to s3
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
package main

import (
"bytes"
"context"
"encoding/base64"
"encoding/json"
"fmt"
"log"
"net/http"
"os"

"github.com/aws/aws-lambda-go/lambda"
"github.com/aws/aws-sdk-go-v2/config"
"github.com/aws/aws-sdk-go-v2/service/bedrockruntime"
"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/session"
"github.com/aws/aws-sdk-go/service/s3"
)

type InputEvent struct {
Prompt string `json:"prompt"`
S3Key string `json:"s3Key"`
}

type OutputEvent struct {
Url string `json:"url"`
}

// Titan specific types for requests towards bedrock
type TitanInputTextToImageInput struct {
TaskType string `json:"taskType"`
ImageGenerationConfig TitanInputTextToImageConfig `json:"imageGenerationConfig"`
TextToImageParams TitanInputTextToImageParams `json:"textToImageParams"`
}

type TitanInputTextToImageParams struct {
Text string `json:"text"`
NegativeText string `json:"negativeText,omitempty"`
}

type TitanInputTextToImageConfig struct {
NumberOfImages int `json:"numberOfImages,omitempty"`
Height int `json:"height,omitempty"`
Width int `json:"width,omitempty"`
Scale float64 `json:"cfgScale,omitempty"`
Seed int `json:"seed,omitempty"`
}

type TitanInputTextToImageOutput struct {
Images []string `json:"images"`
Error string `json:"error"`
}

// Function to decode base64 image
func decodeImage(base64Image string) ([]byte, error) {
decoded, err := base64.StdEncoding.DecodeString(base64Image)
if err != nil {
return nil, err
}
return decoded, nil
}

func handler(ctx context.Context, event InputEvent) error {
// Grabbing the name of the bucket from the environment variable
bucketName := os.Getenv("BUCKET_NAME")
if bucketName == "" {
log.Println("Error: BUCKET_NAME environment variable is not set")
return fmt.Errorf("missing BUCKET_NAME")
}

// Preparing config for the runtime and forcing us east 1 since its not available in eu north 1
cfg, err := config.LoadDefaultConfig(context.Background(), config.WithRegion("us-east-1"))
if err != nil {
return err
}

// Creating the runtime for bedrock
runtime := bedrockruntime.NewFromConfig(cfg)

// Preparing the request payload for bedrock
payload := TitanInputTextToImageInput{
TaskType: "TEXT_IMAGE",
TextToImageParams: TitanInputTextToImageParams{
Text: event.Prompt,
},
ImageGenerationConfig: TitanInputTextToImageConfig{
NumberOfImages: 1,
Scale: 8.0,
Height: 1024.0,
Width: 1024.0,

},
}

payloadString, err := json.Marshal(payload)
if err != nil {
return fmt.Errorf("unable to marshal body: %v", err)
}

accept := "*/*"
contentType := "application/json"
model := "amazon.titan-image-generator-v1"

// Sending request to bedrock
resp, err := runtime.InvokeModel(context.TODO(), &bedrockruntime.
InvokeModelInput{
Accept: &accept,
ModelId: &model,
ContentType: &contentType,
Body: payloadString,
})

if err != nil {
return fmt.Errorf("error from Bedrock, %v", err)
}

var output TitanInputTextToImageOutput

err = json.Unmarshal(resp.Body, &output)
if err != nil {
return fmt.Errorf("unable to unmarshal response from Bedrock: %v", err)
}
// Decoding base64 to be able to upload image to S3
decoded, err := decodeImage(output.Images[0])
if err != nil {
return fmt.Errorf("unable to decode image: %v", err)
}

// Creating a session for S3
sesh := session.Must(session.NewSession())

s3Client := s3.New(sesh)

objectKey := event.S3Key

// Uploading image to S3
_, err = s3Client.PutObject(&s3.PutObjectInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
Body: bytes.NewReader(decoded),
ContentType: aws.String(http.DetectContentType(decoded)),
})
if err != nil {
log.Println("Error uploading image to S3:", err)
return err
}

log.Println("Successfully uploaded image to S3:", objectKey)

return nil
}

func main() {
lambda.Start(handler)
}
Upload image function
In this Lambda function, I process URLs sent by the previous function to download images and upload them to an S3 bucket. When triggered with an image URL and an S3 key, the following should happen:
  • Retrieve the bucket name and region from environment variables.
  • Download the image from the provided URL.
  • Read the image data from the HTTP response.
  • Initialize an S3 client.
  • Upload the image data to the specified S3 bucket using the provided S3 key.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
package main

import (
"bytes"
"context"
"fmt"
"io"
"log"
"net/http"
"os"

"github.com/aws/aws-lambda-go/lambda"
"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/session"
"github.com/aws/aws-sdk-go/service/s3"
)

type Event struct {
Url string `json:"url"`
S3Key string `json:"s3Key"`
}

func handler(ctx context.Context, event Event) error {
// Grabbing the name of the bucket from the environment variable
bucketName := os.Getenv("BUCKET_NAME")
if bucketName == "" {
log.Println("Error: BUCKET_NAME environment variable is not set")
return fmt.Errorf("missing BUCKET_NAME")
}

region := os.Getenv("REGION")
if region == "" {
log.Println("Error: REGION environment variable is not set")
return fmt.Errorf("missing REGION")
}

imageResp, err := http.Get(event.Url)
if err != nil {
log.Println("Error downloading image:", err)
return err
}
defer imageResp.Body.Close()

// Downloading image from provided URL
if imageResp.StatusCode != http.StatusOK {
log.Printf("Error: received non-OK HTTP status when downloading image: %s\n", imageResp.Status)
return fmt.Errorf("non-OK HTTP status when downloading image: %s", imageResp.Status)
}

imageData, err := io.ReadAll(imageResp.Body)
if err != nil {
log.Println("Error reading image data:", err)
return err
}
// Creating a session for S3
sesh := session.Must(session.NewSession())

s3Client := s3.New(sesh)

objectKey := event.S3Key
// Uploading image to S3
_, err = s3Client.PutObject(&s3.PutObjectInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
Body: bytes.NewReader(imageData),
ContentType: aws.String(http.DetectContentType(imageData)),
})
if err != nil {
log.Println("Error uploading image to S3:", err)
return err
}

log.Println("Successfully uploaded image to S3:", objectKey)

return nil
}

func main() {
lambda.Start(handler)
}
Get image function
In this Lambda function, I handle API Gateway requests to check the existence of an object in an S3 bucket and return its URL. When triggered with a request, the following should happen:
  • Retrieve the bucket name from the environment variables.
  • Extract the s3Key query parameter from the request.
  • Initialize an S3 client.
  • Check if the object exists in the S3 bucket using the s3Key.
  • If the object exists, return a response with the object's URL.
  • If the object does not exist or there is an error, return a error message.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
package main

import (
"context"
"fmt"
"log"
"os"

"github.com/aws/aws-lambda-go/events"
"github.com/aws/aws-lambda-go/lambda"
"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/session"
"github.com/aws/aws-sdk-go/service/s3"
)

func handler(ctx context.Context, request events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
//Grabbing the name of the bucket from the environment variable
bucketName := os.Getenv("BUCKET_NAME")
if bucketName == "" {
log.Println("Error: BUCKET_NAME environment variable is not set")
return events.APIGatewayProxyResponse{
StatusCode: 400,
Body: "missing BUCKET_NAME",
},nil
}

// Extracting the S3 key
queryParams := request.QueryStringParameters
s3Key, exists := queryParams["s3Key"]
if !exists {
return events.APIGatewayProxyResponse{
StatusCode: 400,
Body: "Missing query parameter 's3Key'",
},nil
}

// Creating a connection to S3
sesh := session.Must(session.NewSession())

s3Client := s3.New(sesh)

input := &s3.HeadObjectInput{
Bucket: aws.String(bucketName),
Key: aws.String(s3Key),
}

// Checking if image exist
_, err := s3Client.HeadObject(input)
if err != nil {
log.Printf(`{"url": "https://%s.s3.eu-north-1.amazonaws.com/%s"}`, bucketName, s3Key)
log.Println("param:", s3Key)
log.Println("Error checking if object exists:", err)
// If it does not exist return 404
return events.APIGatewayProxyResponse{
StatusCode: 404,
Body: "Object does not exist",
}, nil
}

// If it does exist return url
return events.APIGatewayProxyResponse{
StatusCode: 200,
Body: fmt.Sprintf(`{"url": "https://%s.s3.eu-north-1.amazonaws.com/%s"}`, bucketName, s3Key),
}, nil
}

func main() {
lambda.Start(handler)
}

How I built it: Frontend

For the frontend I wanted to do it simple and used a pre-built component library and some "out of the box" great tools for the requests.
To create the frontend I initialized a new React project with typescript using npx create-react-app my-app --template typescript in a new folder.
I created a folder that I named components in which I added another folder named mainContent container a file named index.tsx. Since the focus was not on the frontend for this project I decided to have all my content in the same component. This component looked like this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
import React from "react";
import { Button, Flex, Layout, Spin, Image, Switch } from "antd";
import TextArea from "antd/es/input/TextArea";
import { Typography } from "antd";
import { useMutation, useQuery } from "@tanstack/react-query";
import { getImage, sendPost } from "../../utils/api";

const { Paragraph, Text } = Typography;

const { Header, Content } = Layout;

const headerStyle: React.CSSProperties = {
textAlign: "center",
color: "#fff",
height: 64,
paddingInline: 48,
lineHeight: "64px",
backgroundColor: "#00415a",
};

const contentStyle: React.CSSProperties = {
textAlign: "center",
minHeight: 120,
lineHeight: "120px",
color: "#fff",
backgroundColor: "#00719c",
};

const layoutStyle = {
borderRadius: 8,
overflow: "hidden",
width: "calc(50% - 8px)",
maxWidth: "calc(50% - 8px)",
marginTop: "10vh",
height: "100%",
};

const textAreaStyle = {
width: "80%",
height: "80%",
};

const buttonStyle = {
width: "80%",
height: "80%",
marginBottom: 16,
backgroundColor: "#009bd6",
};
const paragraphStyle = {
margin: 10,
};

const textStyle = {
color: "white",
};

export default function MainContent() {
const id = "testing";
const [post, setPost] = React.useState("");
const [image, setImage] = React.useState("");
const [refetch, setRefetch] = React.useState(true);
const [loading, setLoading] = React.useState(false);
const [bedrock, setBedrock] = React.useState(false);

// A function to trigger the generation of an image
const { mutate } = useMutation({
mutationFn: () => {
setLoading(true);
return sendPost("/generate-image", {
text: post,
s3Key: `image/${id}`,
bedrock: bedrock,
});
},
});
// A function for polling the /get-image endpoint. When status 200 is returned polling stops and image is displayed
const { data } = useQuery({
queryKey: ["image"],
refetchInterval: 3000,
refetchIntervalInBackground: true,
enabled: refetch,
queryFn: async () => {
const imageData = await getImage("/get-image", { s3Key: `image/${id}` });
if (imageData.status === 200) {
setRefetch(false);
setImage(imageData.data);
setLoading(false);
}
return imageData.data;
},
});

const handlePostChange = (event: React.ChangeEvent<HTMLTextAreaElement>) => {
setPost(event.target.value);
};

const handleBedrockChange = (checked: boolean) => {
setBedrock(checked);
};

return (
<Spin spinning={loading}>
<Flex justify="center">
<Layout style={layoutStyle}>
<Header style={headerStyle}>Epic Post To Image POC</Header>
<Content style={contentStyle}>
<Paragraph style={paragraphStyle}>
<Text strong style={textStyle}>
Write a post as you would for a social media platform. Click
generate to create an image for your post.
</Text>
.
</Paragraph>

<TextArea
style={textAreaStyle}
rows={4}
placeholder="Write here"
onChange={handlePostChange}
/>

<Paragraph style={paragraphStyle}>
<Text strong style={textStyle}>
Use Bedrock? (Toggle to use bedrock)
</Text>

<Switch onChange={handleBedrockChange} />
</Paragraph>
{
// If the image exist it will be shown
image && (
<Image
width={"80%"}
style={{ marginTop: "10px" }}
src={image}
/>

)
}
<Button type="primary" style={buttonStyle} onClick={() => mutate()}>
Generate
</Button>
</Content>
</Layout>
</Flex>
</Spin>

);
}
In this component I handled almost everything. The only thing I outsourced was the functions for the requests. Those were added to a file which I named api.ts inside a folder called utils. The functions for the requests looked like this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
import axios from "axios";

const API_URL = process.env.REACT_APP_API_URL;

export async function sendPost(
url: string,
data: { text: string; s3Key: string; bedrock: boolean }
): Promise<{ status: number; data: string }> {
try {
const response = await axios({
method: "post",
url: `${API_URL}${url}`,
data,
});

return {
status: response.status,
data: response.data,
};
} catch (error: any) {
return {
status: error.response?.status || error.status,
data: error.response,
};
}
}

export async function getImage(
url: string,
params: { s3Key: string }
): Promise<{ status: number; data: string }> {
try {
const response = await axios({
method: "get",
url: `${API_URL}${url}`,
params,
});

return {
status: response.status,
data: response.data.url,
};
} catch (error: any) {
return {
status: error.response?.status || error.status,
data: error.response,
};
}
}
In this React Component the following happens:
  • Display a layout with a header and content section.
  • Allow the user to write a post in a text area.
  • Providing a switch for the user to choose whether to use the Bedrock or DALL-E by default.
  • When the "Generate" button is clicked, send the post data to the backend service to generate an image.
  • Constantly poll for the generated image from the backend and display it once available.
  • When the user requests for the image a loader is shown until the image is retrieved.
To see the full frontend implementation. Have a look at my repository
The frontend turned out to look like this:
frontend
frontend

Result and summary

Did it work? Yes it did!
With DALL-E
dalle
dalle
With Bedrock
bedrock
bedrock
From the tests I have been doing in this POC I have to give the win to DALL-E. It just better understands the prompts and creates better images currently. While my familiarity with DALL-E might have contributed to these results, as I am less experienced with Bedrock, the difference in performance was still notable. Testing them both was fun, and I found the comparison between the two worth it.
Some improvements can of course be done. For example I could have skipped the whole polling part and instead created a websocket API. For this POC i decided not to do that. But who is to say that wont happen in a future blog post...
Other future improvements would be to revisit the prepare prompt function and make it a bit more advanced. One thing could be to add the possibility for the user to choose what style of image they want generated, should it be realistic or illustrated?
All in all, I found this project to be fun and I got to play around with Gen AI a bit which was super cool :)
A final note is that these AI tools are expensive so be aware of costs!

Deep Dive

If you want to have a closer look at what has been used during this project. Here are some links:

Comments