Websocket follow-up, image creation with Gen AI

Follow-up to previous post

Published Jun 3, 2024

Intro

So I mentioned in the previous post that I could have done this even better by using a websocket API instead of constantly polling API Gateway. This got me thinking, and I could not help myself and started working on it straight away.
The changes, visualized:
[Diagram]

What are websockets?

Websockets are a technology that allows websites and applications to have real-time, two-way communication with servers. It is like having a direct conversation with someone, where both sides can speak and listen at the same time, whenever they want.

How it works, simply explained

Normally, when you visit a website, your browser sends a request to the server, and the server sends back a response. This process repeats every time you need new information.
Using websockets, after an initial handshake (like saying hello to a person) using the usual web protocol (HTTP), the connection switches to a WebSocket connection. This special connection stays open, allowing data to be sent back and forth instantly and continuously. It’s like keeping the phone line open when talking to someone, instead of having to dial each other up every time.
Websockets are useful, for example, for applications that need real-time updates, such as live chat systems, online games, and live notifications. By keeping the connection open and allowing instant communication, websockets help make these applications faster and more responsive.
In short, websockets create a direct, open communication line between your browser and a server, enabling fast, efficient, real-time interactions.

Why are websockets relevant for this project?

I set up this project using EventBridge and Step Functions to handle the image generation process, which starts with a request from my frontend to API Gateway. However, because I don't know how long the image generation will take, it's not practical to use a normal API that waits for a response.
Using a normal API means the client sends a request and waits for the server to respond. If the image generation process takes a long time, the client would have to wait the entire time, which isn't efficient and can cause the browser to time out or appear unresponsive to the user. This is a problem for tasks like image generation, where the processing time varies with, among other things, the level of detail in the image. To handle this before, I did not return the result in the response to the frontend and used polling instead. This meant that I had to send requests to the backend constantly to check if the image had been created, which led to many requests to the backend.
That is why I wanted to use websockets instead. Since websockets keep the connection open, I can reply to the frontend as soon as the image is ready, no matter how long it takes. This ensures that the frontend gets the result in real time without constantly checking the server for updates, which makes the whole process more efficient and improves the user experience.

What is different in the project from the last post?

Frontend

As mentioned in the last post, I used a polling mechanism with React Query and also triggered the image generation process with a request. They both looked like this:
The useQuery function requests the backend every 3 seconds until the response has status OK and I get the image back. This works very well, but it leads to a lot of requests and can be made more efficient.
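To make the cost of polling concrete, here is a minimal sketch of the same idea in plain TypeScript rather than React Query. The `checkStatus` and `wait` callbacks, the `StatusResult` shape and the `maxAttempts` limit are hypothetical names made up for the illustration, not the real frontend code. Only the 3 second interval comes from the setup described above.

```typescript
// Result of one status check: still pending, or done with an image URL.
type StatusResult = { status: "PENDING" } | { status: "OK"; url: string };

// Polls until the backend reports OK and returns the URL together with the
// number of requests it took. `checkStatus` and `wait` are injected so the
// sketch does not depend on a specific HTTP client or timer.
async function pollForImage(
  checkStatus: () => Promise<StatusResult>,
  wait: (ms: number) => Promise<void>,
  intervalMs: number = 3000,
  maxAttempts: number = 100
): Promise<{ url: string; requests: number }> {
  for (let requests = 1; requests <= maxAttempts; requests++) {
    const result = await checkStatus();
    if (result.status === "OK") {
      return { url: result.url, requests };
    }
    await wait(intervalMs); // pause before sending the next request
  }
  throw new Error("Image was not ready after " + maxAttempts + " attempts");
}
```

With a 3 second interval, an image that takes about 30 seconds to generate costs roughly ten requests this way, and every one of them except the last returns nothing useful.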
Because I changed my API to a websocket API (explained later in this post), I had to make some changes to the frontend as well. What I did was:
  • Remove both the mutation and the useQuery function.
  • Uninstall React Query.
  • Create a React context for websocket requests, using the browser's built-in WebSocket API.
  • Use the context for requests.
I wrapped my whole React app in this context so it would be available inside my component.
The context looks like this:
What are the important things happening here?
  • I initialize a few state variables:
    • socket: Holds the websocket instance.
    • isLoading: Tracks whether I have sent a message and am waiting for a reply (so I can tell the user that it is loading).
    • url: Stores the url from the websocket reply.
  • useEffect:
    • Runs once when the component mounts.
    • Establishes a WebSocket connection.
  • sendMessage function:
    • Sends the prompt via websocket to API Gateway.
    • Sets loading to true, which means I can give the user feedback that we are waiting for a reply.
  • onmessage:
    • If a message is received I parse the response so I can use the url to show the image in the component.
    • Finally sets loading to false to indicate that the image is generated.
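The behaviour in the list above can be sketched without React as a small class. This is only an illustration of the state handling, not the actual context: the `SocketLike` interface, the `ImageSocketState` name and the `{ url }` shape of the reply are assumptions.

```typescript
// Simplified stand-in for the parts of the browser WebSocket used here.
interface SocketLike {
  send(data: string): void;
  onmessage: ((event: { data: string }) => void) | null;
}

class ImageSocketState {
  isLoading = false;         // true while waiting for a reply
  url: string | null = null; // image URL taken from the latest reply
  private socket: SocketLike;

  constructor(socket: SocketLike) {
    this.socket = socket;
    // Plays the role of the onmessage handler registered in useEffect.
    this.socket.onmessage = (event) => {
      // Assumed reply shape: a JSON object with a "url" field.
      const reply = JSON.parse(event.data) as { url?: string };
      if (typeof reply.url === "string") {
        this.url = reply.url;
      }
      this.isLoading = false; // the image has been generated
    };
  }

  // Sends the payload over the open connection and flags that we are waiting.
  sendMessage(payload: object): void {
    this.socket.send(JSON.stringify(payload));
    this.isLoading = true;
  }
}
```

Keeping the socket behind a small interface like this also makes the loading logic easy to exercise with a fake socket, without opening a real connection.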
In my component the code now looks like this:
When sending a message to the websocket API, I need to tell it which route to use. In this case, that means telling it I want to call the generate route.
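As a sketch of what such a message can look like: API Gateway websocket APIs pick the route from a field in the message body, and a common choice is the route selection expression `$request.body.action`. The `action` key and the `prompt` field name below are assumptions for the illustration, not a copy of the component code.

```typescript
// Builds the JSON string sent over the socket. With a route selection
// expression of $request.body.action, API Gateway reads the "action" field
// and forwards the message to the route with the matching key.
function buildGenerateMessage(prompt: string): string {
  const trimmed = prompt.trim();
  if (trimmed.length === 0) {
    throw new Error("Prompt must not be empty");
  }
  return JSON.stringify({ action: "generate", prompt: trimmed });
}
```

The resulting string is what would be passed to the socket's send call.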

Backend

To convert our API Gateway API to a websocket API, a few changes had to be made.

In the SAM configuration, the following changes have been made

  • The OpenAPI definition of API Gateway has been removed.
  • Websocket API definitions have been added:
To open a connection to this API, a route had to be defined for the sole purpose of opening a connection. This route is also integrated with a lambda:
With a connect route in place, I also needed a disconnect route and a disconnect lambda, which look very similar:
And finally I also needed a route and integration to generate the image:
Before, when I used the regular API and not the websocket API, I could integrate API Gateway directly with EventBridge. That was no longer possible, so I decided to use a lambda function as a proxy between API Gateway and EventBridge:
The connect and disconnect functions:

Lambdas

The three new lambdas are connect, disconnect and the EventBridge proxy.
Connect
In this function I am only returning status 200 to tell the client that the connection has been established.
Disconnect
In this function I am only returning status 200 to tell the client that the connection has been closed.
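A minimal sketch of two such handlers, assuming a TypeScript Lambda runtime. The handler names and body texts are made up for the illustration. The only behaviour taken from the description above is that both return status 200.

```typescript
// Shape of the response returned to API Gateway from a Lambda integration.
type LambdaResponse = { statusCode: number; body: string };

// $connect route: accepting the connection only requires returning 200.
export const connectHandler = async (): Promise<LambdaResponse> => {
  return { statusCode: 200, body: "Connected" };
};

// $disconnect route: nothing to clean up in this sketch, so it returns 200 too.
export const disconnectHandler = async (): Promise<LambdaResponse> => {
  return { statusCode: 200, body: "Disconnected" };
};
```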
EventBridge proxy
This function replaces the previous integration with EventBridge. All the lambda does is forward the request from the API to EventBridge:
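A rough sketch of such a proxy, with the EventBridge call injected so the example stays self-contained. It assumes the connection id is forwarded together with the prompt so a later step can reply to the right client. The `Source` and `DetailType` values, the `prompt` field and the function names are hypothetical.

```typescript
// The parts of the API Gateway websocket event that the proxy needs.
type WebSocketEvent = {
  requestContext: { connectionId: string };
  body: string | null;
};

// One entry in an EventBridge PutEvents request.
type EventEntry = {
  Source: string;
  DetailType: string;
  Detail: string; // JSON-encoded payload
  EventBusName?: string;
};

// Injected sender. In a real Lambda this would wrap the AWS SDK's
// EventBridgeClient and PutEventsCommand.
type PutEvents = (entries: EventEntry[]) => Promise<void>;

// Maps the incoming websocket message to an EventBridge entry.
// Source and DetailType are placeholder names for this sketch.
function toEventEntry(event: WebSocketEvent): EventEntry {
  const body = JSON.parse(event.body ?? "{}") as { prompt?: string };
  return {
    Source: "image.generator",
    DetailType: "GenerateImage",
    Detail: JSON.stringify({
      connectionId: event.requestContext.connectionId,
      prompt: body.prompt ?? "",
    }),
  };
}

// Builds the Lambda handler around the injected sender.
export function createProxyHandler(putEvents: PutEvents) {
  return async (event: WebSocketEvent) => {
    await putEvents([toEventEntry(event)]);
    return { statusCode: 200, body: "Accepted" };
  };
}
```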
The other functions look pretty much the same as in the last post, with the difference that I am replying to the websocket at the end of the step function as follows:
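A minimal sketch of that reply step, again with the AWS call injected. It assumes the connection id has been carried through the workflow and that the reply is a JSON object with a `url` field. The function names are hypothetical.

```typescript
// Input for the API Gateway management API's "post to connection" call.
type PostToConnectionInput = { ConnectionId: string; Data: string };

// Injected sender. In a real Lambda this would wrap the AWS SDK's
// ApiGatewayManagementApiClient and PostToConnectionCommand.
type PostToConnection = (input: PostToConnectionInput) => Promise<void>;

// Builds the message pushed to the client once the image is ready.
function buildReply(connectionId: string, url: string): PostToConnectionInput {
  return { ConnectionId: connectionId, Data: JSON.stringify({ url }) };
}

// Sends the image URL to the client that asked for it.
export async function replyWithImage(
  post: PostToConnection,
  connectionId: string,
  url: string
): Promise<void> {
  await post(buildReply(connectionId, url));
}
```

Because the server pushes this message over the connection that is already open, the frontend does not have to ask for the result at all.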

Summary

To summarize, using websockets makes the application run much more smoothly and reduces the number of requests being made. This results in a more pleasant user experience and lower bandwidth usage. However, there are some downsides to this approach. The IaC definitions become larger, and direct integration with EventBridge is no longer possible. Despite these issues, I prefer websockets over polling because they improve the experience and make everything run more efficiently.
For a deeper look at the changes, all the code can be found here.
 
