#TGIFun🎈 YOLambda: Running Serverless YOLOv8/9

Run inference at scale with YOLOv8/9 in a secure and reliable way on AWS Lambda.

Joรฃo Galego
Amazon Employee
Published Apr 5, 2024
Last Modified May 6, 2024

Overview

In this episode of #TGIFun🎈, I'd like to demonstrate a quick and easy way to deploy YOLOv8/9 👁️ on AWS Lambda using the AWS SAM (Serverless Application Model) CLI.
Hosting YOLO on Lambda strikes a good balance between performance, scalability and cost efficiency. Plus, it's always fun to put stuff inside Lambda functions. If you're interested in exploring other deployment options though, feel free to scroll all the way down to the References section.
👨‍💻 All code and documentation are available on GitHub.

YOLO in Pictures 🖼️

So what's YOLOv8 and why should you care? Let's start with a short recap...
YOLOv8 (You Only Look Once) is a state-of-the-art computer vision model that supports multiple tasks
[Image: supported tasks. Source: https://docs.ultralytics.com/tasks/]
It builds on top of an already long history of YOLO models
[Image: the history of YOLO models. Source: Terven & Cordova-Esparza (2023)]
and it was designed to be smaller 🤏 and faster ⚡ than previous iterations.
[Image: size and speed comparison. Source: https://github.com/ultralytics/ultralytics]
While a full description of the YOLOv8 architecture is well beyond the scope of this article, it's useful to gain some intuition on what's happening behind the scenes.
Referring back to the original (YOLOv1) paper, YOLO models work by dividing the input image into a grid and predicting, for each cell, a set of bounding boxes [note: as we will see shortly, these are expressed as 2+2-tuples of top-left (x1, y1) and bottom-right (x2, y2) coordinates] along with their associated confidence scores and class probabilities, which are then combined to generate the final predictions.
[Image: the YOLO detection model. Source: Redmon et al. (2015)]
It goes without saying that I'm oversimplifying things here.
Over the years, there have been many improvements like faster NMS implementations (Non-Maximum Suppression, in case you're wondering) or the use of "bag-of-freebies" and "bag-of-specials" approaches (best names ever!) that have made YOLO faster and stronger.
Fortunately, you won't need to care about those at all to work on this project. If you're interested in such things though, I strongly encourage you to read up on the history of YOLO.
❗ This introduction was written before the release of YOLOv9.
💡 If you want to learn more, just scroll all the way down to the References section.

Goal 🎯

In this project, we're going to create a simple object detection app that accepts an image 🖼️
[Image: sample input, "Our whole universe was in a hot, dense state..."]
sends it to the YOLO model and returns a list of detected objects, which we can then place on top of the original image
[Image: detections overlaid on the sample input, "... It all started with the big bang!"]
We're going to use a vanilla YOLOv8 model, but you're more than welcome to use a fine-tuned model or to train your own YOLO.
[Image source: Roboflow]
Sounds fun? 🤩 Then buckle up and let's build it together!

Instructions

Prerequisites ✅

Before we get started, make sure these tools are installed and properly configured:
  • AWS credentials for an account you can deploy to
  • AWS SAM CLI
  • Docker (needed to build the container image)
  • Git
  • Conda or a similar environment manager
  • (Optional) awscurl, for testing the deployed function

Steps 📜

Let's start by cloning the repository
🧪 Switch to the feat/YOLOv9 branch if you're feeling experimental!
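A minimal sketch (the repository URL and folder name are placeholders, grab the real ones from the GitHub project linked above):

```bash
# Clone the project and move into it
git clone <repository-url>
cd <repository-folder>

# Optional: switch to the experimental YOLOv9 branch
git checkout feat/YOLOv9
```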
As a best practice, I recommend you create a Conda environment or something similar to keep everything isolated
Once the environment is activated, we can kick things off and install the project dependencies
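Here's a rough sketch of both steps, assuming a requirements.txt file at the root of the repository (the environment name and Python version are just examples):

```bash
# Create and activate an isolated environment
conda create -n yolambda python=3.10 -y
conda activate yolambda

# Install the project dependencies
pip install -r requirements.txt
```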
One of those dependencies is the ultralytics package, which includes the yolo CLI.
We can use it to download the YOLOv8 model and convert it to the ONNX format:
💡 The YOLOv8 series offers a wide range of models, both in terms of size (from nano to extra-large) and of specialized tasks like segmentation or pose estimation. If you want to try a different model, please refer to the official documentation (Supported Tasks and Modes).
🧪 Replace yolov8n with yolov9c in the commands below to work with YOLOv9. Just keep in mind that the performance and the output of our app may not be the same.
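For the nano detection model, the export looks something like this (the ultralytics CLI downloads the pretrained weights automatically if they're not already present):

```bash
# Download yolov8n.pt (if needed) and export it to ONNX
# Swap yolov8n for yolov9c to try YOLOv9 instead
yolo export model=yolov8n.pt format=onnx
```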
As an optional step, we can run ONNX Simplifier (based on ONNX Optimizer) against our model to get rid of redundant operations. Everything counts when it comes to shrinking our model and making it run faster.
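If you'd like to try it, the simplifier can be run straight from the command line; here I'm overwriting the model in place, which is just a choice:

```bash
# Install and run ONNX Simplifier against the exported model
pip install onnxsim
onnxsim yolov8n.onnx yolov8n.onnx
```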
Once this is done, we can look "under the hood" and take a peek at the computational graph with a tool like Netron:
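Netron can be installed from PyPI and pointed directly at the ONNX file, which opens an interactive viewer in the browser:

```bash
# Install Netron and inspect the exported model
pip install netron
netron yolov8n.onnx
```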
[Image: detail of the YOLOv8 architecture]
Let's move our model to a dedicated folder
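Something like the following, although the folder name here is only an assumption; use whatever path the function code and the container build actually expect:

```bash
# Move the exported model into the folder picked up by the build
mkdir -p model
mv yolov8n.onnx model/
```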
and use the AWS SAM CLI to build a container image for our app
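Assuming a standard SAM project layout, a plain build does the job (Docker needs to be running, since we're building a container image):

```bash
# Build the container image defined in the SAM template
sam build
```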
During development, we can test our app locally using sam local
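For example, invoking the function with a sample event (the function's logical ID and the event file are placeholders, check the repository for the real ones):

```bash
# Invoke the function locally inside a Lambda-like container
sam local invoke <FunctionLogicalId> --event events/event.json
```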
Whenever you're ready, just start the deployment 🚀
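On the first run, the guided mode will walk you through the stack name, region and a few other settings:

```bash
# Deploy the stack (subsequent deployments can drop the --guided flag)
sam deploy --guided
```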
While this is running, let's take a closer look at our template.
Here are a few important things to notice: the function is packaged as a container image rather than a zip archive (which is why we needed Docker to build it), and it's exposed through a Lambda function URL protected by IAM authentication, which is why we'll have to sign our test requests with SigV4.
The deployment should be done by now 😊 Don't forget to note down the function URL
You can use tools like awscurl to test the app (awscurl will handle the SigV4 signing for you)
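Here's a rough sketch, assuming the function expects a JSON body with a base64-encoded image (check the repository docs for the exact payload format); the URL is the one you noted down after the deployment:

```bash
# Install awscurl
pip install awscurl

# POST an image to the function URL, signing the request with SigV4
awscurl --service lambda --region <your-region> \
        -X POST \
        -H "Content-Type: application/json" \
        -d "{\"image\": \"$(base64 -w0 image.jpg)\"}" \
        https://<url-id>.lambda-url.<your-region>.on.aws/
```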
or create your own test scripts
And that's it! 🥳 We just crossed the finish line...
Sooo, what's next? Here are a few recommendations:
  • Explore the code - it's just there for the taking, plus I left some Easter eggs and L400 references in there for the brave ones.
    • Check out the feat/YOLOv9 branch to test the newest member of the YOLO family
  • Build your own app - I'm pretty sure you already have a cool use case in mind
  • Share with the community - leave a comment below if you do something awesome
I hope you enjoyed it. See you next time! 👋
This is the first article in the #TGIFun🎈 series, a personal space where I'll be sharing some small, hobby-oriented projects with a wide variety of applications. As the name suggests, new articles come out on Friday.
PS: If you like this format, don't forget to give it a thumbs up 👍 Work hard, have fun, make history!

References 📚

Articles

Blogs

Miscellaneous

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
