Serverless AI powered content moderation service

In this post, I extend the File Manager service I built previously by adding content moderation capabilities. The original service stores files in S3 and records them in a DynamoDB table, using a serverless, event-driven approach. Now, with Amazon GuardDuty and Amazon Rekognition, I’ve enhanced the service with malware scanning and image moderation.

Published Oct 31, 2024
About a year ago I wrote a blog post about building a File Manager service. In this post we will use that service as our base and extend it with content moderation. We'll use GuardDuty and Rekognition to assist in this task. As usual, everything will be serverless and event-driven.

Recap

To refresh everyone's memory, let's start with a short recap.
The service stores files in S3 and keeps a record of all the files in a DynamoDB table. In the system overview, an API exposes functionality to the user, and the work is then carried out in a serverless and event-driven way.
The upload flow is initiated by a client calling the API, where a Lambda function creates a pre-signed S3 URL that the client can use to upload the file. We don't upload files directly over the API, since Amazon API Gateway has a maximum payload size of 10 MB, which becomes a limitation if we want to support all kinds of files.
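As a rough illustration of the pre-signed URL step, here is a minimal Python sketch. It is not the code from the original File Manager service. The function name `create_upload_url` and the default expiry are my own assumptions, and the client is passed in so the function can be exercised without AWS credentials; in real use it would be `boto3.client("s3")`.

```python
from typing import Any


def create_upload_url(s3_client: Any, bucket: str, key: str, expires_in: int = 300) -> str:
    """Return a pre-signed URL that lets a client PUT one object directly to S3.

    s3_client is assumed to be a boto3 S3 client (or anything exposing the same
    generate_presigned_url method). The 300 second default is an arbitrary choice.
    """
    return s3_client.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

The client then sends an HTTP PUT with the file body to the returned URL, so the payload never passes through API Gateway.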
When the client uploads the file to S3, this generates an event (the part below the dashed line in the architecture diagram) that invokes a StepFunction, which updates the file inventory.

Extended architecture

In the extended architecture we add GuardDuty S3 malware scanning and Rekognition image moderation. GuardDuty scans new files that arrive in the S3 bucket I call staging, adds a tag to the object, and posts the scan result to the default event bus. If the scan result is OK, it invokes a StepFunction that uses Rekognition for image moderation. That StepFunction follows the same logic: it adds a tag to the object and posts an event onto an event bus. Finally, files are moved to either a quarantine bucket or a storage bucket.
Every part of the solution is decoupled and can run independently, and a saga pattern is applied to move the logic to the next phase.
Now let's dig a bit deeper into each of the parts of this solution.

Malware scanning

GuardDuty malware scanning doesn't require much setup. It is a fully managed feature in GuardDuty, and the only thing required is to configure it. GuardDuty will then pick up new objects automatically.
To achieve this flow, the only thing we need to do is create an S3MalwareProtectionPlan and assign it the appropriate permissions. One important thing to remember: if you encrypt your objects in S3 with a Customer Managed Key, don't forget to give GuardDuty permission to decrypt using this key.
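To make the "scan result, if OK, moves on to moderation" step concrete, here is a small Python sketch of how a consumer could interpret the scan result event. In the post this routing is done with EventBridge rules and StepFunctions, not Python. The field names (`s3ObjectDetails`, `scanResultDetails.scanResultStatus`) and the `NO_THREATS_FOUND` status follow the GuardDuty scan result event as I understand it, so verify them against the GuardDuty documentation. Sending every non-clean status to quarantine is my own assumption.

```python
from typing import Any, Dict

# Status GuardDuty reports for a clean object (check the docs for the full list).
CLEAN_STATUS = "NO_THREATS_FOUND"


def next_step_for_scan(event: Dict[str, Any]) -> Dict[str, str]:
    """Decide what happens to an object based on a GuardDuty scan result event."""
    detail = event.get("detail", {})
    obj = detail.get("s3ObjectDetails", {})
    status = detail.get("scanResultDetails", {}).get("scanResultStatus", "UNKNOWN")
    # Assumption: anything that is not explicitly clean goes to quarantine.
    action = "moderate" if status == CLEAN_STATUS else "quarantine"
    return {
        "bucket": obj.get("bucketName", ""),
        "key": obj.get("objectKey", ""),
        "status": status,
        "action": action,
    }
```

An EventBridge rule can apply the same filter declaratively by matching on the scan result status in its event pattern.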

Image moderation

The moderation part with Rekognition involves a couple more steps. A StepFunction is invoked by the result from the malware scan and calls Rekognition to moderate the image. This StepFunction then tags the object and posts the result onto a custom EventBridge event bus. One important thing to remember: if you encrypt your objects in S3 with a Customer Managed Key, don't forget to grant permission to decrypt using this key. Otherwise Rekognition gives you a confusing Unsupported error rather than a clear explanation of why it failed.
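The Rekognition call itself can be sketched in Python as below. This is only an illustration of the `DetectModerationLabels` API, since the post makes the call from a StepFunction. The function name `moderate_image`, the 80 percent confidence threshold, and the rule "any returned label means flagged" are assumptions of mine, and the client is passed in (it would be `boto3.client("rekognition")` in real use).

```python
from typing import Any, Dict


def moderate_image(rekognition_client: Any, bucket: str, key: str,
                   min_confidence: float = 80.0) -> Dict[str, Any]:
    """Ask Rekognition for moderation labels on an image stored in S3."""
    response = rekognition_client.detect_moderation_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,  # assumed threshold, tune for your content
    )
    labels = [label["Name"] for label in response.get("ModerationLabels", [])]
    # Assumption: any label at or above the threshold flags the image.
    return {"flagged": bool(labels), "labels": labels}
```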
To achieve this flow, the only thing we need to do is create the StateMachine and set up the events it should be invoked on.
The StateMachine definition is rather large, and it needs some magic. Since you can't append tags to an S3 object, we first need to fetch all existing tags, append our new tag, and put the entire array of tags back on the object. This would probably be easier to do in a Lambda function, but where is the fun in that? Intrinsic functions for the win.
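For comparison, this is roughly what the fetch, append, and put sequence would look like in the Lambda-style alternative mentioned above. It is a sketch, not the StateMachine definition from the post. The function name `add_object_tag` is hypothetical, and replacing an existing tag with the same key is my own choice, since S3 does not allow duplicate tag keys on an object.

```python
from typing import Any, Dict, List


def add_object_tag(s3_client: Any, bucket: str, key: str,
                   tag_key: str, tag_value: str) -> List[Dict[str, str]]:
    """Add one tag to an S3 object without losing the tags already on it."""
    # 1. Fetch the existing tag set (put_object_tagging replaces the whole set).
    current = s3_client.get_object_tagging(Bucket=bucket, Key=key).get("TagSet", [])
    # 2. Drop any previous tag with the same key, then append the new one.
    merged = [tag for tag in current if tag["Key"] != tag_key]
    merged.append({"Key": tag_key, "Value": tag_value})
    # 3. Write the complete tag set back to the object.
    s3_client.put_object_tagging(Bucket=bucket, Key=key, Tagging={"TagSet": merged})
    return merged
```

Note that the read and write are two separate calls, so two writers tagging the same object at the same time could overwrite each other.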

Finalize the upload

The last part of this solution is to react to the moderation result and place the content in either the quarantine bucket or the long-term storage bucket. For this I use two different StepFunctions, which differ in the event that invokes them.
To achieve this flow, a new StateMachine is created.
Its definition is a bit easier to follow than the one for the image moderation part.
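The move itself comes down to a copy followed by a delete, since S3 has no native move operation. A minimal Python sketch of that step, assuming a boto3-style S3 client is passed in, is shown below. The function name `finalize_upload` and all bucket names are hypothetical, and the post implements this with StateMachines rather than Python.

```python
from typing import Any


def finalize_upload(s3_client: Any, staging_bucket: str, key: str, flagged: bool,
                    quarantine_bucket: str, storage_bucket: str) -> str:
    """Move an object out of staging into the quarantine or storage bucket."""
    target = quarantine_bucket if flagged else storage_bucket
    # Copy first, so the object is never lost if the second call fails.
    s3_client.copy_object(
        Bucket=target,
        Key=key,
        CopySource={"Bucket": staging_bucket, "Key": key},
    )
    s3_client.delete_object(Bucket=staging_bucket, Key=key)
    return target
```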

Conclusion

This was a short post on how I extended my previously built file manager with malware scanning and image moderation. Using only managed services made this a fairly easy task.
To get the full source code and deploy it yourself, visit Serverless-Handbook Image Moderation.

Final Words

Don't forget to follow me on LinkedIn and X for more content, and read the rest of my blogs.
As Werner says! Now Go Build!
 
