How I built a POC for a serverless newsletter that is not about serverless


Published Dec 4, 2024

Intro

My initial thought for this project was to create a daily newsletter that would deliver an email filled exclusively with positive and uplifting news. This concept shaped the entire project and all efforts were directed towards achieving this goal. My plan was to use Comprehend to analyze the sentiment of news articles and compile a newsletter of happy news to send to subscribers each day.
However, I overlooked a critical aspect during the planning phase: the availability of positive news. Naively, I did not review the nature of daily news beforehand and was unaware that there is currently a lack of genuinely positive news stories. This issue was partly due to the limited scope of my POC, as I scraped articles from only one RSS feed. Additionally, the lack of positive news might be influenced by the fact that such stories typically generate fewer clicks.
When the project was completed and I began scraping articles, I found at most one positive piece per day. This outcome was disappointing and made it challenging to showcase a compelling POC for a serverless newsletter focused on happy news.
Fortunately, the project was structured in a way that allowed for a pivot. I shifted the focus to create a newsletter summarizing the latest news of the day instead. This adjustment better aligned with the available data and demonstrated the flexibility of the serverless architecture I had built.
The goal of the POC was to get an email every day with summarized articles from the current day.
Note that not all of the code is included in the blog post, but the whole repository can be found here
How the newsletter turned out:
[Screenshot of the resulting newsletter email]

Services used

For this project I ended up using the following:
  • CDK, TypeScript
  • SecretsManager
  • EventBridge
  • Lambda, TypeScript
  • DynamoDB
  • Comprehend
  • SES
  • StepFunctions
  • OpenAI model - gpt-4o-mini

AI reasoning and why I chose OpenAI

When it came to selecting a platform for my POC, the decision to go with OpenAI over Bedrock was clear. The main reason for this choice was my prior experience with OpenAI, which made it a natural option for me. Having already worked with OpenAI's tools, I was familiar with how they worked. In contrast, the learning curve for Bedrock seemed too steep for a POC. Since time and ease of implementation were important to me, sticking with OpenAI allowed me to focus on delivering results without needing to invest additional time in learning a new platform.
Given that I chose to use OpenAI, I had to account for additional costs in my calculations. These included not only the fees for OpenAI's API but also the costs associated with transferring data over the internet to their servers. To address this challenge, I decided to use Comprehend to extract only the key phrases from the articles before sending them to OpenAI. By doing so, I was able to significantly reduce the amount of data being transferred while still preserving the essential context and meaning of the articles. This approach helped optimize costs without compromising the quality of the output. Although this was just a POC and no massive amounts of data were transferred either way, it was still nice to keep this in mind.

Infrastructure

I decided to set up the infrastructure with CDK because I find it super easy and straightforward to work with. It also let me write my cloud setup in code using TypeScript (my preferred language), which makes things really clean and flexible. Plus, it’s great for keeping everything organized and easy to update later on.
For this project I needed a couple of different resources.

DynamoDB Table

Pretty straightforward. Just a table where I stored the articles, using id as the partition key.
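In CDK, a minimal sketch of such a table could look like this (the construct id, billing mode and removal policy are my assumptions rather than necessarily what is in the repo):

```typescript
import * as cdk from 'aws-cdk-lib';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';

// Inside the stack: a simple table keyed on id.
const articleTable = new dynamodb.Table(this, 'ArticleTable', {
  partitionKey: { name: 'id', type: dynamodb.AttributeType.STRING },
  billingMode: dynamodb.BillingMode.PAY_PER_REQUEST, // assumption: on-demand keeps the POC cheap
  removalPolicy: cdk.RemovalPolicy.DESTROY,          // assumption: fine to delete together with the stack
});
```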

SecretsManager Secret

Also pretty straightforward. An empty secret in which I later, via the console, stored an apiKey and a url for an RSS feed. Since I wanted to use SecretsManager for the apiKey, it made sense to also store the url here, given that I would already be paying for the secret with only the apiKey.
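A sketch of what that secret could look like in CDK (the construct id and description are my own):

```typescript
import * as secretsmanager from 'aws-cdk-lib/aws-secretsmanager';

// An empty secret; the apiKey and the RSS feed url are added manually via the console afterwards.
const newsletterSecret = new secretsmanager.Secret(this, 'NewsletterSecret', {
  description: 'openai apiKey and RSS feed url for the newsletter POC',
});
```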

Lambda for scraping the RSS feed

I needed a Lambda whose purpose was to scrape an RSS feed for news articles. This Lambda needed access to the secret to be able to retrieve the url for the feed. It also needed access to the articleTable to be able to insert the gathered articles.
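A sketch of how this could be wired up in CDK, reusing the table and secret from the earlier snippets (the handler path, runtime, timeout and environment variable names are assumptions):

```typescript
import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as nodejs from 'aws-cdk-lib/aws-lambda-nodejs';

const scrapeArticlesFn = new nodejs.NodejsFunction(this, 'ScrapeArticlesFn', {
  entry: 'lambda/scrapeArticles.ts', // assumed handler location
  runtime: lambda.Runtime.NODEJS_20_X,
  timeout: cdk.Duration.minutes(1),
  environment: {
    TABLE_NAME: articleTable.tableName,
    SECRET_ARN: newsletterSecret.secretArn,
  },
});

newsletterSecret.grantRead(scrapeArticlesFn);  // read the RSS feed url from the secret
articleTable.grantWriteData(scrapeArticlesFn); // insert the scraped articles
```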

Lambda for analyzing and updating the articles

This Lambda needed the same access as the previous one, but it also needed access to Comprehend. More specifically, it needed permission to detect key phrases from batches of text.
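In CDK this could look roughly like the scraping Lambda, plus an inline IAM statement for Comprehend (the handler path and timeout are assumptions):

```typescript
import * as cdk from 'aws-cdk-lib';
import * as iam from 'aws-cdk-lib/aws-iam';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as nodejs from 'aws-cdk-lib/aws-lambda-nodejs';

const processArticlesFn = new nodejs.NodejsFunction(this, 'ProcessArticlesFn', {
  entry: 'lambda/processArticles.ts', // assumed handler location
  runtime: lambda.Runtime.NODEJS_20_X,
  timeout: cdk.Duration.minutes(5),
  environment: {
    TABLE_NAME: articleTable.tableName,
    SECRET_ARN: newsletterSecret.secretArn,
  },
});

newsletterSecret.grantRead(processArticlesFn);
articleTable.grantReadWriteData(processArticlesFn);

// Comprehend has no resource-level permissions for this action, hence the wildcard resource.
processArticlesFn.addToRolePolicy(new iam.PolicyStatement({
  actions: ['comprehend:BatchDetectKeyPhrases'],
  resources: ['*'],
}));
```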

Lambda for sending emails

Same access here as well, with the addition of some permissions for SES to be able to send emails.
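A possible CDK sketch, following the same pattern as above (the handler path is assumed, and the SES actions could be narrowed to a specific verified identity):

```typescript
import * as iam from 'aws-cdk-lib/aws-iam';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as nodejs from 'aws-cdk-lib/aws-lambda-nodejs';

const sendNewsletterFn = new nodejs.NodejsFunction(this, 'SendNewsletterFn', {
  entry: 'lambda/sendNewsletter.ts', // assumed handler location
  runtime: lambda.Runtime.NODEJS_20_X,
  environment: { TABLE_NAME: articleTable.tableName },
});

articleTable.grantReadData(sendNewsletterFn);

// Allow the function to send the newsletter via SES.
sendNewsletterFn.addToRolePolicy(new iam.PolicyStatement({
  actions: ['ses:SendEmail', 'ses:SendRawEmail'],
  resources: ['*'], // could be scoped to the verified identity ARN
}));
```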

StepFunction

The stepFunction was needed to tie the Lambdas together so that they would execute in sequence. I also added a choiceState to only send emails if any articles were found. Permission to start the execution was given to EventBridge.
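A sketch of the state machine in CDK, assuming the function names from the previous snippets and that the Lambda payloads carry an articleIds array (both assumptions on my part):

```typescript
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';

const scrapeTask = new tasks.LambdaInvoke(this, 'ScrapeArticles', {
  lambdaFunction: scrapeArticlesFn,
  outputPath: '$.Payload',
});
const processTask = new tasks.LambdaInvoke(this, 'ProcessArticles', {
  lambdaFunction: processArticlesFn,
  outputPath: '$.Payload',
});
const sendTask = new tasks.LambdaInvoke(this, 'SendNewsletter', {
  lambdaFunction: sendNewsletterFn,
  outputPath: '$.Payload',
});

// Only send the newsletter if at least one article was found today.
const definition = scrapeTask
  .next(processTask)
  .next(new sfn.Choice(this, 'AnyArticlesFound?')
    .when(sfn.Condition.isPresent('$.articleIds[0]'), sendTask)
    .otherwise(new sfn.Succeed(this, 'NoArticlesToday')));

const stateMachine = new sfn.StateMachine(this, 'NewsletterStateMachine', {
  definitionBody: sfn.DefinitionBody.fromChainable(definition),
});
```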

EventBridge rule

The last thing needed to get this running every day was an EventBridge rule to trigger the stepFunction once a day.
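In CDK, a scheduled rule targeting the state machine could look like this (the exact time of day is an assumption); the SfnStateMachine target also takes care of granting EventBridge permission to start the execution:

```typescript
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';

new events.Rule(this, 'DailyNewsletterRule', {
  schedule: events.Schedule.cron({ minute: '0', hour: '6' }), // every day at 06:00 UTC (assumed time)
  targets: [new targets.SfnStateMachine(stateMachine)],       // grants states:StartExecution to the rule
});
```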

Code

With the infrastructure in place I added the code for the different lambdas.

Getting the articles

The first Lambda had the task of retrieving articles from an RSS feed, processing them and inserting them into a DynamoDB table. For this POC I decided to use the World RSS feed from BBC.
So what is this Lambda doing?
This Lambda function starts by retrieving the region and DynamoDB table name from environment variables. It then sets up a SecretsManager client to fetch a secret containing the url for an RSS feed. Using this url, the Lambda sends a request to retrieve the RSS feed and parses the XML response into a JSON structure. From the parsed data, it extracts articles and filters them to include only those published on the current day. The filtered articles are transformed into a standard format with attributes such as id, title, text, url and timestamp. These processed articles are then inserted into a DynamoDB table. Finally, the function returns a success response that includes the ids of the inserted articles along with a timestamp.
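A simplified sketch of what such a handler could look like, assuming the rss-parser package and AWS SDK v3; the secret key name, environment variable names and article shape are my assumptions rather than the exact ones in the repo:

```typescript
import Parser from 'rss-parser';
import { randomUUID } from 'crypto';
import { SecretsManagerClient, GetSecretValueCommand } from '@aws-sdk/client-secrets-manager';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, PutCommand } from '@aws-sdk/lib-dynamodb';

const secrets = new SecretsManagerClient({});
const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

export const handler = async () => {
  const tableName = process.env.TABLE_NAME!;

  // Fetch the RSS feed url from the secret (the key name is assumed).
  const secret = await secrets.send(new GetSecretValueCommand({ SecretId: process.env.SECRET_ARN! }));
  const { rssUrl } = JSON.parse(secret.SecretString ?? '{}');

  // Parse the feed and keep only articles published today.
  const feed = await new Parser().parseURL(rssUrl);
  const today = new Date().toDateString();
  const articles = (feed.items ?? [])
    .filter((item) => item.pubDate && new Date(item.pubDate).toDateString() === today)
    .map((item) => ({
      id: randomUUID(),
      title: item.title ?? '',
      text: item.contentSnippet ?? '',
      url: item.link ?? '',
      timestamp: new Date(item.pubDate!).toISOString(),
    }));

  // Insert each article into the table.
  for (const article of articles) {
    await ddb.send(new PutCommand({ TableName: tableName, Item: article }));
  }

  return { articleIds: articles.map((a) => a.id), timestamp: new Date().toISOString() };
};
```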

Processing the articles

The second Lambda function in the stepFunction receives the ids of the articles from the first Lambda function. From these ids, the articles are retrieved from the DynamoDB table for processing.
So what is this Lambda doing?
This Lambda begins by retrieving the region and DynamoDB table name from environment variables. It sets up clients for DynamoDB, Secrets Manager and Comprehend, specifying a region for Comprehend since it is unavailable in the Lambda's default region. Additionally, it configures an OpenAI client using an apiKey. The function retrieves a list of articles based on ids provided by the previous function and processes these articles by scraping their contents and summarizing them. To achieve this, it first uses Comprehend to extract key phrases from the articles, reducing the amount of data sent to OpenAI for generating concise summaries. This approach helps manage external data transfer costs. Once the articles are summarized, they are inserted back into the DynamoDB table. The function finishes by returning a success response that includes the ids of the inserted articles along with a timestamp.
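A condensed sketch of the key-phrase-then-summarize step, assuming the openai npm package and the AWS SDK v3 Comprehend client; the region, chunking, prompt and helper shape are my assumptions:

```typescript
import OpenAI from 'openai';
import { ComprehendClient, BatchDetectKeyPhrasesCommand } from '@aws-sdk/client-comprehend';

// Comprehend is pinned to a region where it is available (eu-west-1 assumed here).
const comprehend = new ComprehendClient({ region: 'eu-west-1' });

export async function summarizeArticle(openai: OpenAI, articleText: string): Promise<string> {
  // BatchDetectKeyPhrases accepts up to 25 documents of max 5000 bytes each, so chunk the article.
  const chunks = (articleText.match(/[\s\S]{1,4500}/g) ?? []).slice(0, 25);

  const result = await comprehend.send(new BatchDetectKeyPhrasesCommand({
    TextList: chunks,
    LanguageCode: 'en',
  }));

  // Flatten the key phrases into one much smaller string to send to the model.
  const keyPhrases = (result.ResultList ?? [])
    .flatMap((r) => r.KeyPhrases ?? [])
    .map((kp) => kp.Text)
    .join(', ');

  // Ask gpt-4o-mini for a short summary based on the key phrases only.
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: 'Summarize the news article described by these key phrases in two or three sentences.' },
      { role: 'user', content: keyPhrases },
    ],
  });

  return completion.choices[0].message.content ?? '';
}
```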

Sending the newsletter

The third and last Lambda function had the purpose of sending the newsletter to its recipients. Since this project is just a POC, the only one to receive the newsletter is me. This is solved by hardcoding and verifying my own email address in the code.
So what is this Lambda doing?
The Lambda is retrieving the region and DynamoDB table name from environment variables and configuring AWS clients for DynamoDB and SES to enable interaction with these services. It then retrieves a list of articles using the ids provided by the previous Lambda in the stepFunction and sends the newsletter containing these articles to the specified recipients.
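A minimal sketch of the sending step with the AWS SDK v3 SES client; the addresses, subject line and HTML layout are placeholders:

```typescript
import { SESClient, SendEmailCommand } from '@aws-sdk/client-ses';

const ses = new SESClient({});

export async function sendNewsletter(articles: { title: string; summary: string; url: string }[]) {
  // Build a simple HTML body from the summarized articles.
  const body = articles
    .map((a) => `<h3>${a.title}</h3><p>${a.summary}</p><p><a href="${a.url}">Read more</a></p>`)
    .join('');

  await ses.send(new SendEmailCommand({
    Source: 'me@example.com',                         // must be a verified SES identity
    Destination: { ToAddresses: ['me@example.com'] }, // hardcoded recipient, as in the POC
    Message: {
      Subject: { Data: `Daily news summary - ${new Date().toDateString()}` },
      Body: { Html: { Data: body } },
    },
  }));
}
```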

Util functions

This project contains a variety of utility functions that I believe would be too extensive to cover in a single blog post. However, if you're interested in exploring them in more detail, all of these functions are available in the project's repository, which you can find here.

Summary

What made this project extra fun for me was the opportunity to get hands-on experience experimenting with various AI services. The process of integrating and combining AWS services with OpenAI's services turned out to be a fun challenge.
Looking ahead, if I were to revisit and remake this project, I would likely consider using Bedrock instead of OpenAI. The reason for this is the potential advantage of keeping everything within the AWS ecosystem. By staying entirely within AWS, I could streamline operations, enhance compatibility and potentially reduce complexity. Additionally, it would be a nice opportunity to deepen my understanding of Bedrock, explore its capabilities and compare it to OpenAI.
If I wanted to expand this project further to include more news sources, a straightforward way to do so would be to create additional Lambda functions to scrape and process data from these sources. Each Lambda function could be configured to handle a specific source, making maintenance easy. As a first step in managing this setup, I could implement a ChoiceState in the stepFunction. This would allow for dynamic decision-making based on the triggering event, enabling me to either run all source-scraping Lambdas simultaneously or selectively trigger specific ones based on the input parameters.
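As a purely hypothetical sketch, such a ChoiceState could look something like this in CDK, where the source names, the input field and the extra scraping Lambdas are all illustrative:

```typescript
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';

// scrapeBbcFn and scrapeOtherFn are hypothetical per-source scraping Lambdas.
// Run every scraping Lambda in parallel when no specific source is requested.
const scrapeAll = new sfn.Parallel(this, 'ScrapeAllSources')
  .branch(new tasks.LambdaInvoke(this, 'ScrapeBbcAll', { lambdaFunction: scrapeBbcFn }))
  .branch(new tasks.LambdaInvoke(this, 'ScrapeOtherAll', { lambdaFunction: scrapeOtherFn }));

// Route on an assumed "source" field in the triggering event.
const pickSource = new sfn.Choice(this, 'WhichSource?')
  .when(sfn.Condition.stringEquals('$.source', 'bbc'),
    new tasks.LambdaInvoke(this, 'ScrapeBbcOnly', { lambdaFunction: scrapeBbcFn }))
  .when(sfn.Condition.stringEquals('$.source', 'other'),
    new tasks.LambdaInvoke(this, 'ScrapeOtherOnly', { lambdaFunction: scrapeOtherFn }))
  .otherwise(scrapeAll);
```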
 
