Scaling Urban Revitalization with AWS Lambda and Kinesis

Learn how serverless can transform an open-source urban revitalization project by leveraging AWS Lambda and Amazon Kinesis to handle real-time data processing.

Published Aug 27, 2024
As a developer working on an open-source urban revitalization initiative for Jakarta, Indonesia, a number of years back, I needed a real-time analytics system to monitor and analyze data from a handful of sensors distributed around the metropolitan area. These sensors collected a wide range of measurements: temperature, PM 1.0, PM 2.5, PM 10, humidity, altitude, pressure, and CO2. The ultimate goal was to provide residents and urban developers with actionable insights that could improve urban living conditions, help them respond to environmental changes more effectively, and support advocacy for better environmental conditions.
The project started small, with just a couple of sensors placed across two test sites. A traditional server-based architecture, a standard setup of virtual machines (VMs) on a cloud provider, was sufficient to handle this data volume. The servers were configured to collect data from the sensors, process it, and store the results in a database for analysis and visualization.
However, as the project expanded to include more sensors across a larger geographical area, several significant challenges emerged:
  1. Rapidly Increasing Data Volume: The number of sensors grew rapidly, each sending data every few seconds. This led to a massive increase in the amount of data that needed to be ingested, processed, and stored in real time. The existing server-based infrastructure started to show signs of strain, and we had to step in manually again and again to deal with the breakdowns and bottlenecks that occurred during peak data influx periods.
  2. Increased Latency and Delayed Insights: The initial system architecture could not handle the increased data load efficiently, leading to high latency in data processing. As a result, insights that were supposed to be real-time were delayed by several minutes, making them less useful for time-sensitive decisions, such as whether to leave an area when the air quality is poor.
  3. Skyrocketing Operational Costs: To cope with the growing data volume, we attempted to scale the server infrastructure by adding more VMs and increasing their capacity. However, this approach drove operational costs up sharply. We were constantly over-provisioning resources to handle peak loads, which left many servers underutilized during off-peak hours. The cost of maintaining and managing this infrastructure became unsustainable for the project.

The Solution: Adopting a Serverless Architecture with AWS Lambda and Amazon Kinesis

Recognizing the need for a more scalable, cost-effective, and low-maintenance solution, I decided to leverage more of AWS' serverless offerings, specifically AWS Lambda and Amazon Kinesis, to redesign the data processing pipeline.
I then took the following steps to transform the system:
  1. Implementing Amazon Kinesis Data Streams for Scalable Data Ingestion: To handle the continuously growing influx of data from the many IoT sensors, we decided to use Amazon Kinesis Data Streams. Kinesis Data Streams provides a scalable, durable, and highly available solution for ingesting and processing real-time streaming data. It allowed us to capture data from all the sensors in real time without worrying about the underlying infrastructure.
    • High Throughput: Kinesis Data Streams is designed to handle large amounts of data with high throughput. It can easily scale to accommodate the increasing data volume as more sensors are deployed, ensuring that all incoming data is captured and stored reliably.
    • Data Retention: Kinesis Data Streams provides configurable retention periods, allowing us to store data for up to seven days (retention defaults to 24 hours, and the service now supports extending it up to 365 days). This feature enabled us to maintain a buffer of recent data that could be reprocessed if necessary without affecting real-time processing.
    • Real-Time Data Ingestion: With Kinesis Data Streams, we were able to ingest data in real time, so incoming readings were no longer dropped or delayed when traffic spiked. This was a significant improvement over our previous server-based architecture, which struggled to keep up with the data flow.
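
A minimal sketch of what the producer side of this ingestion step could look like, assuming readings arrive as dictionaries with a `sensor_id` field. The function names, the field names, and the choice of the sensor ID as partition key are illustrative assumptions rather than the project's actual code. The only AWS call used is the Kinesis `put_records` operation, which accepts at most 500 records per request.

```python
import json
from typing import Any, Dict, Iterable, List

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def build_entries(readings: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn sensor readings into PutRecords entries, keyed by sensor ID."""
    return [
        {
            "Data": json.dumps(reading).encode("utf-8"),
            # Same sensor -> same shard, so per-sensor ordering is preserved.
            # Using sensor_id as the key is an assumption for this sketch.
            "PartitionKey": str(reading["sensor_id"]),
        }
        for reading in readings
    ]


def send_readings(client, stream_name: str, readings) -> int:
    """Send readings in batches; return how many records Kinesis rejected."""
    entries = build_entries(readings)
    failed = 0
    for start in range(0, len(entries), MAX_RECORDS_PER_CALL):
        batch = entries[start:start + MAX_RECORDS_PER_CALL]
        response = client.put_records(StreamName=stream_name, Records=batch)
        # A non-zero count means some records were throttled or failed
        # and should be retried by the caller.
        failed += response.get("FailedRecordCount", 0)
    return failed
```

In practice `client` would be `boto3.client("kinesis")`. Passing it in as a parameter keeps the batching logic testable without AWS credentials.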
  2. Leveraging AWS Lambda for Serverless Data Processing: Once the data was ingested into Kinesis, the next challenge was processing it in real time. To achieve this, we utilized AWS Lambda functions. AWS Lambda is a serverless compute service that automatically scales based on the number of incoming requests, which made it ideal for handling the unpredictable nature of our data stream.
    • Event-Driven Architecture: AWS Lambda is inherently event-driven, meaning it can be triggered by events such as data being added to a Kinesis stream. We set up Lambda functions to be invoked automatically whenever new data arrived in the stream. Each Lambda function was responsible for processing a batch of records, performing tasks such as data filtering, transformation, and aggregation.
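    • Automatic Scaling: One of the biggest advantages of AWS Lambda is its ability to scale automatically. As the data volume increased, Lambda automatically scaled out to handle the additional load, without any manual intervention. With a Kinesis event source, Lambda's concurrency follows the number of shards in the stream (one concurrent batch per shard by default), so adding shards also adds processing capacity. This eliminated the need for over-provisioning and ensured that we only paid for the compute resources we actually used.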
    • Reduced Latency: With Lambda functions processing data as it arrived in Kinesis, we were able to significantly reduce latency. The system now processed data almost instantaneously, providing real-time insights that were crucial for time-sensitive decisions. For example, air quality alerts could now reach people in the affected area within a matter of seconds, giving them time to decide quickly whether to leave.
    • Modular and Decoupled Processing: By breaking down the data processing logic into smaller Lambda functions, we created a modular and decoupled architecture. Each function was responsible for a specific task, such as cleaning sensor data, performing calculations, or storing results in a database. This approach made the codebase more manageable and allowed us to quickly iterate on individual components without affecting the entire system.
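
As a rough illustration of the processing step, here is a sketch of a Lambda handler that receives a batch of Kinesis records, filters out invalid readings, and aggregates PM 2.5 per sensor. The payload fields (`sensor_id`, `pm25`) and the alert threshold are hypothetical and chosen only for the example. The event layout, with base64-encoded data under `Records[].kinesis.data`, is how Lambda delivers Kinesis batches.

```python
import base64
import json
from collections import defaultdict

PM25_ALERT_THRESHOLD = 55.0  # hypothetical alert level in µg/m³, not from the project


def decode_records(event):
    """Yield the JSON payload of each Kinesis record in a Lambda event."""
    for record in event.get("Records", []):
        # Lambda delivers Kinesis record data base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        try:
            payload = json.loads(raw)
        except ValueError:
            continue  # skip malformed payloads instead of failing the batch
        if isinstance(payload, dict):
            yield payload


def handler(event, context):
    """Filter out invalid readings and average PM 2.5 per sensor for the batch."""
    values = defaultdict(list)
    for reading in decode_records(event):
        sensor_id = reading.get("sensor_id")
        pm25 = reading.get("pm25")
        # Filtering: keep only readings with a sensor ID and a non-negative value.
        if sensor_id is not None and isinstance(pm25, (int, float)) and pm25 >= 0:
            values[sensor_id].append(pm25)

    # Aggregation: one average per sensor for this batch.
    averages = {sid: sum(v) / len(v) for sid, v in values.items()}
    alerts = sorted(sid for sid, avg in averages.items() if avg > PM25_ALERT_THRESHOLD)
    return {"averages": averages, "alerts": alerts}
```

In a decoupled design like the one described above, the returned aggregates would be handed to a separate function or service for storage and alerting rather than handled in the same handler.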
  3. Optimizing Data Storage and Analysis with Amazon S3 and Amazon Athena: In addition to processing data in real time, we needed to store the processed data for historical analysis and reporting. For this, we used Amazon S3 and Amazon Athena.
    • Amazon S3 for Data Storage: We used Amazon S3, a scalable object storage service, to store processed data in a highly durable and cost-effective manner. S3 allowed us to store large volumes of data with minimal costs, making it ideal for our needs. By organizing data into partitions based on time and location, we ensured efficient access and retrieval for downstream analytics.
    • Amazon Athena for Ad-Hoc Queries: For ad-hoc queries and historical analysis, we leveraged Amazon Athena, a serverless interactive query service that allows users to analyze data directly in S3 using standard SQL. This enabled consumers to run queries on historical data without having to manage a separate database or data warehouse.
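
To make the time-and-location partitioning concrete, the sketch below builds Hive-style S3 object keys (`name=value` path segments), a layout Athena can use as partitions. The `readings/` prefix, the partition column names, and the sample query are assumptions made for illustration, not the project's actual layout or schema.

```python
from datetime import datetime, timezone


def partitioned_key(location: str, observed_at: datetime, sensor_id: str) -> str:
    """Build a Hive-style S3 key partitioned by location and UTC date."""
    # observed_at should be timezone-aware; it is normalized to UTC here.
    ts = observed_at.astimezone(timezone.utc)
    return (
        f"readings/location={location}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"{sensor_id}-{ts:%H%M%S}.json"
    )


# Hypothetical Athena query over a table named "readings" whose partition
# columns (location, year, month, day) are declared as strings.
SAMPLE_QUERY = """
SELECT location, avg(pm25) AS avg_pm25
FROM readings
WHERE year = '2024' AND month = '08'
GROUP BY location
"""
```

Because the query filters on the partition columns, Athena only needs to scan the objects under the matching prefixes, which keeps both query time and per-query cost down.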
  4. Monitoring and Logging with Amazon CloudWatch: To ensure the system operated smoothly and to quickly identify any issues, we set up comprehensive monitoring and logging using Amazon CloudWatch.
    • CloudWatch Metrics: We configured CloudWatch to collect metrics from Kinesis, Lambda, and other AWS services involved in the data processing pipeline. This let us monitor Key Performance Indicators (KPIs) such as data ingestion rates, Lambda invocation times, and error rates, giving us insight into system health and performance so that we could act quickly when something went wrong.
    • CloudWatch Logs: CloudWatch Logs provided a centralized logging solution for all Lambda functions. This was used to capture detailed logs of every function execution, including input data, processing results, and any errors or exceptions. This helped us troubleshoot issues quickly and improve the overall reliability of the system.
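
One way to act on these metrics is an alarm on the Lambda `IteratorAge` metric, which reports (in milliseconds) how far behind the stream a consumer function is. The sketch below builds the parameters for the CloudWatch `put_metric_alarm` operation. The alarm name, thresholds, and evaluation settings are illustrative assumptions, not values from the project.

```python
def iterator_age_alarm(function_name: str, max_age_ms: int = 60_000) -> dict:
    """Build put_metric_alarm parameters for a lagging Kinesis consumer."""
    return {
        "AlarmName": f"{function_name}-iterator-age",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "IteratorAge",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Maximum",
        "Period": 60,              # evaluate one-minute data points
        "EvaluationPeriods": 5,    # alarm after five consecutive breaches
        "Threshold": float(max_age_ms),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }


def create_alarm(client, function_name: str, max_age_ms: int = 60_000) -> dict:
    """Create the alarm with the given CloudWatch client and return its parameters."""
    params = iterator_age_alarm(function_name, max_age_ms)
    client.put_metric_alarm(**params)
    return params
```

Here `client` would be `boto3.client("cloudwatch")`. A rising iterator age is an early sign that processing is falling behind ingestion, which is the delayed-insight problem the original server-based setup suffered from.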

Key Learnings from the Experience

By transitioning to a serverless architecture using AWS Lambda, Amazon Kinesis, and other AWS services, we successfully resolved the scalability, latency, and cost challenges we faced with the initial server-based architecture. Here are the key outcomes:
  • Scalability and Performance: The new serverless architecture proved highly scalable, capable of handling many more sensors and data points without any manual intervention. The system could effortlessly scale up or down based on data volume, ensuring optimal performance at all times.
  • Cost Efficiency: The pay-as-you-go pricing model of AWS Lambda and other serverless services significantly reduced operational costs. We no longer needed to over-provision servers to handle peak loads, and we only paid for the actual compute resources used. This resulted in substantial cost savings compared to the previous server-based setup.
  • Real-Time Insights: The combination of Kinesis and Lambda enabled us to process data in real time with minimal latency, delivering timely insights to consumers as necessary. This improved the effectiveness of the urban revitalization initiative by enabling faster decision-making and more responsive services.
  • Reduced Maintenance Overhead: By offloading infrastructure management to AWS, we reduced the maintenance burden on the development and operations teams. We no longer had to worry about server management, updates, or scaling, which allowed more focus on building out new features.
  • Improved Modularity and Flexibility: The modular nature of the Lambda-based architecture allowed us to iterate quickly and make improvements to specific components without affecting the entire system. This flexibility allowed optimizations to be made easily.

Conclusion

Transitioning to a serverless architecture with AWS Lambda and Amazon Kinesis was a great step forward for the urban revitalization project. It allowed us to build a scalable, cost-effective, and efficient data processing pipeline that could handle the demands of a continuously growing number of devices. By sharing this experience, I hope to inspire more developers to apply serverless best practices in their own IoT projects.
 
