
AWS Cost Optimization Using Lambda Functions and Terraform
Lambda-based scheduling and monitoring systems that help reduce costs
Published Feb 26, 2025
Last Modified Feb 27, 2025
In modern cloud infrastructure, cost optimization and proactive incident prevention are crucial for maintaining efficient operations. This document outlines our implementation of AWS Lambda-based scheduling and monitoring systems that help reduce costs and prevent potential issues before they impact production.
💡 Note: For the complete code implementation and examples, please check my GitHub repository linked in the References section below.
AWS already offers the Instance Scheduler service, and it is a useful tool for cost optimization. However, it has limitations that can make it unsuitable for certain use cases:
- Limited Flexibility – The scheduler works based on predefined schedules and does not allow dynamic scaling or real-time adjustments based on workload demands.
- No Load-Based Scaling – It stops or starts instances according to a set schedule, without considering actual usage metrics such as CPU or memory utilization.
- Limited Resource Coverage – The service is built around EC2 and RDS instances, with little or no support for other AWS resources such as ECS or EKS.
- No Auto-Discovery – Users must manually configure schedules and tag instances, making it less automated for large-scale environments.
- IAM Permission Complexity – The required IAM roles and policies can be difficult to set up correctly, leading to potential security misconfigurations.
- Latency in Execution – Since the scheduler relies on AWS Lambda and DynamoDB, there may be slight delays in instance state changes.

We utilize several specialized Lambda functions to manage different AWS resources:
The ASG scheduler manages compute resources based on time schedules.
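As a rough illustration of what an ASG scheduler of this kind might look like, here is a minimal sketch. The capacity profiles, the event shape (`asg_name`, `action`) and the function names are assumptions made for the example, not the repository's actual code. The boto3 client is passed in so the scaling logic can be exercised without AWS access.

```python
"""Hypothetical ASG scheduler: scale an Auto Scaling group down or back up."""

# Illustrative capacity profiles (assumed values, not taken from the article).
PROFILES = {
    "stop": {"MinSize": 0, "MaxSize": 0, "DesiredCapacity": 0},
    "start": {"MinSize": 1, "MaxSize": 3, "DesiredCapacity": 2},
}


def scale_asg(autoscaling, group_name, action):
    """Apply the capacity profile for `action` to one Auto Scaling group."""
    if action not in PROFILES:
        raise ValueError(f"unknown action: {action!r}")
    sizes = PROFILES[action]
    autoscaling.update_auto_scaling_group(AutoScalingGroupName=group_name, **sizes)
    return {"group": group_name, "action": action, **sizes}


def lambda_handler(event, context):
    """Entry point; the event shape {"asg_name": ..., "action": ...} is assumed."""
    import boto3  # imported here so scale_asg can be tested without boto3

    client = boto3.client("autoscaling")
    return scale_asg(client, event["asg_name"], event["action"])
```

A scheduled rule would typically invoke the handler twice a day, once with `"action": "stop"` and once with `"action": "start"`.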
The RDS scheduler handles database maintenance tasks.
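One possible shape for the stop/start side of an RDS scheduler is sketched below. The tag key `Schedule` and value `office-hours` are hypothetical, chosen only for the example. The sketch relies on `describe_db_instances` returning each instance's `TagList`.

```python
"""Hypothetical RDS scheduler: stop or start tagged DB instances."""


def instances_with_tag(rds, key, value):
    """Return (identifier, status) for DB instances carrying the given tag."""
    found = []
    for page in rds.get_paginator("describe_db_instances").paginate():
        for db in page["DBInstances"]:
            tags = {t["Key"]: t["Value"] for t in db.get("TagList", [])}
            if tags.get(key) == value:
                found.append((db["DBInstanceIdentifier"], db["DBInstanceStatus"]))
    return found


def schedule_rds(rds, action, tag_key="Schedule", tag_value="office-hours"):
    """Stop running instances or start stopped ones; skip transitional states."""
    changed = []
    for identifier, status in instances_with_tag(rds, tag_key, tag_value):
        if action == "stop" and status == "available":
            rds.stop_db_instance(DBInstanceIdentifier=identifier)
            changed.append(identifier)
        elif action == "start" and status == "stopped":
            rds.start_db_instance(DBInstanceIdentifier=identifier)
            changed.append(identifier)
    return changed
```

Keep in mind that RDS automatically restarts an instance that has been stopped for seven days, so a stop schedule needs to run regularly rather than once.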
The EKS scheduler manages Kubernetes clusters.
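For EKS, one way to "shut down" a cluster during off-hours is to scale its managed node groups to zero. The sketch below assumes that interpretation and assumes the cluster uses managed node groups; the function name and parameters are illustrative. The EKS control plane keeps running (and billing) in this approach, so only the worker-node cost is saved.

```python
"""Hypothetical EKS scheduler: resize every managed node group in a cluster."""


def scale_cluster_nodegroups(eks, cluster_name, desired, min_size=0):
    """Set each managed node group in the cluster to `desired` nodes."""
    updated = []
    pages = eks.get_paginator("list_nodegroups").paginate(clusterName=cluster_name)
    for page in pages:
        for nodegroup in page["nodegroups"]:
            current = eks.describe_nodegroup(
                clusterName=cluster_name, nodegroupName=nodegroup
            )["nodegroup"]["scalingConfig"]
            eks.update_nodegroup_config(
                clusterName=cluster_name,
                nodegroupName=nodegroup,
                scalingConfig={
                    "minSize": min(min_size, desired),
                    # EKS requires maxSize >= 1, so the existing maximum is kept.
                    "maxSize": max(current["maxSize"], desired, 1),
                    "desiredSize": desired,
                },
            )
            updated.append(nodegroup)
    return updated
```

Calling `scale_cluster_nodegroups(eks, "dev", 0)` in the evening and again with the daytime size in the morning gives the on/off behaviour.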
Our system implements proactive monitoring of critical metrics:
- Database Metrics
  - Storage space utilization
  - CPU usage
  - Connection count
  - IOPS utilization
- Application Metrics
  - Response times
  - Error rates
  - Queue lengths
  - Memory usage
When one of these metrics crosses a threshold, the system responds automatically.
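To make the idea concrete, here is a minimal sketch of one such check for RDS storage. It reads the CloudWatch `FreeStorageSpace` metric, compares it with the instance's allocated storage, and calls a handler when utilization reaches a threshold. The function names, the 15-minute lookback and the `on_breach` callback are assumptions for illustration.

```python
"""Hypothetical metric check: react when RDS storage utilization is high."""
from datetime import datetime, timedelta, timezone

GIB = 1024 ** 3


def storage_utilization(cloudwatch, rds, db_id, now=None):
    """Return used storage as a percentage, or None if no datapoints exist."""
    now = now or datetime.now(timezone.utc)
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="FreeStorageSpace",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": db_id}],
        StartTime=now - timedelta(minutes=15),
        EndTime=now,
        Period=300,
        Statistics=["Average"],
    )
    points = sorted(stats["Datapoints"], key=lambda p: p["Timestamp"])
    if not points:
        return None
    free_bytes = points[-1]["Average"]  # most recent datapoint, in bytes
    instance = rds.describe_db_instances(DBInstanceIdentifier=db_id)["DBInstances"][0]
    allocated_bytes = instance["AllocatedStorage"] * GIB  # reported in GiB
    return 100.0 * (1 - free_bytes / allocated_bytes)


def respond(utilization, on_breach, threshold=85.0):
    """Invoke `on_breach` when utilization is at or above the threshold."""
    if utilization is not None and utilization >= threshold:
        on_breach(utilization)
        return True
    return False
```

The `on_breach` callback is where a real function would trigger maintenance, scaling, or a notification.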
Our system sends notifications to Slack channels.
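A Slack notification from a Lambda function can be sent with nothing more than the standard library and an incoming webhook. The sketch below assumes a webhook URL is available (for example from an environment variable); the message layout, severity names and function names are illustrative.

```python
"""Hypothetical Slack notifier using an incoming webhook."""
import json
import urllib.request

ICONS = {"warning": ":warning:", "critical": ":rotating_light:"}


def build_slack_message(title, details, severity="warning"):
    """Format a short alert as a Slack webhook payload."""
    icon = ICONS.get(severity, ":information_source:")
    lines = [f"{icon} *{title}*"] + [f"• {key}: {value}" for key, value in details.items()]
    return {"text": "\n".join(lines)}


def send_to_slack(webhook_url, message, opener=urllib.request.urlopen):
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request, timeout=5) as response:
        return response.status
```

The `opener` parameter exists only so the function can be tested without network access; in a Lambda function the default is enough.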
Each Lambda function runs under its own IAM role with least-privilege access.
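As an illustration of what least privilege can mean for the ASG scheduler, the sketch below builds a policy document that allows resizing only named Auto Scaling groups. The function name and the example region, account ID and group names are assumptions; the resulting JSON could be attached to the role with whatever tool manages your IAM resources.

```python
"""Hypothetical least-privilege policy for an ASG scheduler role."""
import json


def asg_scheduler_policy(region, account_id, group_names):
    """Build an IAM policy document limited to the given Auto Scaling groups."""
    group_arns = [
        f"arn:aws:autoscaling:{region}:{account_id}:autoScalingGroup:*:"
        f"autoScalingGroupName/{name}"
        for name in group_names
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Describe calls do not support resource-level permissions.
                "Sid": "ReadGroups",
                "Effect": "Allow",
                "Action": ["autoscaling:DescribeAutoScalingGroups"],
                "Resource": "*",
            },
            {
                "Sid": "ResizeNamedGroups",
                "Effect": "Allow",
                "Action": ["autoscaling:UpdateAutoScalingGroup"],
                "Resource": group_arns,
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(asg_scheduler_policy("eu-west-1", "123456789012", ["dev-asg"]), indent=2))
```

The same pattern applies to the RDS and EKS functions: list only the actions each function calls and scope write actions to specific resources.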
- Scheduled start/stop of development resources
- Capacity adjustment based on usage patterns
- Weekend and holiday scheduling
- ⏰ Shutdown of dev/stage/test EKS clusters during off-hours (~8-12 hours/day)
- 🛑 Stopping RDS instances for dev/stage environments
- 📉 Reducing the EC2 instance count in ASGs for dev/stage environments
- Automated database VACUUM operations
- Storage space monitoring
- Performance optimization
- Regular utilization analysis
- Automatic scaling adjustments
- Cost-effective resource allocation
- 40-60% reduction in development environment costs through:
  - Shutting down EKS clusters during off-hours
  - Stopping RDS instances during non-working hours
  - Reducing the EC2 instance count in ASGs
- Elimination of idle resource costs during weekends
- Monthly savings of approximately $5,000-7,000 on dev/stage environments
- Zero downtime due to storage issues
- Proactive issue detection
- Automated maintenance procedures
- Reduced manual intervention
- Consistent resource management
- Automated incident response
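The off-hours, weekend and holiday scheduling mentioned in the lists above comes down to a single decision: should a dev/stage resource be running at a given moment? A minimal sketch of that decision follows. The working hours and holiday dates are placeholder assumptions, and `moment` is assumed to already be in the team's local time.

```python
"""Hypothetical schedule check: should dev/stage resources be running now?"""
from datetime import date, datetime, time

WORK_START = time(8, 0)   # assumed start of the working day
WORK_END = time(20, 0)    # assumed end of the working day
HOLIDAYS = {date(2025, 1, 1), date(2025, 12, 25)}  # placeholder holiday list


def should_run(moment, holidays=HOLIDAYS):
    """Return True only on working days, within working hours."""
    if moment.weekday() >= 5:  # Saturday (5) or Sunday (6)
        return False
    if moment.date() in holidays:
        return False
    return WORK_START <= moment.time() < WORK_END


if __name__ == "__main__":
    print(should_run(datetime.now()))
```

A scheduler can call this on every invocation and start or stop resources only when the answer differs from their current state.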
An example of our RDS maintenance implementation is included in the GitHub repository listed in the References section.
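As a rough illustration of the automated VACUUM task, here is a minimal sketch. It assumes a PostgreSQL database and a DB-API connection in the style of psycopg2 (an `autocommit` attribute and a `cursor()` method); the function name and the identifier rule are illustrative and not taken from the repository.

```python
"""Hypothetical maintenance task: VACUUM a list of PostgreSQL tables."""
import re

# Only plain lowercase identifiers are accepted, because the table name is
# interpolated into the statement (VACUUM does not take bind parameters).
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def vacuum_tables(connection, tables):
    """Run VACUUM (ANALYZE) on each table and return the tables processed."""
    for table in tables:
        if not _IDENTIFIER.match(table):
            raise ValueError(f"refusing unsafe table name: {table!r}")
    # VACUUM cannot run inside a transaction block, so autocommit is required.
    connection.autocommit = True
    done = []
    cursor = connection.cursor()
    try:
        for table in tables:
            cursor.execute(f'VACUUM (ANALYZE) "{table}"')
            done.append(table)
    finally:
        cursor.close()
    return done
```

In a Lambda function the connection would be opened from credentials stored outside the code, and the function would need network access to the database.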
- Critical Metrics
  - Database storage utilization
  - Application error rates
  - Resource utilization patterns
  - Performance metrics
- Alert Thresholds
  - Warning: 70% utilization
  - Critical: 85% utilization
  - Emergency: 95% utilization
- Response Actions
  - Automated maintenance
  - Resource scaling
  - Team notifications
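The three alert thresholds and the response actions above can be combined into a graduated response. The sketch below uses the 70/85/95% levels from the list; which actions run at which level is an assumption made for the example.

```python
"""Hypothetical graduated response based on the alert thresholds."""

# Checked from most to least severe; values come from the threshold list.
THRESHOLDS = [(95.0, "emergency"), (85.0, "critical"), (70.0, "warning")]

# Assumed mapping from severity to response actions.
ACTIONS = {
    "warning": ["notify"],
    "critical": ["notify", "maintenance"],
    "emergency": ["notify", "maintenance", "scale"],
}


def classify(utilization):
    """Return the severity level for a utilization percentage."""
    for limit, level in THRESHOLDS:
        if utilization >= limit:
            return level
    return "ok"


def planned_actions(utilization):
    """Return the list of response actions for a utilization percentage."""
    return ACTIONS.get(classify(utilization), [])
```

Keeping the thresholds in one table makes it easy to tune them later from historical data.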
Best Practices
- Resource Tagging
- Monitoring Configuration
  - Set appropriate thresholds based on historical data
  - Implement graduated response actions
  - Maintain comprehensive monitoring documentation
- Security Measures
  - Use least-privilege IAM roles
  - Implement proper error handling
  - Maintain audit logs
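Error handling and audit logging can be added to every scheduler action with a small decorator. The sketch below is one possible approach, with a hypothetical logger name and entry format: it writes one JSON log entry per call and re-raises any exception so the Lambda invocation is still marked as failed.

```python
"""Hypothetical audit decorator for scheduler actions."""
import functools
import json
import logging
import time

logger = logging.getLogger("scheduler.audit")


def audited(action):
    """Log a JSON audit entry for each call, including failures."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            entry = {"action": action, "started_at": time.time()}
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                entry.update(status="error", error=f"{type(exc).__name__}: {exc}")
                logger.error(json.dumps(entry))
                raise  # let Lambda record the failure and apply its retry policy
            entry.update(status="ok")
            logger.info(json.dumps(entry))
            return result
        return wrapper
    return decorator
```

Inside Lambda, output from the `logging` module ends up in CloudWatch Logs, which gives a searchable record of what each function did and when.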
Our AWS Lambda-based scheduling and monitoring system has proven highly effective in:
- Reducing operational costs through automated resource management
- Preventing incidents through proactive monitoring
- Improving system reliability through automated maintenance
- Reducing team workload through automation
The combination of scheduled resource management and proactive monitoring ensures optimal resource utilization while maintaining system stability and performance.
References
- 📚 GitHub repository - Complete implementation of AWS Lambda schedulers for cost optimization