
A deep dive into unused resource detection in AWS for cost optimization.
This article walks through an advanced Python-based tool built to identify unused AWS resources, improve cost visibility, and automate FinOps practices.
Published Apr 23, 2025
It just so happened that I was writing an article about the operational day-to-day activities of a FinOps engineer — focusing on which things are important to monitor daily, which tools assist in doing so, and how to identify "unused" or "abandoned" resources. And now, while closing down several projects, I had the chance to once again validate the effectiveness of the approach and tool I had chosen.
I'm sharing my personal approach, specifically in the context of AWS Cloud, though I believe many face similar situations in other clouds. Every cloud provider — including AWS — offers many native tools, and while they're certainly useful, they each come with limitations and drawbacks. Sooner or later, an engineer finds themselves needing a custom tool tailored to their specific context.
AWS makes it easy to provision new resources, but due to human factors and immature processes, these resources often go unnoticed once a project ends — leading to unnecessary costs.
The tool described in the article helps detect resources that are either unused or underutilized. This allows teams to reduce cloud waste and improve overall infrastructure governance and cost-efficiency.
Cloud-native applications tend to grow rapidly — and so do their costs. One major contributor to unnecessary AWS spend is abandoned or underutilized resources: EC2 instances left running, unattached volumes, idle Lambda functions, and more.
This article presents an approach to building a custom solution for identifying and inventorying unused (abandoned) resources in your AWS account to achieve cost optimization, and automate FinOps practices.
🧰 The article provides both a conceptual approach and a partial implementation of the scanner, written in Python using boto3, CloudWatch, STS, colorama, and structured JSON logging.

This article focuses on the core mechanism for detecting abandoned or idle cloud resources. The broader infrastructure is only partially presented.
If you're interested in the full implementation, feel free to leave a comment — I'll provide a more complete overview and share the full source code.
While AWS provides several built-in tools like Trusted Advisor, Compute Optimizer, and Cost Explorer, they often:
- Require a premium support plan
- Are limited in scope (e.g., only EC2, EBS, or CPU-related metrics)
- Do not offer automation, JSON reports, or Slack/email alerts
This custom scanner fills those gaps.
✅ Multi-region scanning per service
✅ Service support: EC2, RDS, Lambda, S3, IAM, ACM, DynamoDB, VPC, EBS, ELB, ECR, and more
✅ CloudWatch metrics evaluation (e.g., CPU, connections, invocations)
✅ JSON output report with timestamp and account ID
✅ Execution time tracking per function
✅ Slack + SES email notification support
✅ S3 export for FinOps dashboards
✅ EventBridge-compatible for scheduling
✅ Language support: English 🇺🇸 + Ukrainian 🇺🇦

- Detect AWS account ID
- Configure logging with timestamped files
- Parse language from environment variable (default: UA)
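A minimal sketch of this initialization step, assuming illustrative helper names and a `SCANNER_LANG` variable (the tool's actual names may differ):

```python
import logging
import os
from datetime import datetime

import boto3


def detect_account_id() -> str:
    """Resolve the AWS account ID of the current credentials via STS."""
    return boto3.client("sts").get_caller_identity()["Account"]


def setup_logging() -> str:
    """Write logs to a timestamped file and return its name."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    log_file = f"aws_waste_scan_{timestamp}.log"
    logging.basicConfig(
        filename=log_file,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    return log_file


# Report language comes from an environment variable, defaulting to Ukrainian.
LANGUAGE = os.environ.get("SCANNER_LANG", "UA")
```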
Each function is wrapped in a decorator to measure and log runtime.
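A typical shape for such a decorator (a sketch, not necessarily the tool's exact implementation):

```python
import functools
import logging
import time


def timed(func):
    """Measure and log the runtime of a scanner function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            logging.info("%s finished in %.2f seconds", func.__name__, elapsed)
    return wrapper
```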
Each major AWS service has a dedicated `find_*()` function:
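For example, a simplified `find_idle_ec2()` that flags running instances whose average CPU stayed below a threshold over the last week (the function name and thresholds are illustrative; the real scanner applies the same idea per service):

```python
from datetime import datetime, timedelta, timezone

import boto3


def find_idle_ec2(region: str, cpu_threshold: float = 5.0, days: int = 7) -> list:
    """Return running EC2 instances with average CPU below the threshold."""
    ec2 = boto3.client("ec2", region_name=region)
    cloudwatch = boto3.client("cloudwatch", region_name=region)
    idle = []
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    ):
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                stats = cloudwatch.get_metric_statistics(
                    Namespace="AWS/EC2",
                    MetricName="CPUUtilization",
                    Dimensions=[{"Name": "InstanceId", "Value": instance["InstanceId"]}],
                    StartTime=datetime.now(timezone.utc) - timedelta(days=days),
                    EndTime=datetime.now(timezone.utc),
                    Period=86400,
                    Statistics=["Average"],
                )
                points = stats["Datapoints"]
                avg_cpu = sum(p["Average"] for p in points) / len(points) if points else 0.0
                if avg_cpu < cpu_threshold:
                    idle.append({"InstanceId": instance["InstanceId"], "AvgCPU": round(avg_cpu, 2)})
    return idle
```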
At the end of a scan, the script:
- Saves the full results to `results/aws_waste_results_<account_id>_<timestamp>.json`
- Includes execution time per function
- Logs a summary (number of items found, time taken)
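A sketch of that reporting step (the field names inside the JSON are assumptions; the file path follows the pattern above):

```python
import json
import logging
import os
from datetime import datetime


def save_results(findings: dict, account_id: str, timings: dict) -> str:
    """Write findings plus per-function execution times to a timestamped JSON report."""
    os.makedirs("results", exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = f"results/aws_waste_results_{account_id}_{timestamp}.json"
    report = {
        "account_id": account_id,
        "generated_at": timestamp,
        "execution_times_sec": timings,
        "findings": findings,
    }
    with open(path, "w") as fh:
        json.dump(report, fh, indent=2, default=str)
    total = sum(len(items) for items in findings.values())
    logging.info("Scan complete: %d items found, report saved to %s", total, path)
    return path
```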
You can easily feed this data into a dashboard:
- Upload to S3
- Use Glue Crawler to define schema
- Query with Athena
- Build reports in QuickSight or Grafana
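A minimal sketch of the first and last steps of that pipeline, uploading the report to S3 and querying the crawled table with Athena (bucket, database, and table names are placeholders):

```python
import boto3


def publish_report(local_path: str, bucket: str = "finops-reports") -> None:
    """Upload the JSON report to S3 so a Glue crawler can catalog it."""
    key = f"aws-waste/{local_path.split('/')[-1]}"
    boto3.client("s3").upload_file(local_path, bucket, key)


def query_findings(database: str = "finops",
                   output: str = "s3://finops-reports/athena-results/") -> str:
    """Start an Athena query against the crawled results table; returns the execution ID."""
    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString="SELECT * FROM aws_waste_results LIMIT 10",
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output},
    )
    return response["QueryExecutionId"]
```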
Examples:
- 📊 Top 10 idle EC2 by cost
- 🗺️ Heatmap of resource sprawl per region
- 📉 Usage trend over time
- 📊 Top 10 idle EC2 by cost
- 🗺️ Heatmap of resource sprawl per region
- 📉 Usage trend over time
The scanner requires read-only + CloudWatch + S3 + SES permissions. A sample IAM policy includes:
- `ec2:Describe*`, `rds:Describe*`, `lambda:ListFunctions`
- `cloudwatch:GetMetricStatistics`
- `s3:PutObject`, `ses:SendEmail`
- `logs:*` (for Lambda logging)
A minimal policy covering these permissions might look like the following sketch (illustrative only; scope the actions and resources to the features you actually use):
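```python
import json

import boto3

# Illustrative policy document, expressed as a Python dict so it can be attached with boto3.
# The logging statement uses specific actions instead of the broader logs:* mentioned above.
SCANNER_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadOnlyDiscovery",
            "Effect": "Allow",
            "Action": [
                "ec2:Describe*",
                "rds:Describe*",
                "lambda:ListFunctions",
                "cloudwatch:GetMetricStatistics",
            ],
            "Resource": "*",
        },
        {
            "Sid": "ReportingAndAlerts",
            "Effect": "Allow",
            "Action": ["s3:PutObject", "ses:SendEmail"],
            "Resource": "*",
        },
        {
            "Sid": "LambdaLogging",
            "Effect": "Allow",
            "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "*",
        },
    ],
}

# Attach the policy to the scanner's execution role (role and policy names are placeholders).
boto3.client("iam").put_role_policy(
    RoleName="aws-waste-scanner-role",
    PolicyName="aws-waste-scanner-policy",
    PolicyDocument=json.dumps(SCANNER_POLICY),
)
```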

- Run as a CLI tool or Lambda (use the included Dockerfile or ZIP package)
- Schedule via EventBridge
- Use Terraform for IAM setup
- Customize thresholds (CPU%, days, etc.)
- Using Dockerfile: You can create a container with this script and deploy it as a container-based Lambda function. This allows you to include all necessary dependencies and libraries without the size limitations of a regular Lambda package.
- ZIP package: Alternatively, the code can be packaged into a ZIP file with dependencies for standard Lambda deployment. This is faster to deploy but may be more challenging for dependency management.
- Timeout Configuration: Set Lambda timeout to 15 minutes for complete scanning of all services.
- Memory Configuration: It's recommended to configure at least 1024MB of memory for efficient scanner operation.
- Regular Execution: Configure an EventBridge rule for weekly or monthly scanner execution.
- Cron Expression: Use an expression like `cron(0 9 ? * MON *)` to run every Monday at 9 AM UTC (a boto3 sketch for creating such a rule follows this list).
- Parameterization: Pass different parameters for different runs (e.g., different AWS accounts via Lambda parameters).
- Notifications: Configure EventBridge to send results to SNS for notifications after scanning completes.
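A boto3 sketch of wiring up such a schedule (the rule name, function ARN, and input payload are placeholders; the Lambda function also needs a resource-based permission allowing EventBridge to invoke it, e.g. via `lambda add-permission`):

```python
import boto3

events = boto3.client("events")

# Run the scanner Lambda every Monday at 09:00 UTC.
events.put_rule(
    Name="aws-waste-scanner-weekly",
    ScheduleExpression="cron(0 9 ? * MON *)",
    State="ENABLED",
)
events.put_targets(
    Rule="aws-waste-scanner-weekly",
    Targets=[
        {
            "Id": "scanner",
            "Arn": "arn:aws:lambda:eu-central-1:123456789012:function:aws-waste-scanner",
            # Per-run parameters (language, thresholds, target account) go in the constant input.
            "Input": '{"language": "UA", "cpu_threshold": 5}',
        }
    ],
)
```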
- IAM Module: Create a Terraform module that automatically configures all necessary IAM roles and policies.
- Multi-Account Support: Configure trust relationships to work with multiple AWS accounts.
- Principle of Least Privilege: Adapt the IAM policy shown in the article to grant only necessary permissions depending on the functionality you're using.
- Terraform Variables: Use variables for flexible configuration of S3 buckets, SES email addresses and other parameters.
- Configuration File: Create a configuration JSON file to store threshold values for different services.
- Environment Variables: Use environment variables to pass threshold values to the Lambda function.
- Example Values (reflected in the configuration sketch after this list):
- CPU usage: 5% for detecting unused EC2 instances
- Time interval: 7 days for Lambda functions without invocations
- Snapshot age: 30 days for detecting old unclaimed snapshots
- S3 activity: 90 days without access for determining unused buckets
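A sketch of loading these thresholds, with environment variables overriding an optional JSON config file (key names are assumptions):

```python
import json
import os

DEFAULT_THRESHOLDS = {
    "ec2_cpu_percent": 5,      # below this average CPU an instance is considered idle
    "lambda_idle_days": 7,     # days without invocations
    "snapshot_age_days": 30,   # snapshots older than this are flagged
    "s3_inactive_days": 90,    # days without access before a bucket is flagged
}


def load_thresholds(config_path: str = "thresholds.json") -> dict:
    """Merge defaults, an optional JSON config file, and environment variable overrides."""
    thresholds = dict(DEFAULT_THRESHOLDS)
    if os.path.exists(config_path):
        with open(config_path) as fh:
            thresholds.update(json.load(fh))
    for key in thresholds:
        env_value = os.environ.get(key.upper())
        if env_value is not None:
            thresholds[key] = float(env_value)
    return thresholds
```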
- Use resource tagging: `Owner`, `Purpose`, `TTL`
- Review flagged resources before deletion
- Schedule weekly/monthly scans
- Visualize trends using S3+Athena dashboards
- Combine with native tools for full FinOps coverage
- Owner Tags: Always tag resources with the responsible person or team. This facilitates accountability and makes it easier to identify who to contact before taking action on unused resources.
- Format example: `Owner: team-name@company.com` or `Owner: devops-team`
- Purpose Tags: Add tags describing the resource's function or the project it belongs to, helping determine if it's still needed.
- Format example: `Project: customer-portal` or `Environment: staging`
- TTL (Time-to-Live) Tags: For temporary resources, set an expiration date to automate cleanup (a small TTL-check sketch follows this list).
- Format example: `TTL: 2025-06-30` or `Expiry: Q2-2025`
- Automated Tag Enforcement: Use AWS Config Rules or Organizations Tag Policies to enforce tagging standards across all accounts.
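As an example of how the scanner could act on these tags, a small check that flags resources whose `TTL` tag date has passed (the tag key and date format follow the examples above):

```python
from datetime import date, datetime


def ttl_expired(tags, today=None) -> bool:
    """Return True if a resource's TTL tag (YYYY-MM-DD) lies in the past."""
    today = today or date.today()
    for tag in tags:
        if tag.get("Key") == "TTL":
            try:
                return datetime.strptime(tag["Value"], "%Y-%m-%d").date() < today
            except ValueError:
                return False  # unparsable TTL values are left for manual review
    return False


# Works with the tag lists returned by boto3 describe_* calls:
print(ttl_expired([{"Key": "TTL", "Value": "2025-06-30"}], today=date(2025, 7, 15)))  # True
```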
- Verification Process: Establish a review workflow that includes verification steps before resource deletion.
- Quarantine Approach: Instead of immediate deletion, consider moving resources to a "quarantine" state (e.g., stopping instances instead of terminating).
- Notification Period: Send notifications to resource owners and allow a grace period (e.g., 14 days) before taking action.
- Change Management: Document all cleanup actions in your change management system for audit purposes.
- Resource Dependencies: Check for hidden dependencies before removing resources (e.g., EBS volumes that appear unused but are part of a snapshot lifecycle).
- Weekly Light Scans: Run lightweight scans weekly to identify obvious waste (e.g., stopped instances, unattached volumes).
- Monthly Deep Scans: Schedule comprehensive monthly scans that analyze CloudWatch metrics for usage patterns.
- Quarterly Audits: Perform quarterly comprehensive reviews including manual verification of large or critical resources.
- Report Distribution: Automatically distribute scan results to team leaders and financial stakeholders.
- Action Tracking: Maintain a tracker for identified waste and remediation actions to measure the program's effectiveness.
- S3 + Athena Architecture: Store scanner results in S3, use Glue crawlers to catalog the data, and query with Athena.
- Dashboard Metrics: Create dashboards showing:
- Resource waste by service type
- Cost savings opportunities by team or project
- Historical trend of resource utilization
- Top waste contributors
- Grafana Integration: Use custom Grafana dashboards with Athena data source for real-time visibility.
- Executive Reporting: Create simplified executive dashboards focusing on cost trends and savings realized.
- AWS Cost Explorer: Use alongside this scanner for billing-based analysis and reserved instance coverage.
- Trusted Advisor: Supplement scanner findings with Trusted Advisor for security and performance recommendations.
- AWS Compute Optimizer: Leverage for right-sizing recommendations to complement idle resource detection.
- AWS Budgets: Set up budget alerts to complement waste detection with overall spend monitoring.
- Cost Anomaly Detection: Enable AWS Cost Anomaly Detection for unexpected spending increases that might indicate resource sprawl.
- Real-time Alerting: Create a Slack bot that posts scanner findings directly to designated channels.
- Implementation Options:
- Use AWS Lambda with the Slack API to post messages when scanner results are published to S3
- Leverage AWS Chatbot for native Slack integration
- Alert Prioritization: Set thresholds for different urgency levels:
- High priority: Expensive resources with zero utilization (e.g., idle RDS instances)
- Medium priority: Resources with very low utilization
- Low priority: Weekly summary of all identified waste
- Interactive Buttons: Add interactive elements to Slack messages that allow users to:
- Mark resources as "needed" to prevent future alerts
- Schedule automatic shutdown/cleanup for a specific date
- Assign review tasks to team members
- Code Examples: The repository includes a Lambda function that processes scanner results and formats them for Slack notification.
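That repository code is not reproduced here, but a minimal sketch of the idea, posting a summary to a Slack incoming webhook (the webhook URL and message format are assumptions), could look like this:

```python
import json
import urllib.request


def post_to_slack(findings: dict, webhook_url: str) -> None:
    """Post a short summary of scanner findings to a Slack incoming webhook."""
    lines = ["*AWS waste scan results*"]
    for service, items in findings.items():
        if items:
            lines.append(f"• {service}: {len(items)} unused resource(s)")
    payload = json.dumps({"text": "\n".join(lines)}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)
```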
- Automated Report Generation: Send periodic email digests summarizing detected waste and potential savings (a minimal SES sketch follows this list).
- Email Templates: Create HTML templates with:
- Executive summary at the top
- Cost savings metrics and graphs
- Detailed tables of unused resources by service type
- Resource owner information parsed from tags
- Customization Options:
- Configure different report types per recipient (technical vs. financial)
- Include AWS Cost Explorer links for deeper analysis
- Add calendar links to schedule review meetings
- Compliance Features: Include audit trails and documentation links to meet change management requirements.
- Delivery Options: Configure based on organizational preference:
- Daily digest for DevOps teams
- Weekly summary for team leads
- Monthly executive report for management
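A minimal SES sketch for such a digest (the sender address and subject are placeholders; the HTML body would come from the templates described above):

```python
import boto3


def send_email_digest(html_body: str, recipients: list) -> None:
    """Send an HTML waste-report digest via Amazon SES."""
    ses = boto3.client("ses")
    ses.send_email(
        Source="finops-reports@example.com",
        Destination={"ToAddresses": recipients},
        Message={
            "Subject": {"Data": "Weekly AWS waste report"},
            "Body": {"Html": {"Data": html_body}},
        },
    )
```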
- Automated Deployment: Set up GitHub Actions workflows to deploy the scanner to Lambda whenever changes are committed.
- Pipeline Components:
- Unit tests for scanner functions
- Security scanning of dependencies
- Package creation (ZIP or container image)
- Deployment to multiple AWS environments
- Progressive Deployment: Implement a progressive deployment strategy:
- Deploy to development environment first
- Run validation tests
- Automatically progress to production if tests pass
- Version Control: Maintain version tagging for Lambda deployments to enable rollbacks.
- Pull Request Automation: Automatically test and validate PRs with simulated runs against test AWS accounts.
- Infrastructure as Code: Include Terraform code for the Lambda function and its required resources.
- Cross-Account Scanning: Create a Terraform module that enables deploying the scanner across multiple AWS accounts.
- Architecture Components:
- Central monitoring account for aggregating findings
- Cross-account IAM roles with minimal required permissions
- Resource share configurations for consolidated reporting
- Deployment Options:
- Standalone deployment per account
- Hub-and-spoke model with a central reporting account
- Integration with AWS Organizations for automatic enrollment of new accounts
- State Management: Implement remote state storage with proper locking mechanisms.
- Variable Customization: Allow environment-specific configurations through Terraform variables:
- Custom thresholds per account/environment
- Service exclusions for specialized accounts
- Integration points for existing notification systems
- Compliance Features: Built-in compliance checks and guardrails for security requirements.
This script goes beyond what AWS native tools offer, empowering cloud teams to:
- Improve cost visibility
- Take action on idle resources
- Integrate FinOps with engineering workflows
- Automate everything
The code provided in the article was written quite some time ago and has certain shortcomings. While the scanner code is functional, there are several optimization opportunities that could significantly improve its performance, scalability, and maintainability. This section outlines key areas for improvement with code examples. I’d be glad to hear your feedback and hope that my article will be useful to you.
The current implementation scans each region and service sequentially, which can lead to long execution times. Using parallel processing can dramatically improve performance:
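For example, regions can be scanned concurrently with a thread pool (a sketch; `scan_fn` stands in for any per-region `find_*()` function):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

import boto3


def scan_all_regions(scan_fn, max_workers: int = 8) -> dict:
    """Run a per-region scan function across all enabled regions in parallel."""
    regions = [r["RegionName"] for r in boto3.client("ec2").describe_regions()["Regions"]]
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(scan_fn, region): region for region in regions}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```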
Replace multiple individual `get_metric_statistics` calls with batch `get_metric_data` requests:
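A sketch of batching the CPU query for several instances into one `GetMetricData` call (the instance IDs and lookback window are illustrative; one call accepts up to 500 metric queries):

```python
from datetime import datetime, timedelta, timezone

import boto3


def batch_average_cpu(instance_ids: list, region: str, days: int = 7) -> dict:
    """Fetch average CPU for many instances in a single get_metric_data request."""
    cloudwatch = boto3.client("cloudwatch", region_name=region)
    queries = [
        {
            "Id": f"cpu{i}",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EC2",
                    "MetricName": "CPUUtilization",
                    "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                },
                "Period": 86400,
                "Stat": "Average",
            },
        }
        for i, instance_id in enumerate(instance_ids)
    ]
    response = cloudwatch.get_metric_data(
        MetricDataQueries=queries,
        StartTime=datetime.now(timezone.utc) - timedelta(days=days),
        EndTime=datetime.now(timezone.utc),
    )
    return {
        instance_ids[int(result["Id"][3:])]: (
            sum(result["Values"]) / len(result["Values"]) if result["Values"] else 0.0
        )
        for result in response["MetricDataResults"]
    }
```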
Ensure all API calls handle pagination correctly:
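For example, listing every Lambda function in a region with a paginator instead of relying on a single `list_functions` call, which returns at most 50 functions per page:

```python
import boto3


def list_all_functions(region: str) -> list:
    """Collect every Lambda function in a region, following pagination markers."""
    client = boto3.client("lambda", region_name=region)
    functions = []
    for page in client.get_paginator("list_functions").paginate():
        functions.extend(page["Functions"])
    return functions
```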
Refactor the code to use classes for better organization and reusability:
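One possible shape for such a refactor (class and method names are suggestions, not the original code):

```python
import boto3


class ServiceScanner:
    """Base class holding the shared session, region, and thresholds."""

    def __init__(self, region: str, thresholds: dict):
        self.region = region
        self.thresholds = thresholds
        self.session = boto3.session.Session(region_name=region)

    def scan(self) -> list:
        raise NotImplementedError


class EbsScanner(ServiceScanner):
    """Find EBS volumes that are not attached to any instance."""

    def scan(self) -> list:
        ec2 = self.session.client("ec2")
        unattached = []
        for page in ec2.get_paginator("describe_volumes").paginate(
            Filters=[{"Name": "status", "Values": ["available"]}]
        ):
            unattached.extend(v["VolumeId"] for v in page["Volumes"])
        return unattached


# Scanners can then be registered and run uniformly:
# results = {type(s).__name__: s.scan() for s in [EbsScanner("eu-central-1", {})]}
```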
Implement rate limiting to avoid API throttling:
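A simple lock-protected limiter that spaces out calls is one option (a sketch; botocore's built-in `Config(retries={"mode": "adaptive"})` is another way to cope with throttling):

```python
import threading
import time


class RateLimiter:
    """Allow at most `calls_per_second` calls; callers block until their slot is free."""

    def __init__(self, calls_per_second: float):
        self.interval = 1.0 / calls_per_second
        self.lock = threading.Lock()
        self.next_allowed = 0.0

    def wait(self) -> None:
        with self.lock:
            now = time.monotonic()
            delay = max(0.0, self.next_allowed - now)
            self.next_allowed = max(now, self.next_allowed) + self.interval
        if delay:
            time.sleep(delay)


# Usage: call limiter.wait() before each describe_*/get_metric_* request.
limiter = RateLimiter(calls_per_second=10)
```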
Implement caching to avoid duplicate API calls:
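For lookups that repeat across services, `functools.lru_cache` is often enough, e.g. for the region list and per-region clients (if you combine this with the thread pool above, prefer creating clients per thread):

```python
from functools import lru_cache

import boto3


@lru_cache(maxsize=None)
def enabled_regions() -> tuple:
    """Fetch the region list once and reuse it across all service scans."""
    regions = boto3.client("ec2").describe_regions()["Regions"]
    return tuple(r["RegionName"] for r in regions)


@lru_cache(maxsize=None)
def cached_client(service: str, region: str):
    """Reuse one boto3 client per (service, region) instead of recreating it per call."""
    return boto3.client(service, region_name=region)
```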
For large AWS environments, implement streaming processing to avoid memory issues:
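A generator-based sketch that yields findings page by page and streams them to a JSON Lines file, so memory use stays flat regardless of account size:

```python
import json

import boto3


def iter_unattached_volumes(region: str):
    """Yield unattached EBS volumes one at a time instead of building a big list."""
    ec2 = boto3.client("ec2", region_name=region)
    for page in ec2.get_paginator("describe_volumes").paginate(
        Filters=[{"Name": "status", "Values": ["available"]}]
    ):
        for volume in page["Volumes"]:
            yield {"region": region, "volume_id": volume["VolumeId"], "size_gb": volume["Size"]}


def write_findings(findings, path: str = "results/findings.jsonl") -> int:
    """Stream findings to a JSON Lines file as they arrive."""
    count = 0
    with open(path, "w") as fh:
        for finding in findings:
            fh.write(json.dumps(finding) + "\n")
            count += 1
    return count
```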
Implementing these optimizations would significantly improve the scanner's performance, especially in large AWS environments with multiple regions and thousands of resources.