Managing AWS Sprawl: Techniques to Clean Up Unused and Underutilized Cloud Assets

Maximizing Operational Efficiency in AWS: Identifying and Eliminating Stale Resources

Introduction

Operating in the cloud means following best practices such as Infrastructure as Code (IaC), continuous integration, and small iterative changes. While these principles enhance operational efficiency, one critical yet often overlooked Key Performance Indicator (KPI) is the ratio of actively used to stale resources.

As cloud operations scale, numerous resources become underutilized or entirely unused, leading to operational inefficiencies. These stale resources increase overhead for support teams, consume budget without delivering business value, and introduce security risks. While managed services like AWS Lambda and AWS Fargate are cost-efficient based on actual usage, many other AWS resources—such as EC2 instances, EBS volumes, AMIs, and networking components—may remain idle for extended periods, creating unnecessary management burdens.

In AWS, there are robust monitoring, security, and compliance tools, but no native, automated method to comprehensively identify stale resources across the environment. This blog post outlines best practices and strategies to efficiently manage AWS resources, ensuring an optimized cloud estate.

Strategies for Efficient Cloud Resource Management

A) Compute Resources

1. Identifying and Deleting Unused EC2 Instances

Unused or underutilized EC2 instances contribute significantly to operational inefficiencies by increasing costs and administrative effort (e.g., patching, compliance checks).

To optimize EC2 usage:

Identify Unused Instances: Use AWS CloudTrail to filter by EventName=StopInstances and check for instances stopped beyond a defined threshold (e.g., 30 days).

Backup and Remove: Create a final backup, then terminate the instance.
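The CloudTrail check above can be sketched with Boto3. This is a minimal sketch, not a definitive implementation: the helper names are hypothetical, it assumes each StopInstances event lists the affected instance under Resources, and it assumes the set of currently stopped instance IDs is collected separately (for example from an EC2 describe call). CloudTrail event history only covers the last 90 days, so older stops will not appear.

```python
from datetime import datetime, timedelta, timezone


def latest_stop_times(events):
    """Map each instance ID to the time of its most recent StopInstances event."""
    latest = {}
    for event in events:
        for resource in event.get("Resources", []):
            # Assumption: the stopped instance is listed as an EC2 instance resource.
            if resource.get("ResourceType") != "AWS::EC2::Instance":
                continue
            instance_id = resource["ResourceName"]
            if instance_id not in latest or event["EventTime"] > latest[instance_id]:
                latest[instance_id] = event["EventTime"]
    return latest


def long_stopped(events, stopped_ids, now=None, threshold_days=30):
    """Return currently stopped instances whose last stop is older than the threshold."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=threshold_days)
    latest = latest_stop_times(events)
    return sorted(i for i in stopped_ids if i in latest and latest[i] <= cutoff)


def fetch_stop_events():
    """Collect StopInstances events from CloudTrail (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    paginator = boto3.client("cloudtrail").get_paginator("lookup_events")
    pages = paginator.paginate(
        LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "StopInstances"}]
    )
    return [event for page in pages for event in page["Events"]]
```

Passing the stopped instance IDs in explicitly keeps instances that were stopped and later restarted out of the result.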

2. Managing Unattached EBS Volumes

Unattached EBS volumes can accumulate over time, leading to unnecessary costs. These volumes might have been created for testing, backups, or as additional disks, and then left in an unattached state.

Identify Unattached Volumes: Use AWS Trusted Advisor and AWS Compute Optimizer to identify unattached or underutilized volumes.

Delete or Snapshot: Delete unused EBS volumes after taking necessary backups.
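As a scripted complement to Trusted Advisor and Compute Optimizer, unattached volumes can also be listed directly with Boto3. The sketch below is illustrative only: the function names and the 30-day minimum age are assumptions, not part of any AWS tool. It only reports candidates and deletes nothing.

```python
from datetime import datetime, timedelta, timezone


def stale_unattached_volumes(volumes, now=None, min_age_days=30):
    """Return (VolumeId, size in GiB) for unattached volumes older than min_age_days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=min_age_days)
    return [
        (v["VolumeId"], v["Size"])
        for v in volumes
        # "available" is the state EC2 reports for a volume that is not attached.
        if v["State"] == "available" and not v.get("Attachments") and v["CreateTime"] <= cutoff
    ]


def fetch_available_volumes():
    """List unattached volumes in the current region (needs boto3 and credentials)."""
    import boto3  # imported lazily so the filter above works without AWS access

    paginator = boto3.client("ec2").get_paginator("describe_volumes")
    pages = paginator.paginate(Filters=[{"Name": "status", "Values": ["available"]}])
    return [volume for page in pages for volume in page["Volumes"]]
```

The age check avoids flagging volumes that were detached or created only recently, for example during an ongoing migration.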

3. Cleaning Up Unused AMIs

Organizations often accumulate multiple AMIs because they need several OS versions, different OS flavours, or highly hardened images, and many of these become obsolete. An efficient approach is to define a list of approved OS versions and maintain a standard set of AMIs managed from a central account. Even so, operational complexity means multiple unused AMIs can build up.

To identify unused AMIs, follow a three-step approach.

Step 1: Track Last Used AMIs: Use the lastLaunchedTime attribute of an AMI to determine when it was last used to launch an instance, and compare it against a standard threshold, say 60 days.

Step 2: Deprecate Unused AMIs: Mark old AMIs as deprecated to prevent them from being used for new launches. Deprecated AMIs still remain in your AMI list.

Step 3: Deregister Stale AMIs: Once deprecated, deregister AMIs to remove them from your environment.
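The three steps above could be scripted roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, the 60-day threshold comes from the example above, and AMIs that have never been launched are treated as unused, so a recently created image may need an extra age check before it is deprecated.

```python
from datetime import datetime, timedelta, timezone


def parse_launch_time(value):
    """Parse a timestamp such as '2024-01-15T10:00:00Z'; None means never launched."""
    if not value:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def unused_amis(last_launched, now=None, threshold_days=60):
    """Step 1: return image IDs not launched within threshold_days (or never launched)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=threshold_days)
    stale = []
    for image_id, value in last_launched.items():
        launched = parse_launch_time(value)
        if launched is None or launched < cutoff:
            stale.append(image_id)
    return sorted(stale)


def fetch_last_launched(image_ids):
    """Read lastLaunchedTime for each AMI (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    ec2 = boto3.client("ec2")
    result = {}
    for image_id in image_ids:
        response = ec2.describe_image_attribute(ImageId=image_id, Attribute="lastLaunchedTime")
        result[image_id] = response.get("LastLaunchedTime", {}).get("Value")
    return result


def retire(image_ids, deregister=False):
    """Step 2 by default (deprecate); Step 3 when deregister=True."""
    import boto3

    ec2 = boto3.client("ec2")
    for image_id in image_ids:
        if deregister:
            ec2.deregister_image(ImageId=image_id)
        else:
            # The deprecation time must lie in the future, so schedule it shortly ahead.
            soon = datetime.now(timezone.utc) + timedelta(minutes=5)
            ec2.enable_image_deprecation(ImageId=image_id, DeprecateAt=soon)
```

Keeping deprecation and deregistration as separate runs leaves a window in which an AMI that turns out to be needed can still be recovered.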

4. Optimizing AWS Snapshots

Snapshots provide resilience but, when unmanaged, lead to cost sprawl. In practice, manual snapshots taken during major upgrades or releases are often the stale snapshots that go untouched the longest.

Use Amazon Data Lifecycle Manager (DLM) to automate retention policies for EBS snapshots when you configure the policy. The Retention Type setting in each policy (shown in the screenshot below) defines how long snapshots are kept; DLM permanently deletes a snapshot once that threshold has passed.

Manual Snapshots: Use manual or scripted methods to identify the age of a manual snapshot. The Start Time column of the snapshot shows the timestamp when the snapshot was created.
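A scripted age check for manual snapshots might look like the sketch below. It assumes that snapshots created by DLM carry tags whose keys start with aws:dlm: and treats everything else as manual; the function names and the 90-day cutoff are illustrative, and snapshots created by other backup tooling would need their own exclusion rule.

```python
from datetime import datetime, timedelta, timezone

# Assumption: DLM tags the snapshots it creates with keys under this prefix.
DLM_TAG_PREFIX = "aws:dlm:"


def is_dlm_managed(snapshot):
    """True if any tag key on the snapshot starts with the DLM prefix."""
    return any(tag["Key"].startswith(DLM_TAG_PREFIX) for tag in snapshot.get("Tags", []))


def old_manual_snapshots(snapshots, now=None, max_age_days=90):
    """Return (SnapshotId, age in days) for non-DLM snapshots older than max_age_days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    stale = [s for s in snapshots if not is_dlm_managed(s) and s["StartTime"] <= cutoff]
    stale.sort(key=lambda s: s["StartTime"])  # oldest first
    return [(s["SnapshotId"], (now - s["StartTime"]).days) for s in stale]


def fetch_own_snapshots():
    """List snapshots owned by this account (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    paginator = boto3.client("ec2").get_paginator("describe_snapshots")
    return [s for page in paginator.paginate(OwnerIds=["self"]) for s in page["Snapshots"]]
```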

B) Storage Optimization

Amazon S3: Because S3 is object storage that scales automatically, buckets tend to accumulate millions of unused objects. This stale data still incurs storage costs based on storage tier, replication, and so on. In practice, there are many cases of technical staff spending time managing such garbage buckets.

1. Managing Amazon S3

1.a Organize with Prefixes and Folders: Establish structured naming conventions that give stored files a logical hierarchy based on date, business process, project name, or region. Based on compliance needs, access patterns, and requirements, enable lifecycle policies to manage these objects. Lifecycle rules can be filtered by prefix.

1.b Implement Lifecycle Policies: Automatically transition objects to cheaper storage tiers (Glacier, IA) or delete them after a set period. A transition after 90 days and permanent deletion after 180 days is a commonly used lifecycle configuration.

In the above screenshot, the options to permanently delete previous versions and to expire current versions of objects delete the files once the configured duration has passed.
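The 90-day transition and 180-day deletion described above can be expressed as a lifecycle rule and applied with Boto3. This is a minimal sketch: the function names, the rule ID format, and the choice of the GLACIER storage class are assumptions made for illustration.

```python
def lifecycle_rule(prefix, transition_days=90, expire_days=180, storage_class="GLACIER"):
    """Build one lifecycle rule that tiers objects under a prefix, then expires them."""
    if expire_days <= transition_days:
        raise ValueError("expire_days must be greater than transition_days")
    label = prefix.strip("/").replace("/", "-") or "all"
    return {
        "ID": f"tier-and-expire-{label}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": transition_days, "StorageClass": storage_class}],
        # Expire current versions and permanently delete previous versions.
        "Expiration": {"Days": expire_days},
        "NoncurrentVersionExpiration": {"NoncurrentDays": expire_days},
    }


def apply_rules(bucket, rules):
    """Apply the rules to a bucket (needs boto3 and credentials).

    Note: this call replaces any lifecycle configuration already on the bucket.
    """
    import boto3  # imported lazily so lifecycle_rule works without AWS access

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration={"Rules": rules}
    )
```

For example, apply_rules("my-bucket", [lifecycle_rule("logs/")]) would tier and expire only objects under the logs/ prefix, where the bucket name is a placeholder.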

2. Identifying Underutilized Amazon EFS

EFS provides scalable storage, but many organizations maintain unused file systems created for testing or backups. Many EFS file systems have never been mounted or show no active data transfer. Although EFS charges are based on actual usage, operations teams still spend time managing their security groups and end up keeping highly vulnerable resource policies in place for no benefit.

Monitor CloudWatch Metrics: Use MeteredIOBytes, DataReadIOBytes, DataWriteIOBytes, and ClientConnections to identify underutilized file systems.

Unmount and Delete: Remove file systems with no active mounts or data transfers.
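One way to apply the CloudWatch check above is to sum ClientConnections over a lookback window and flag file systems where the total is zero. The sketch below makes several assumptions: the helper names and the 14-day window are illustrative, and a file system with no datapoints at all is treated as idle.

```python
from datetime import datetime, timedelta, timezone


def connection_query(file_system_id, now=None, lookback_days=14):
    """Build get_metric_statistics arguments for daily ClientConnections sums."""
    now = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EFS",
        "MetricName": "ClientConnections",
        "Dimensions": [{"Name": "FileSystemId", "Value": file_system_id}],
        "StartTime": now - timedelta(days=lookback_days),
        "EndTime": now,
        "Period": 86400,  # one datapoint per day
        "Statistics": ["Sum"],
    }


def is_idle(datapoints):
    """True when no client connection was recorded (or no data was reported)."""
    return sum(point.get("Sum", 0) for point in datapoints) == 0


def idle_file_systems(file_system_ids):
    """Return the IDs with no connections in the window (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    cloudwatch = boto3.client("cloudwatch")
    idle = []
    for fs_id in file_system_ids:
        response = cloudwatch.get_metric_statistics(**connection_query(fs_id))
        if is_idle(response["Datapoints"]):
            idle.append(fs_id)
    return idle
```

The same pattern could be repeated for the I/O metrics listed above before a file system is actually removed.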

C) Identity and Access Management (IAM) Cleanup

While IAM resources don’t directly impact operational efficiency or cost, unused IAM users, groups, roles, and policies pose security risks and give attackers the opportunity to reuse unused IAM credentials. A well-disciplined practice for managing credentials is key.

Detect Unused IAM Users and Roles:

AWS provides several out-of-the-box methods to identify stale credentials.

1.a) Use AWS Config Rule: iam-user-unused-credentials-check is an AWS Config rule that finds users with inactive credentials. You can set up this rule with an age threshold for assessment and reporting.

1.b) Utilize IAM Access Analyzer to identify unused access permissions.

You can use IAM Access Analyzer unused access findings to identify unused access granted to IAM roles or users in your organization. In a larger AWS estate, you can delegate this responsibility to a central account (typically the security monitoring account). You can then use the dashboard to review unused access findings across your organization and prioritize accounts.

Implementation guide: https://aws.amazon.com/blogs/security/iam-access-analyzer-simplifies-inspection-of-unused-access-in-your-organization/

Navigating the access report

The above screenshot of the IAM Access Analyzer report gives details on unused passwords, unused access keys, and roles along with their permission boundaries.

Remove or Disable: Deactivate stale IAM users and roles to reduce security vulnerabilities.
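The AWS Config rule mentioned in 1.a can be deployed from code. The sketch below builds the rule definition for the managed rule, with the age threshold passed as the maxCredentialUsageAge parameter. The function names and the 90-day default are assumptions, and AWS Config must already be recording in the account for the rule to evaluate anything.

```python
import json


def unused_credentials_rule(max_age_days=90, name="iam-user-unused-credentials-check"):
    """Build the ConfigRule definition for the managed unused-credentials check."""
    if max_age_days < 1:
        raise ValueError("max_age_days must be at least 1")
    return {
        "ConfigRuleName": name,
        "Source": {
            "Owner": "AWS",  # AWS managed rule
            "SourceIdentifier": "IAM_USER_UNUSED_CREDENTIALS_CHECK",
        },
        # Rule parameters are passed as a JSON string.
        "InputParameters": json.dumps({"maxCredentialUsageAge": str(max_age_days)}),
    }


def deploy_rule(rule):
    """Create or update the rule (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works without AWS access

    boto3.client("config").put_config_rule(ConfigRule=rule)
```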

D) Network Resource Optimization

1. Identifying Unused Load Balancers: While there is no easy way to identify unused load balancers, you can review the active instances in the target configuration. If the number of active instances is zero, the load balancer has no active endpoints. You can automate this check with a Boto3 Python script running in Lambda, triggered by a scheduled EventBridge rule. Load balancers incur charges whether they are used or not.
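A Boto3 script for this check might look like the sketch below. It interprets "active instances" as targets in the healthy state, which is an assumption, and it only covers Application and Network Load Balancers (the elbv2 API), not Classic Load Balancers. The function names are hypothetical, and handler is written so it could serve as the entry point of a scheduled Lambda function.

```python
def healthy_target_count(target_health_descriptions):
    """Count the targets of one target group that report a healthy state."""
    return sum(
        1 for d in target_health_descriptions if d["TargetHealth"]["State"] == "healthy"
    )


def unused_load_balancers(health_by_lb):
    """Return ARNs of load balancers with no healthy target in any target group.

    health_by_lb maps a load balancer ARN to a list with one entry per target
    group, each entry being that group's TargetHealthDescriptions list.
    """
    return sorted(
        arn
        for arn, groups in health_by_lb.items()
        if sum(healthy_target_count(group) for group in groups) == 0
    )


def collect_health():
    """Gather target health for every load balancer (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    elbv2 = boto3.client("elbv2")
    health_by_lb = {}
    for page in elbv2.get_paginator("describe_load_balancers").paginate():
        for lb in page["LoadBalancers"]:
            arn = lb["LoadBalancerArn"]
            groups = elbv2.describe_target_groups(LoadBalancerArn=arn)["TargetGroups"]
            health_by_lb[arn] = [
                elbv2.describe_target_health(TargetGroupArn=g["TargetGroupArn"])[
                    "TargetHealthDescriptions"
                ]
                for g in groups
            ]
    return health_by_lb


def handler(event, context):
    """Entry point for a scheduled run; returns the candidate ARNs for review."""
    return unused_load_balancers(collect_health())
```

A load balancer with no target groups at all is also reported, since it cannot be serving traffic to any endpoint.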

2. Identifying Unused VPN Connections: AWS Site-to-Site VPN is billed for each hour that your VPN connection is in the "available" state, so it is critical to track usage and remove unused connections promptly. You can use Amazon CloudWatch metrics such as TunnelDataIn and TunnelDataOut to identify unused VPN connections.
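The TunnelDataIn and TunnelDataOut check can be sketched as a sum of bytes over a lookback window. The helper names, the 30-day window, and the min_bytes threshold are assumptions; a small non-zero threshold may be appropriate because routing keepalives can generate some traffic on an otherwise unused connection.

```python
from datetime import datetime, timedelta, timezone


def tunnel_queries(vpn_id):
    """Build get_metric_data queries for daily bytes in and out of one VPN connection."""
    return [
        {
            "Id": metric.lower(),  # query IDs must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/VPN",
                    "MetricName": metric,
                    "Dimensions": [{"Name": "VpnId", "Value": vpn_id}],
                },
                "Period": 86400,
                "Stat": "Sum",
            },
            "ReturnData": True,
        }
        for metric in ("TunnelDataIn", "TunnelDataOut")
    ]


def total_bytes(metric_data_results):
    """Add up every returned value across both metrics."""
    return sum(sum(result["Values"]) for result in metric_data_results)


def is_unused(metric_data_results, min_bytes=0):
    """True when total traffic in the window does not exceed min_bytes."""
    return total_bytes(metric_data_results) <= min_bytes


def fetch_tunnel_data(vpn_id, lookback_days=30):
    """Query CloudWatch for the window (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    now = datetime.now(timezone.utc)
    response = boto3.client("cloudwatch").get_metric_data(
        MetricDataQueries=tunnel_queries(vpn_id),
        StartTime=now - timedelta(days=lookback_days),
        EndTime=now,
    )
    return response["MetricDataResults"]
```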

E) Identifying and Cleaning Up Unused Lambda Functions

To maintain a healthy Lambda estate, it is essential to track function executions and clean up unused functions promptly. There is no direct method or event ID to refer to, but several solutions are available. One such solution is https://aws.amazon.com/blogs/mt/automating-the-discovery-of-unused-aws-lambda-functions/

Conclusion

Effective AWS operations require proactive management of cloud resources to eliminate inefficiencies and enhance cost savings. By leveraging AWS native tools and best practices for identifying and removing stale resources, organizations can maintain a lean, cost-effective, and secure cloud environment.

By implementing these strategies, businesses can reduce operational overhead, enhance security, and optimize cloud expenditures, ensuring their cloud infrastructure remains efficient and scalable.
