Managing multiple AWS accounts can be tricky, especially when you want to keep everything secure and well-organized. That's where AWS Control Tower comes in - think of it as your helpful assistant that sets up a secure home base (called a landing zone) for all your AWS accounts. But just like a well-organized room can get messy over time, these settings can drift away from their original setup, which might create security risks.
In this blog post, we'll look at what this drift means, when it typically happens, and how you can fix it to keep your AWS environment running smoothly.
Problem Scenario
Control Tower drift occurs when resources that Control Tower manages (such as SCPs, OUs, and enrolled accounts) no longer match the configuration Control Tower expects, usually because someone changed them manually outside of Control Tower or by mistake.
Here are two common examples:
Permission Policy Problems: AWS uses service control policies (SCPs) to control what different accounts can do. Sometimes these policies get changed or removed by accident. For example, when someone changed the SCPs for the "Security" OU, it caused problems with important features like logging and account monitoring.
Fig: AWS Control Tower Landing zone drift
Network Setup Issues: This happens when AWS Control Tower can't set up the basic network (VPC) in an account properly. It usually occurs when someone adds back a default network after AWS already set everything up. When this happens, you'll see an error message saying AWS can't manage the account's network, and this prevents you from making any updates to the account.
Let me share a recent experience I had with this exact issue. While trying to enroll a new account, I ran into the VPC baseline error. It turned out some EC2 instances and EBS volumes were still connected to the default VPC.
Solution Strategies
Resolving Control Tower Drift requires specific actions for each scenario, as detailed below:
For SCP-Related Drift:
Access the AWS Control Tower dashboard to identify drift alerts, which may specify SCP modifications. Use the "Reset" option on the drift alert page or in Landing zone settings to restore the configuration. According to the AWS documentation, this repairs most types of drift, including SCP modifications, by reverting to the intended state.
Verify the fix by checking for remaining alerts. If the drift is still unresolved, re-register the affected OU under Organizational units, as the documentation suggests, to reapply governance controls. This ensures that shared accounts such as the Log Archive and Audit accounts function correctly.
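The same check-and-reset flow can be sketched in code. This is a minimal sketch, not the only way to do it: it assumes `ct_client` is a boto3 Control Tower client (`boto3.client("controltower")`) created in the management account and home Region, and the helper names `landing_zone_drift` and `reset_if_drifted` are hypothetical. It reads the landing zone drift status and only triggers a reset when `dry_run` is turned off.

```python
def landing_zone_drift(ct_client):
    """Return (landing_zone_arn, drift_status) for the first landing zone found.

    ct_client is assumed to be a boto3 "controltower" client.
    """
    zones = ct_client.list_landing_zones().get("landingZones", [])
    if not zones:
        raise RuntimeError("No landing zone found in this account and Region")
    arn = zones[0]["arn"]
    details = ct_client.get_landing_zone(landingZoneIdentifier=arn)["landingZone"]
    # driftStatus.status is expected to be "DRIFTED" or "IN_SYNC"
    status = details.get("driftStatus", {}).get("status", "UNKNOWN")
    return arn, status


def reset_if_drifted(ct_client, dry_run=True):
    """Reset the landing zone only when it reports drift.

    Returns None when nothing needs doing, "dry-run" when a reset would
    have been started, or the operation identifier of the started reset.
    """
    arn, status = landing_zone_drift(ct_client)
    if status != "DRIFTED":
        return None
    if dry_run:
        return "dry-run"
    response = ct_client.reset_landing_zone(landingZoneIdentifier=arn)
    return response["operationIdentifier"]


# Hypothetical usage (requires boto3 and credentials for the management account):
#   import boto3
#   print(reset_if_drifted(boto3.client("controltower"), dry_run=True))
```

Keeping `dry_run=True` as the default is a deliberate choice here: a reset touches the whole landing zone, so it is worth looking at the reported status before starting one.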
For VPC Baseline Issues:
Navigate to the Amazon VPC console, identify the default VPC (typically with a CIDR like 172.31.0.0/16), and delete it after making sure no dependent resources exist. The AWS troubleshooting documentation highlights this step as critical to resolving the Tainted state.
Update the account in AWS Control Tower by going to Accounts and selecting Update, which allows Control Tower to baseline the VPC correctly. Verify that the account is no longer tainted and is managed properly, which reduces the risk of further drift.
A related option is to delete the Control Tower VPC and configure Account Factory to create no VPC. This is less directly relevant to drift resolution because it concerns setup rather than remediation.
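Before deleting the default VPC, it helps to confirm that nothing is still attached to it, since that is exactly what tripped me up. The sketch below is one possible way to check, assuming `ec2_client` is a boto3 EC2 client for the affected account and Region. The helper names are hypothetical, and pagination is ignored for brevity. It uses network interfaces as a proxy for dependent resources, because EC2 instances, load balancers, and similar services all place an interface in the VPC.

```python
def default_vpc_dependencies(ec2_client):
    """Map each default VPC ID to the network interface IDs still inside it.

    ec2_client is assumed to be a boto3 "ec2" client. Results are not
    paginated, which is fine for a quick check but not for large accounts.
    """
    vpcs = ec2_client.describe_vpcs(
        Filters=[{"Name": "is-default", "Values": ["true"]}]
    )["Vpcs"]
    report = {}
    for vpc in vpcs:
        vpc_id = vpc["VpcId"]
        interfaces = ec2_client.describe_network_interfaces(
            Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
        )["NetworkInterfaces"]
        report[vpc_id] = [eni["NetworkInterfaceId"] for eni in interfaces]
    return report


def safe_to_delete(report):
    """Return the VPC IDs from the report that have no network interfaces left."""
    return [vpc_id for vpc_id, enis in report.items() if not enis]


# Hypothetical usage (requires boto3 and credentials for the member account):
#   import boto3
#   report = default_vpc_dependencies(boto3.client("ec2"))
#   print(report, safe_to_delete(report))
```

An empty interface list is a good sign but not a guarantee, so treat the output as a hint about what to clean up first.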
Optionally, to clean up the AWS Control Tower VPC resource, aws-controltower-VPC, from an existing account, you can remove the stack instance from the AWS CloudFormation StackSet AWSControlTowerBP-VPC-ACCOUNT-FACTORY-V1, after you make sure that there are no existing resources or resource dependencies in the VPC.
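Removing the stack instance can also be scripted. The following is a rough sketch, assuming `cfn_client` is a boto3 CloudFormation client created in the management account and in the Region where the StackSet lives. The function name `remove_vpc_stack_instance` and the example account ID are hypothetical.

```python
# StackSet name as given in this post
STACK_SET_NAME = "AWSControlTowerBP-VPC-ACCOUNT-FACTORY-V1"


def remove_vpc_stack_instance(cfn_client, account_id, regions, retain_stacks=False):
    """Delete the Control Tower VPC stack instance for one account.

    cfn_client is assumed to be a boto3 "cloudformation" client.
    With retain_stacks=False the underlying stack (and the VPC it created)
    is deleted too, so check for dependent resources first.
    Returns the StackSet operation ID so progress can be tracked.
    """
    response = cfn_client.delete_stack_instances(
        StackSetName=STACK_SET_NAME,
        Accounts=[account_id],
        Regions=list(regions),
        RetainStacks=retain_stacks,
    )
    return response["OperationId"]


# Hypothetical usage (requires boto3 and management-account credentials):
#   import boto3
#   op_id = remove_vpc_stack_instance(
#       boto3.client("cloudformation"), "111122223333", ["us-east-1"]
#   )
```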
To prevent drift in the future, follow these practices:
Regularly check the Control Tower dashboard and set up automated alerts for changes
Enable preventive controls to block unauthorized modifications
Use only Control Tower's console/APIs and maintain clear documentation
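As one illustration of the automated alerts mentioned above, here is a small sketch that creates an EventBridge rule matching policy changes made through AWS Organizations. It rests on several assumptions: `events_client` is a boto3 EventBridge client (`boto3.client("events")`) in the management account, CloudTrail is recording management events, and the rule is created in the Region where Organizations events are delivered (us-east-1, as far as I know). The rule name and the list of event names are my own choices, and the pattern matches all Organizations policy types, not only SCPs. A target such as an SNS topic still has to be attached separately.

```python
import json

# Organizations API calls that change or move policies (illustrative selection)
POLICY_CHANGE_EVENTS = ["UpdatePolicy", "DeletePolicy", "AttachPolicy", "DetachPolicy"]


def policy_change_pattern():
    """Build an EventBridge event pattern for Organizations policy changes."""
    return {
        "source": ["aws.organizations"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["organizations.amazonaws.com"],
            "eventName": POLICY_CHANGE_EVENTS,
        },
    }


def create_policy_change_rule(events_client, rule_name="scp-change-alert"):
    """Create (or update) the rule and return its ARN.

    events_client is assumed to be a boto3 "events" client.
    The rule has no targets yet; add one with put_targets to get notified.
    """
    response = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(policy_change_pattern()),
        State="ENABLED",
        Description="Alert on Organizations policy changes that may cause drift",
    )
    return response["RuleArn"]


# Hypothetical usage (requires boto3 and management-account credentials):
#   import boto3
#   print(create_policy_change_rule(boto3.client("events", region_name="us-east-1")))
```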
Conclusion
To wrap up, we've learned two main ways to fix Control Tower drift problems in AWS. First, you can reset the landing zone when your SCPs get changed. Second, you can delete the default VPC that is causing the baseline error and then update the account. We've explained these solutions with real examples and included links to AWS's help guides for more details.
This blog post made these technical concepts easier to understand by breaking them down into clear sections: what the problem is, when it happens, and how to fix it.