EKS Abnormal Notification Received After Deleting Subnet

If you perform an operation that degrades the health of an EKS cluster, AWS sends you a notification describing the resulting health issues.

Published Jun 15, 2024

Background

One evening, I received the following email from AWS:
Title: [Action required] Resolve Amazon EKS cluster health issues
Content: The following is a list of affected clusters with their cluster arns, cluster health status, and corresponding cluster health issues. The health of an EKS cluster is a shared responsibility between AWS and customers. You must resolve these issues to maintain operational stability for your EKS cluster.
At the time, I only skimmed the email and assumed there was a problem with the EKS infrastructure. The next day, I read it more carefully and saw that it reported a health issue on one of our EKS clusters. When I checked the health of the affected cluster, I found that a subnet associated with it had been deleted.
(Screenshot: EKS health issue)
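
The same health information can also be read programmatically. The following is a minimal sketch, assuming boto3 is installed and credentials are configured. The cluster name "my-test-cluster" and both function names are hypothetical. It relies on the `health.issues` field that the EKS DescribeCluster API returns.

```python
from __future__ import annotations

from typing import Any


def summarize_health_issues(cluster: dict[str, Any]) -> list[str]:
    """Format the health issues of a DescribeCluster result as readable lines."""
    issues = cluster.get("health", {}).get("issues", [])
    lines = []
    for issue in issues:
        # resourceIds lists the affected resources, e.g. the missing subnet.
        resources = ", ".join(issue.get("resourceIds", [])) or "n/a"
        code = issue.get("code", "Unknown")
        message = issue.get("message", "")
        lines.append(f"{code}: {message} (resources: {resources})")
    return lines


def fetch_cluster(cluster_name: str) -> dict[str, Any]:
    """Fetch the cluster description from AWS (needs boto3 and credentials)."""
    import boto3  # imported lazily so the formatter works without boto3

    eks = boto3.client("eks")
    return eks.describe_cluster(name=cluster_name)["cluster"]


# Hypothetical usage against a real account:
#   for line in summarize_health_issues(fetch_cluster("my-test-cluster")):
#       print(line)
```

An empty result means the cluster currently reports no health issues.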

Cause

The subnet deletion itself was intentional. However, we were not aware that the subnet was still being used by EKS. Fortunately, this cluster was a test environment rather than a production one, so the deletion did not cause any problems.
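
A check like the one below, run before deleting a subnet, might have caught this. It is only a sketch, not what we actually ran. The function names are hypothetical, and it looks only at the subnets in each cluster's `resourcesVpcConfig` in the current region. Node groups and Fargate profiles can reference their own subnets and would need a separate check.

```python
from __future__ import annotations

from typing import Any, Iterable


def clusters_using_subnet(subnet_id: str, clusters: Iterable[dict[str, Any]]) -> list[str]:
    """Return the names of clusters whose VPC config references subnet_id."""
    return [
        cluster["name"]
        for cluster in clusters
        if subnet_id in cluster.get("resourcesVpcConfig", {}).get("subnetIds", [])
    ]


def describe_all_clusters() -> list[dict[str, Any]]:
    """Describe every EKS cluster in the current region (needs boto3 and credentials)."""
    import boto3  # imported lazily so the pure check above works without boto3

    eks = boto3.client("eks")
    clusters = []
    for page in eks.get_paginator("list_clusters").paginate():
        for name in page["clusters"]:
            clusters.append(eks.describe_cluster(name=name)["cluster"])
    return clusters


# Hypothetical usage before deleting a subnet:
#   in_use = clusters_using_subnet("subnet-0123456789abcdef0", describe_all_clusters())
#   if in_use:
#       print("Subnet is still used by:", in_use)
```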

Notification Timeline

We reviewed the timeline between the subnet deletion and the notification:
16:52 Subnet deleted
17:01 EKS cluster abnormality notification
The notification arrived 9 minutes after the deletion. The official documentation states that it can take up to 3 hours, so a notification will not necessarily arrive this quickly every time.
https://docs.aws.amazon.com/eks/latest/userguide/troubleshooting.html#cluster-health-status
Even so, if a similar problem occurred in a production environment, the notification could provide a clue to the cause. (Of course, the application itself would likely detect the problem before then.)

Conclusion

This story shows that AWS may notify you when you perform an operation that affects EKS cluster health. I was impressed by AWS's attention to detail and by the mechanisms it has in place to improve service reliability.
 
