Keeping an EYE on your system

A writeup summarizing the fourth session of BeSA batch 5.

Published May 4, 2024
In this week's session we look at Monitoring and Alerting, and also take a quick look at Pricing and Billing.
First, let's understand WHY we monitor. We monitor the system to make sure it is performing as per our business expectations as well as our service expectations. While doing this we look at the resources we are using, their usage patterns, and whether they are under- or over-utilized. We also look at the security of the system as a whole, e.g. whether one of my resources has been hacked and is being used maliciously for bitcoin mining, racking up huge bills for me.
While monitoring the system we generally measure two kinds of metrics:
  • business metrics : these are defined by the customer based on their expected business performance. E.g. on Amazon the seller cares about how many visitors had a look at their product, how many have it in their online basket, how many actually bought it, how many abandoned the search due to latency and poor performance, etc.
  • system metrics : also called application or infrastructure metrics. These could be related to the servers, the DB or the latency of the system.
AWS provides services like CloudWatch to measure metrics, collect logs and set alarms. Metrics can come from on-premises or AWS infrastructure. With CloudWatch you can monitor custom metrics (like the number of users logged in) as well as standard metrics (like CPU Utilization).
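As a rough illustration of the custom metric mentioned above, here is a minimal Python sketch that builds one data point for the CloudWatch PutMetricData API and sends it with boto3. The metric name `LoggedInUsers`, the `Environment` dimension and the `MyApp` namespace are made-up names for this example, and the publish step assumes boto3 is installed and AWS credentials are configured.

```python
from datetime import datetime, timezone


def build_logged_in_users_datum(user_count, environment="prod"):
    """Build one entry for the MetricData list of PutMetricData."""
    return {
        "MetricName": "LoggedInUsers",  # hypothetical custom metric name
        "Dimensions": [{"Name": "Environment", "Value": environment}],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(user_count),
        "Unit": "Count",
    }


def publish(datum, namespace="MyApp"):  # "MyApp" is a hypothetical namespace
    """Send the data point to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_data(Namespace=namespace, MetricData=[datum])
```

Calling `publish(build_logged_in_users_datum(42))` from the application, for example once a minute, would make the number of logged-in users show up as a custom metric next to the standard ones.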
CloudWatch Logs can be used to collect logs from your systems and various AWS services, and these logs can be exported to S3 for longer-term storage. The logs can be used to monitor the system and trigger events via the Amazon EventBridge service (formerly CloudWatch Events). You can also combine CloudWatch alarms with an Auto Scaling group (ASG) and an Elastic Load Balancer (ELB) to scale out the servers when CPUUtilization goes over a certain threshold.
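To make the scale-out idea a bit more concrete, here is a hedged sketch of a CloudWatch alarm on the average CPUUtilization of an Auto Scaling group. The 70% threshold, the 5-minute period and the alarm naming scheme are assumptions picked for illustration, and the scaling policy ARN has to come from a policy you have already created on the ASG.

```python
def build_high_cpu_alarm(asg_name, scale_out_policy_arn, threshold_percent=70.0):
    """Build keyword arguments for CloudWatch PutMetricAlarm."""
    return {
        "AlarmName": f"{asg_name}-high-cpu",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
        "Statistic": "Average",
        "Period": 300,            # evaluate 5-minute averages
        "EvaluationPeriods": 2,   # two periods in a row must breach
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [scale_out_policy_arn],  # scaling policy to trigger
    }


def create_alarm(alarm_kwargs):
    """Create the alarm (needs boto3 and AWS credentials)."""
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(**alarm_kwargs)
```

Requiring two consecutive breaching periods is a design choice to avoid scaling out on a short CPU spike.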
To collect custom metrics from servers (on-premises or EC2) you can use the unified CloudWatch agent. The agent has to be installed on the server and configured before it starts reporting custom metrics, e.g. Memory Utilization on EC2, which is not among the standard EC2 metrics. A CloudWatch metric filter can be used to search the logs for a particular pattern and turn the matches into a metric, and a CloudWatch alarm can then be triggered based on that metric. With CloudWatch Logs Insights, you can interactively search and analyze your log data in CloudWatch Logs.
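For the Memory Utilization example above, the agent reads a JSON configuration file. The sketch below generates a minimal configuration that collects `mem_used_percent`; the 60-second interval is an assumption, and where the file is saved and how the agent is started depend on your setup and are not shown here.

```python
import json


def memory_metrics_config(interval_seconds=60):
    """Return a minimal CloudWatch agent config that reports memory usage."""
    return {
        "agent": {"metrics_collection_interval": interval_seconds},
        "metrics": {
            "namespace": "CWAgent",
            # Tag every data point with the instance it came from.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
            },
        },
    }


def render_config(config):
    """Serialize the config as the JSON text the agent expects."""
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(render_config(memory_metrics_config()))
```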
Monitoring is the 'when and what' of a system error, while observability is the 'why and how'. Monitoring deals with collecting data and generating reports on different metrics, whereas observability looks at the data collected by monitoring to find the root cause of issues. There are three pillars of observability: Logs, Metrics and Traces.
AWS X-Ray is an easy way for developers to “follow the thread” and trace requests from beginning to end through the system. X-Ray uses this tracing to create service graphs that visually depict the relationship of services to each other.
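To get a feel for how a service graph can fall out of trace data, here is a toy sketch in plain Python. It is not the X-Ray SDK or its data format: the segment fields (`id`, `service`, `parent_id`) and the service names are simplified assumptions for illustration.

```python
from collections import defaultdict


def build_service_graph(segments):
    """Return {caller: [callees]} for the segments of one traced request.

    Each segment is a dict with 'id', 'service' and 'parent_id'
    ('parent_id' is None for the entry point of the request).
    """
    service_by_id = {seg["id"]: seg["service"] for seg in segments}
    graph = defaultdict(set)
    for seg in segments:
        parent = seg["parent_id"]
        if parent is not None and parent in service_by_id:
            # The parent segment's service called this segment's service.
            graph[service_by_id[parent]].add(seg["service"])
    return {caller: sorted(callees) for caller, callees in graph.items()}


if __name__ == "__main__":
    trace = [
        {"id": "a", "service": "api-gateway", "parent_id": None},
        {"id": "b", "service": "orders", "parent_id": "a"},
        {"id": "c", "service": "payments", "parent_id": "b"},
        {"id": "d", "service": "database", "parent_id": "b"},
    ]
    print(build_service_graph(trace))
```

Every hop records which segment called it, so following the parent links is enough to recover who talks to whom.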
AWS CloudTrail is used to get a history of API calls (every action in AWS is an API call) made in your AWS account, which is useful for audit purposes. CloudTrail keeps this event history for 90 days, and you can deliver the logs to S3 to store them for longer. These logs can also be pushed to CloudWatch, where you can use metric filters to find patterns for certain API calls. Now imagine a scenario where a hacker gains access to your system, performs certain malicious actions, and then tries to erase the traces of the API calls they made by modifying the CloudTrail logs. To detect this you can enable CloudTrail log file integrity validation, which determines whether a log file was modified, deleted, or unchanged after CloudTrail delivered it. To secure the logs stored on S3, enable versioning and MFA Delete on the S3 bucket, and follow the least-privilege practice when giving access to the S3 bucket via IAM.
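The idea behind integrity validation can be sketched with plain hashing: record a hash of each log file at delivery time and compare later. This is a simplified illustration, not CloudTrail's actual digest file format (real digest files are also signed, which is left out here), and the file names are made up.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 hash of some bytes."""
    return hashlib.sha256(data).hexdigest()


def make_digest(log_files):
    """Record the hash of every delivered log file ({name: bytes})."""
    return {name: sha256_hex(content) for name, content in log_files.items()}


def validate(digest, current_files):
    """Classify each recorded file as 'unchanged', 'modified' or 'deleted'."""
    report = {}
    for name, expected_hash in digest.items():
        if name not in current_files:
            report[name] = "deleted"
        elif sha256_hex(current_files[name]) != expected_hash:
            report[name] = "modified"
        else:
            report[name] = "unchanged"
    return report


if __name__ == "__main__":
    delivered = {"log-1.json": b'{"eventName": "RunInstances"}',
                 "log-2.json": b'{"eventName": "StopLogging"}'}
    digest = make_digest(delivered)
    tampered = {"log-1.json": b'{"eventName": "DescribeInstances"}'}
    print(validate(digest, tampered))
```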
Now let's take a quick look at cost optimization when using AWS services. AWS has 3 basic principles for billing customers.
  1. Pay as you go : Pay only for the resources you use.
  2. Save when you commit : there are various Savings Plans that help you pay less when you commit to a consistent amount of usage for a fixed term, e.g. 1 year or 3 years.
  3. Pay less by using more : volume-based discounts, e.g. tiered pricing for storage and data transfer, where the price per GB drops as your usage grows.
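The third principle is easiest to see with a small tiered-pricing calculation. The tier sizes and prices below are invented round numbers for illustration, not actual AWS prices.

```python
# (tier size in GB, price per GB); None means "everything above the earlier tiers".
# These numbers are hypothetical and only illustrate the shape of tiered pricing.
HYPOTHETICAL_TIERS = [(1000, 0.10), (4000, 0.08), (None, 0.05)]


def tiered_cost(usage_gb, tiers=HYPOTHETICAL_TIERS):
    """Charge each slice of usage at the price of the tier it falls into."""
    remaining = usage_gb
    total = 0.0
    for size, price in tiers:
        if remaining <= 0:
            break
        billed = remaining if size is None else min(remaining, size)
        total += billed * price
        remaining -= billed
    return total


if __name__ == "__main__":
    for gb in (500, 6000):
        cost = tiered_cost(gb)
        print(f"{gb} GB costs {cost:.2f}, i.e. {cost / gb:.4f} per GB")
```

With these made-up tiers the average price per GB goes down as usage goes up, which is the "pay less by using more" effect.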
With most AWS services you generally pay for Compute (e.g. EC2 instance run time), Storage (e.g. an EBS volume attached to EC2) and Network Traffic (e.g. data transferred out of EC2). Along with these, some additional parameters can go into the cost depending on the service, e.g. in Lambda the number of function invocations matters.
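As a back-of-the-envelope version of the compute + storage + network breakdown above, here is a small sketch. All rates in it are placeholders chosen for the example, so look up the real prices for your region and instance type before relying on any number.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical unit prices; replace with real ones for your region."""
    instance_hour: float    # price per EC2 instance hour
    ebs_gb_month: float     # price per GB-month of EBS storage
    transfer_out_gb: float  # price per GB transferred out


def monthly_estimate(hours, ebs_gb, transfer_out_gb, rates):
    """Split a monthly bill into its compute, storage and network parts."""
    compute = hours * rates.instance_hour
    storage = ebs_gb * rates.ebs_gb_month
    network = transfer_out_gb * rates.transfer_out_gb
    return {
        "compute": compute,
        "storage": storage,
        "network": network,
        "total": compute + storage + network,
    }


if __name__ == "__main__":
    example_rates = Rates(instance_hour=0.10, ebs_gb_month=0.08, transfer_out_gb=0.09)
    # One instance running all month (about 730 hours), 100 GB EBS, 50 GB out.
    print(monthly_estimate(730, 100, 50, example_rates))
```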
To figure out how much it will cost you:
  • During the design/planning phase of your system you can use the AWS Pricing Calculator to estimate the costs of your proposed system.
  • Once the system is up and running, AWS Cost Explorer can help you view your costs and usage.
Some additional tips to understand your system costs better would be to:
  • Use Cost Allocation Tags to understand the costs associated with different projects, teams or workloads. These help in building more detailed reports and better visuals in dashboards.
  • Billing and Cost Management also shows cost breakdowns for your account.
  • AWS Budgets can help you set up alerts when your account costs cross a set threshold.
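As a rough sketch of how cost allocation tags pay off, the snippet below builds a Cost Explorer request that groups one month's cost by a tag. The tag key `project` is an assumption for this example and would need to be activated as a cost allocation tag first; the fetch step assumes boto3 and AWS credentials.

```python
from datetime import date


def month_period(year, month):
    """Cost Explorer time period for one calendar month (End is exclusive)."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return {"Start": start.isoformat(), "End": end.isoformat()}


def build_cost_by_tag_request(year, month, tag_key="project"):
    """Build keyword arguments for Cost Explorer GetCostAndUsage."""
    return {
        "TimePeriod": month_period(year, month),
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "TAG", "Key": tag_key}],  # hypothetical tag key
    }


def fetch_cost_by_tag(request):
    """Run the query (needs boto3 and AWS credentials)."""
    import boto3

    return boto3.client("ce").get_cost_and_usage(**request)
```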
This was a very useful and information-dense session. You can watch it yourself to find good tips for the Cloud Practitioner exam. With this session our 4 weeks of CCP prep come to an end.
Disclaimer/Clarification : These are just personal notes I have created summarizing the session I attended. All credit and thanks to the speakers and organizers. Check out the website and YouTube links below.
BeSA is a volunteer-run effort to teach the skills needed to become a Solutions Architect.
Watch it live here. Sign up for upcoming batches here.