How to Build and Manage a Resilient Service with AWS SDKs Using Health Checks, Decoupled Dependencies, and Load Balancing
Did you know you can deploy and manage a load-balanced, resilient web service entirely with AWS SDKs?
- Amazon EC2 Auto Scaling is used to create Amazon Elastic Compute Cloud (Amazon EC2) instances based on a launch template. The Auto Scaling group ensures that the number of instances is kept in a specified range.
- Elastic Load Balancing handles HTTP requests, monitors the health of instances in the Auto Scaling group, and distributes requests to healthy instances.
- A Python web server runs on each instance to handle HTTP requests. It responds with recommendations and health checks and takes different actions depending on a set of AWS Systems Manager parameters that simulate failures and demonstrate improved resiliency.
- An Amazon DynamoDB table simulates a recommendation service that the web server depends on to get recommendations.
- A DynamoDB table that is used as a recommendation service. The table is populated with a few initial values.
- An AWS Identity and Access Management (IAM) policy, role, and instance profile that grant permission to each Amazon EC2 instance so that it can access the DynamoDB recommendations table and Systems Manager parameters.
- An Amazon EC2 launch template that specifies how instances are started. The launch template includes a startup Bash script that installs Python packages and starts a Python web server.
- An Auto Scaling group that is configured to ensure that you have three running instances in three Availability Zones.
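As a rough illustration of the last item, here is a minimal sketch of how an Auto Scaling group like this could be created with the AWS SDK for Python (Boto3). The function name `create_group` and its arguments are hypothetical and not taken from the demo code. The client is passed in so that you can supply `boto3.client("autoscaling")` yourself.

```python
def create_group(autoscaling_client, group_name, template_name, zones, size=3):
    """Create an Auto Scaling group that keeps `size` instances running.

    Hypothetical helper: `autoscaling_client` is expected to be a Boto3
    Auto Scaling client, and `template_name` an existing launch template.
    """
    kwargs = {
        "AutoScalingGroupName": group_name,
        "AvailabilityZones": zones,
        "LaunchTemplate": {
            "LaunchTemplateName": template_name,
            "Version": "$Default",
        },
        # Setting the minimum and maximum to the same value asks
        # Auto Scaling to hold the group at exactly that many instances.
        "MinSize": size,
        "MaxSize": size,
    }
    autoscaling_client.create_auto_scaling_group(**kwargs)
    return kwargs
```

You might call it as `create_group(boto3.client("autoscaling"), "doc-example-resilience-group", "doc-example-resilience-template", ["us-west-2a", "us-west-2b", "us-west-2c"])`. Returning the arguments is only a convenience for inspecting what was sent.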
Creating and populating a DynamoDB table named 'doc-example-recommendation-service'.
INFO: Creating table doc-example-recommendation-service...
INFO: Table doc-example-recommendation-service created.
INFO: Populated table doc-example-recommendation-service with items from ../../../workflows/resilient_service/resources/recommendations.json.
----------------------------------------------------------------------------------------
Creating an EC2 launch template that runs '../../../workflows/resilient_service/resources/server_startup_script.sh' when an instance starts.
This script starts a Python web server defined in the `server.py` script. The web server
listens to HTTP requests on port 80 and responds to requests to '/' and to '/healthcheck'.
For demo purposes, this server is run as the root user. In production, the best practice is to
run a web server, such as Apache, with least-privileged credentials.
The template also defines an IAM policy that each instance uses to assume a role that grants
permissions to access the DynamoDB recommendation table and Systems Manager parameters
that control the flow of the demo.
INFO: Created policy with ARN arn:aws:iam::123456789012:policy/doc-example-resilience-pol.
INFO: Created role doc-example-resilience-role and attached policy arn:aws:iam::123456789012:policy/doc-example-resilience-pol.
INFO: Created profile doc-example-resilience-prof and added role doc-example-resilience-role.
INFO: Created launch template doc-example-resilience-template for AMI ami-04288abc8d2000768 on t3.micro.
----------------------------------------------------------------------------------------
Creating an EC2 Auto Scaling group that maintains three EC2 instances, each in a different
Availability Zone.
INFO: Created EC2 Auto Scaling group doc-example-resilience-template with availability zones ['us-west-2a', 'us-west-2b', 'us-west-2c', 'us-west-2d'].
----------------------------------------------------------------------------------------
At this point, you have EC2 instances created. Once each instance starts, it listens for
HTTP requests. You can see these instances in the console or continue with the demo.
----------------------------------------------------------------------------------------
- An ELB target group that is attached to the Auto Scaling group. The target group forwards HTTP requests to instances in the Auto Scaling group on port 80, and is configured to verify the health of instances. To speed up this demo, the health check is configured with shortened times and lower thresholds. In production, you might want to decrease the sensitivity of your health checks to avoid unwanted failures.
- An Application Load Balancer that provides a single endpoint for your users, and a listener that the load balancer uses to distribute requests to the underlying instances.
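To make the health check configuration more concrete, the following sketch shows one way to create such a target group with Boto3. The port and the `/healthcheck` path come from the demo. The interval, timeout, and threshold values are illustrative assumptions rather than the demo's actual settings, and the helper names are hypothetical.

```python
def target_group_settings(name, vpc_id):
    """Build arguments for the Boto3 elbv2 create_target_group call.

    The timing and threshold numbers below are assumed example values
    chosen to make health checks react quickly in a demo.
    """
    return {
        "Name": name,
        "Protocol": "HTTP",
        "Port": 80,
        "VpcId": vpc_id,
        "HealthCheckPath": "/healthcheck",
        "HealthCheckIntervalSeconds": 10,  # assumed: check every 10 seconds
        "HealthCheckTimeoutSeconds": 5,    # assumed: give up after 5 seconds
        "HealthyThresholdCount": 2,        # assumed: 2 passes mark a target healthy
        "UnhealthyThresholdCount": 2,      # assumed: 2 failures mark it unhealthy
    }


def create_target_group(elbv2_client, name, vpc_id):
    """Create the target group and return its ARN."""
    response = elbv2_client.create_target_group(**target_group_settings(name, vpc_id))
    return response["TargetGroups"][0]["TargetGroupArn"]
```

In production you would likely raise the interval and thresholds so that a brief hiccup does not take an instance out of rotation.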
Creating an Elastic Load Balancing target group and load balancer. The target group
defines how the load balancer connects to instances. The load balancer provides a
single endpoint where clients connect and dispatches requests to instances in the group.
INFO: Found 4 subnets for the specified zones.
INFO: Created load balancing target group doc-example-resilience-tg.
INFO: Created load balancer doc-example-resilience-lb.
INFO: Waiting for load balancer to be available...
INFO: Load balancer is available!
INFO: Created listener to forward traffic from load balancer doc-example-resilience-lb to target group doc-example-resilience-tg.
INFO: Attached load balancer target group doc-example-resilience-tg to auto scaling group doc-example-resilience-group.
Your load balancer is ready. You can access it by browsing to:
http://doc-example-resilience-lb-1317068782.us-west-2.elb.amazonaws.com
----------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
See the current state of the service by selecting one of the following choices:
1. Send a GET request to the load balancer endpoint.
2. Check the health of load balancer targets.
3. Go to the next part of the demo.
Which action would you like to take?
----------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Request:
GET http://doc-example-resilience-lb-1317068782.us-west-2.elb.amazonaws.com
Response:
200
{'Title': {'S': 'Pride and Prejudice'},
'Creator': {'S': 'Jane Austen'},
'MediaType': {'S': 'Book'},
'ItemId': {'N': '1'},
'Metadata': {'InstanceId': 'i-05387127cb2ebbea1',
'AvailabilityZone': 'us-west-2c'}}
----------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Checking the health of load balancer targets:
Target i-02d98d9d0726c4b2d on port 80 is healthy
Target i-0e4b7104cfaf8e056 on port 80 is healthy
Target i-05387127cb2ebbea1 on port 80 is healthy
----------------------------------------------------------------------------------------
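A report like the one above can be produced from the Boto3 `describe_target_health` call. The following is a minimal sketch, assuming you already have the target group ARN. The function name `describe_targets` is hypothetical and the demo's own code may differ.

```python
def describe_targets(elbv2_client, target_group_arn):
    """Return one line per target, plus a reason line for unhealthy targets."""
    response = elbv2_client.describe_target_health(TargetGroupArn=target_group_arn)
    lines = []
    for item in response["TargetHealthDescriptions"]:
        target = item["Target"]
        health = item["TargetHealth"]
        lines.append(
            f"Target {target['Id']} on port {target['Port']} is {health['State']}"
        )
        # Reason and Description explain any state other than healthy.
        if health["State"] != "healthy":
            lines.append(
                f"{health.get('Reason', 'Unknown')}: {health.get('Description', '')}"
            )
    return lines
```

With a real client you might run `print("\n".join(describe_targets(boto3.client("elbv2"), target_group_arn)))`.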
----------------------------------------------------------------------------------------
Request:
GET http://doc-example-resilience-lb-1317068782.us-west-2.elb.amazonaws.com
Response:
502
----------------------------------------------------------------------------------------
Request:
GET http://doc-example-resilience-lb-1317068782.us-west-2.elb.amazonaws.com
Response:
200
{'MediaType': {'S': 'Book'},
'ItemId': {'N': '0'},
'Title': {'S': '404 Not Found: A Coloring Book'},
'Creator': {'S': 'The Oatmeal'},
'Metadata': {'InstanceId': 'i-05387127cb2ebbea1',
'AvailabilityZone': 'us-west-2c'}}
----------------------------------------------------------------------------------------
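The response above has `ItemId` 0 and looks like a static fallback rather than a row from the recommendations table. The demo's `server.py` is not shown here, so the following is only a sketch of the general pattern: try the dependency, and return a fixed response if it fails. The names `STATIC_RESPONSE` and `get_recommendation` are hypothetical.

```python
# Static fallback, using the values shown in the demo output above.
STATIC_RESPONSE = {
    "MediaType": {"S": "Book"},
    "ItemId": {"N": "0"},
    "Title": {"S": "404 Not Found: A Coloring Book"},
    "Creator": {"S": "The Oatmeal"},
}


def get_recommendation(fetch_from_table, fallback=STATIC_RESPONSE):
    """Return a recommendation, or a static response if the dependency fails.

    `fetch_from_table` is any zero-argument callable that queries the
    recommendation service and raises an exception on failure.
    """
    try:
        return fetch_from_table()
    except Exception:
        # Catching every exception keeps this sketch short. A real server
        # would more likely catch botocore.exceptions.ClientError.
        return dict(fallback)
```

The point of the design is that a failure in the recommendation service no longer turns into a failed request for your user.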
----------------------------------------------------------------------------------------
Request:
GET http://doc-example-resilience-lb-1317068782.us-west-2.elb.amazonaws.com
Response:
200
{'Title': {'S': 'Delicatessen'},
'Creator': {'S': 'Jeunet et Caro'},
'MediaType': {'S': 'Movie'},
'ItemId': {'N': '1'},
'Metadata': {'InstanceId': 'i-02d98d9d0726c4b2d',
'AvailabilityZone': 'us-west-2a'}}
----------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Request:
GET http://doc-example-resilience-lb-1317068782.us-west-2.elb.amazonaws.com
Response:
200
{'MediaType': {'S': 'Book'},
'ItemId': {'N': '0'},
'Title': {'S': '404 Not Found: A Coloring Book'},
'Creator': {'S': 'The Oatmeal'},
'Metadata': {'InstanceId': 'i-0e4b7104cfaf8e056',
'AvailabilityZone': 'us-west-2b'}}
----------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Checking the health of load balancer targets:
Target i-02d98d9d0726c4b2d on port 80 is healthy
Target i-0e4b7104cfaf8e056 on port 80 is unhealthy
Target.ResponseCodeMismatch: Health checks failed with these codes: [503]
Target i-05387127cb2ebbea1 on port 80 is healthy
----------------------------------------------------------------------------------------
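The 503 reported above is what a deep health check returns when the instance itself is running but a dependency is not reachable. Here is a minimal sketch of the difference between a shallow and a deep check. The function name `health_status` and the `deep` flag are hypothetical and are not taken from the demo's server code.

```python
def health_status(check_dependency, deep=True):
    """Return the HTTP status code a /healthcheck handler could send.

    A shallow check reports 200 whenever the server can answer at all.
    A deep check also exercises the dependency and reports 503 if it fails.
    `check_dependency` is any zero-argument callable that raises on failure.
    """
    if not deep:
        return 200
    try:
        check_dependency()
        return 200
    except Exception:
        return 503
```

Because the target group treats any status other than the expected success code as a failed check, a 503 here is what eventually marks the target as unhealthy.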
----------------------------------------------------------------------------------------
Checking the health of load balancer targets:
Target i-02d98d9d0726c4b2d on port 80 is healthy
Target i-05387127cb2ebbea1 on port 80 is healthy
Target i-0e4b7104cfaf8e056 on port 80 is draining
Target.DeregistrationInProgress: Target deregistration is in progress
Target i-0c8df865e77bbb943 on port 80 is unhealthy
Target.FailedHealthChecks: Health checks failed
----------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Checking the health of load balancer targets:
Target i-02d98d9d0726c4b2d on port 80 is unhealthy
Target.ResponseCodeMismatch: Health checks failed with these codes: [503]
Target i-05387127cb2ebbea1 on port 80 is unhealthy
Target.ResponseCodeMismatch: Health checks failed with these codes: [503]
Target i-0c8df865e77bbb943 on port 80 is unhealthy
Target.ResponseCodeMismatch: Health checks failed with these codes: [503]
----------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Request:
GET http://doc-example-resilience-lb-1317068782.us-west-2.elb.amazonaws.com
Response:
200
{'MediaType': {'S': 'Book'},
'ItemId': {'N': '0'},
'Title': {'S': '404 Not Found: A Coloring Book'},
'Creator': {'S': 'The Oatmeal'},
'Metadata': {'InstanceId': 'i-02d98d9d0726c4b2d',
'AvailabilityZone': 'us-west-2a'}}
----------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
This concludes the demo of how to build and manage a resilient service.
To keep things tidy and to avoid unwanted charges on your account, we can clean up all AWS resources
that were created for this demo.
Do you want to clean up all demo resources? (y/n) y
INFO: Deleted load balancer doc-example-resilience-lb.
INFO: Waiting for load balancer to be deleted...
INFO: Target group not yet released from load balancer, waiting...
INFO: Deleted load balancing target group doc-example-resilience-tg.
INFO: Stopping i-02d98d9d0726c4b2d.
INFO: Stopping i-05387127cb2ebbea1.
INFO: Stopping i-0c8df865e77bbb943.
INFO: Some instances are still running. Waiting for them to stop...
INFO: Some instances are still running. Waiting for them to stop...
INFO: Some instances are still running. Waiting for them to stop...
INFO: Some instances are still running. Waiting for them to stop...
INFO: Some instances are still running. Waiting for them to stop...
INFO: Some instances are still running. Waiting for them to stop...
INFO: Some instances are still running. Waiting for them to stop...
INFO: Deleted EC2 Auto Scaling group doc-example-resilience-group.
INFO: Deleted instance profile doc-example-resilience-prof.
INFO: Detached and deleted policy doc-example-resilience-pol.
INFO: Deleted role doc-example-resilience-role.
INFO: Launch template doc-example-resilience-template deleted.
INFO: Deleted instance profile doc-example-resilience-bc-prof.
INFO: Detached and deleted policy doc-example-resilience-bc-pol.
INFO: Detached and deleted policy AmazonSSMManagedInstanceCore.
INFO: Deleted role doc-example-resilience-bc-role.
INFO: Deleting table doc-example-recommendation-service...
INFO: Table doc-example-recommendation-service deleted.
----------------------------------------------------------------------------------------
- You used a load balancer to let your users target a single endpoint that automatically distributed traffic to web servers running in your target group.
- You used an Auto Scaling group so you could remove unhealthy instances and automatically keep the number of instances within a specified range.
- You decoupled your web server from its dependencies and returned a successful static response even when the underlying service failed.
- You implemented deep health checks to report unhealthy instances to the load balancer so that it dispatched requests only to instances that responded successfully.
- You used a load balancer to let the system fail open when something unexpected went wrong. Your users got a successful static response, buying you time to investigate the root cause and get the system running again.
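The fail open behavior in the last point can be modeled in a few lines. When a target group contains only unhealthy targets, an Application Load Balancer routes requests to all of them regardless of health status. The sketch below is a simplified model of that rule for illustration, not code that runs inside the load balancer, and the function name `routable_targets` is hypothetical.

```python
def routable_targets(target_states):
    """Return the target IDs that would receive traffic.

    `target_states` maps a target ID to its health state string.
    If at least one target is healthy, only healthy targets are used.
    If none are healthy, every target is used (fail open).
    """
    healthy = [tid for tid, state in target_states.items() if state == "healthy"]
    return healthy if healthy else list(target_states)
```

For example, with all three instances from the demo reporting `unhealthy`, the function returns all three IDs, which matches the successful static response you saw even though every health check was failing.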
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.