
Introduction to Threat Detection and Management on AWS

Secure your cloud infrastructure by mastering threat detection & management on AWS. This beginner's guide introduces key AWS services, security concepts & a hands-on use case. Start building a robust security posture today.

Brandon Carroll
Amazon Employee
Published Apr 11, 2024
In today's cloud-centric world, securing your infrastructure is paramount. As organizations embrace the cloud, understanding threat detection and management becomes crucial, and if you're just learning how to protect your cloud infrastructure, understanding Threat Detection and Management on AWS is a must. If the topic is altogether new to you, have a look at this post where I lay out the basics. AWS offers a wide range of services and tools to help you detect and manage threats effectively. In this series of articles I will introduce you to why threat detection is important, some of the key services and features you'll be working with on AWS, and a use case with working examples. Let's dive in.

Why Threat Detection and Management Matters

Cyber threats can have devastating consequences for businesses, ranging from data breaches and financial losses to reputational damage and operational disruptions. By implementing effective threat detection and management strategies, you can proactively identify and mitigate potential risks, protecting the security and integrity of your cloud environment. Early detection and rapid response are key to minimizing the impact of cyber attacks and protecting sensitive data.

Key Components in the AWS Ecosystem

Before we get into the details, let's familiarize ourselves with some key AWS services and components that play a critical role in threat detection and management. You're probably already familiar with some of them, but here is the list with a brief explanation of each.
  1. Virtual Private Cloud (VPC): A logically isolated section of the AWS Cloud where you can launch AWS resources in a virtual network that you define.
  2. Network ACLs (Network Access Control Lists): An optional layer of security for your VPC that acts as a stateless firewall for controlling inbound and outbound traffic at the subnet level.
  3. Security Groups: A stateful virtual firewall that controls inbound and outbound traffic for resources such as your EC2 instances and load balancers.
  4. AWS WAF (Web Application Firewall): A web application firewall that helps protect your web applications from common web exploits and bots.
  5. AWS Shield: A managed Distributed Denial of Service (DDoS) protection service that safeguards applications against DDoS attacks.
  6. Amazon GuardDuty: A threat detection service that continuously monitors for malicious activity and unauthorized behavior to protect your AWS accounts and workloads.
  7. Amazon Inspector: An automated security assessment service that helps improve the security and compliance of applications deployed on AWS.
  8. AWS Config: A service that enables you to assess, audit, and evaluate the configurations of your AWS resources.
  9. Amazon CloudWatch: A monitoring and observability service that provides data and actionable insights for AWS resources.
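Network ACLs and security groups are easy to confuse, so here is a toy model of how they differ. This is a simplified, hypothetical simulation (the `NaclRule`, `nacl_allows`, and `SecurityGroup` names are made up for illustration and are not an AWS API). It captures two documented behaviors: network ACL rules are evaluated in ascending rule-number order, the first match decides, and unmatched traffic hits the implicit deny; a security group contains only allow rules and automatically permits return traffic for connections it has already allowed.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class NaclRule:
    """One entry in a hypothetical, simplified network ACL."""
    rule_number: int
    cidr: str        # remote address range this rule matches
    from_port: int
    to_port: int
    allow: bool


def nacl_allows(rules, remote_ip, port):
    """Stateless check: each packet is judged on its own, with no memory."""
    for rule in sorted(rules, key=lambda r: r.rule_number):
        in_range = ip_address(remote_ip) in ip_network(rule.cidr)
        if in_range and rule.from_port <= port <= rule.to_port:
            return rule.allow  # the lowest-numbered matching rule decides
    return False  # nothing matched: the implicit "*" deny rule applies


@dataclass
class SecurityGroup:
    """Simplified stateful filter: allow rules only, connections tracked."""
    allowed_ports: set
    tracked: set = field(default_factory=set)

    def allow_inbound(self, client_ip, client_port, server_port):
        if server_port in self.allowed_ports:
            # Remember the connection so its replies are permitted later
            self.tracked.add((client_ip, client_port, server_port))
            return True
        return False

    def allows_reply(self, client_ip, client_port, server_port):
        # Return traffic is allowed because the connection is tracked,
        # not because of any outbound rule
        return (client_ip, client_port, server_port) in self.tracked


if __name__ == "__main__":
    inbound = [
        NaclRule(90, "198.51.100.0/24", 443, 443, allow=False),
        NaclRule(100, "0.0.0.0/0", 443, 443, allow=True),
    ]
    outbound = [NaclRule(100, "0.0.0.0/0", 443, 443, allow=True)]

    print(nacl_allows(inbound, "203.0.113.10", 443))     # True: rule 100
    print(nacl_allows(inbound, "198.51.100.7", 443))     # False: rule 90 matches first
    # The reply to 203.0.113.10:50000 would need an outbound ephemeral-port rule
    print(nacl_allows(outbound, "203.0.113.10", 50000))  # False

    sg = SecurityGroup(allowed_ports={443})
    print(sg.allow_inbound("203.0.113.10", 50000, 443))  # True
    print(sg.allows_reply("203.0.113.10", 50000, 443))   # True
```

The key takeaway is the last pair of checks in each half: the stateless ACL blocks the reply unless you add an explicit rule for the client's ephemeral ports, while the stateful security group lets it through automatically.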
You won't necessarily see all of these services discussed in this series of articles, but I want to give you a baseline to work from, and in my opinion these are the most common services. With that, let's make sure we share a basic understanding of some security terms.

Security Terms Explained

As we get more into threat detection and management, I'll assume you understand some key security terms. The terms are as follows:
  1. Threat: A potential cause of an unwanted incident that may result in harm to a system or organization.
  2. Vulnerability: A weakness or flaw in a system that can be exploited by a threat actor to gain unauthorized access or cause harm.
  3. Risk: The potential for a threat to exploit a vulnerability and cause harm to an asset or organization.
  4. Incident: An occurrence that violates an explicitly or implicitly defined security policy or security practice.
  5. Intrusion Detection System (IDS): A system that monitors network traffic and system activities for malicious behavior or policy violations.
  6. Intrusion Prevention System (IPS): A system that not only detects potential threats but also takes action to prevent or mitigate the detected events.
  7. Firewall: A network security system that monitors and controls incoming and outgoing network traffic based on predetermined security rules.
  8. Denial of Service (DoS) Attack: An attack that aims to make a system or network resource unavailable to its intended users by overwhelming it with traffic or requests.

Use Case: Detecting and Managing Web Application Threats

To give you a sense of Threat Detection and Management on AWS, let's consider a practical use case where we need to detect and manage threats targeting a web application hosted on AWS. We'll leverage various AWS services and security best practices to achieve this goal. I'm going to use Python to deploy these features because:
  1. I like Python and I can always use the practice.
  2. You're likely to see and use Python in cybersecurity work, AWS work, and so on.
The following use case will help you understand how to implement threat detection and management on AWS by setting up various security measures and monitoring tools. By following this example, you will accomplish the following:
  1. Secure Network Configuration: You will create a secure Virtual Private Cloud (VPC) environment with appropriate network access controls, including subnets, network ACLs, and security groups. This lays the foundation for a secure network infrastructure.
  2. Web Application Protection: You will deploy AWS Web Application Firewall (WAF) and AWS Shield to protect our web application from common web exploits like SQL injection, cross-site scripting (XSS) attacks, and distributed denial-of-service (DDoS) attacks.
  3. Threat Detection and Vulnerability Assessment: You will enable Amazon GuardDuty to monitor our AWS environment for potential threats and malicious activities. Additionally, we will run Amazon Inspector assessments to identify potential vulnerabilities or deviations from best practices in our web application.
  4. Monitoring and Response: You will set up Amazon CloudWatch alarms to receive notifications when Amazon GuardDuty detects potential threats. Furthermore, we will configure AWS Config to monitor for changes in security group configurations, ensuring compliance with our security policies.
By the end of this series, you will have implemented a comprehensive threat detection and management solution on AWS. This includes securing the network infrastructure, protecting the web application from common attacks, continuously monitoring for threats and vulnerabilities, and configuring automated responses and notifications for detected threats. This hands-on approach will give you a practical understanding of how to use AWS services to strengthen the security posture of your cloud environment. In this article we will build the base architecture, and to do that I've provided a Python script. If Python is new to you, I suggest this Coursera course to get you started. I think it's important to introduce Python early, since you're likely to see it often when working in security, the cloud, and now generative AI.

Baseline Secure VPC and Network Configuration

To begin, we are going to create a base architecture on which to implement threat detection services. There are many ways to go about this. For base configurations I often use Python, Terraform, or CloudFormation so that I can easily repeat the builds with less effort, and for this one I am going to use Python. It's also nice to have a good starting point so we can focus on the security features rather than building an architecture every time we want to test something. So, here's an example of Python code using the AWS SDK for Python (Boto3) that creates the following resources:
  • A VPC
  • Two public and two private subnets
  • Corresponding route tables
  • An Internet Gateway for the public subnets
  • A NAT Gateway so the instances in the private subnets have outbound internet access
  • An Application Load Balancer (ALB)
  • Two EC2 instances running Apache
  • A certificate from AWS Certificate Manager (ACM) to use with the ALB
  • Network ACLs
  • Security groups
  • An EC2 Instance Connect Endpoint to access the shell of the instances in the private subnets
I've added comments in the code below to give you an idea of what each part does, but I will not be explaining it in detail in this post.

```python
import time

import boto3
from botocore.exceptions import WaiterError

# The Availability Zones and AMI ID used below are specific to us-east-1
REGION = "us-east-1"

# Create AWS clients
ec2_resource = boto3.resource("ec2", region_name=REGION)
ec2_client = boto3.client("ec2", region_name=REGION)
elbv2 = boto3.client("elbv2", region_name=REGION)
acm = boto3.client("acm", region_name=REGION)

print("Creating VPC")
vpc = ec2_resource.create_vpc(CidrBlock="172.17.0.0/16")
vpc.create_tags(Tags=[{"Key": "Name", "Value": "my_threat-detection_vpc"}])
vpc.wait_until_available()
print(f"VPC created: {vpc.id}")

print("Creating Internet Gateway")
ig = ec2_resource.create_internet_gateway()
vpc.attach_internet_gateway(InternetGatewayId=ig.id)
print(f"Internet Gateway created: {ig.id}")

print("Creating Public Route Table")
route_table = vpc.create_route_table()
# Default route for the public subnets goes to the Internet Gateway
route_table.create_route(DestinationCidrBlock="0.0.0.0/0", GatewayId=ig.id)
print(f"Public Route Table created: {route_table.id}")

print("Creating Public Subnet 1")
subnet_public1 = vpc.create_subnet(CidrBlock="172.17.1.0/24", AvailabilityZone="us-east-1a")
print(f"Public Subnet 1 created: {subnet_public1.id}")

print("Creating Public Subnet 2")
subnet_public2 = vpc.create_subnet(CidrBlock="172.17.2.0/24", AvailabilityZone="us-east-1b")
print(f"Public Subnet 2 created: {subnet_public2.id}")

print("Creating Private Subnet 1")
subnet_private1 = vpc.create_subnet(CidrBlock="172.17.3.0/24", AvailabilityZone="us-east-1a")
print(f"Private Subnet 1 created: {subnet_private1.id}")

print("Creating Private Subnet 2")
subnet_private2 = vpc.create_subnet(CidrBlock="172.17.5.0/24", AvailabilityZone="us-east-1b")
print(f"Private Subnet 2 created: {subnet_private2.id}")

# The public subnets become public by using the Internet Gateway route table
route_table.associate_with_subnet(SubnetId=subnet_public1.id)
route_table.associate_with_subnet(SubnetId=subnet_public2.id)

# The private instances need outbound internet access (for example, for yum in
# the user data script), so place a NAT Gateway in a public subnet
print("Creating NAT Gateway")
eip = ec2_client.allocate_address(Domain="vpc")
nat_gateway_id = ec2_client.create_nat_gateway(
    SubnetId=subnet_public1.id,
    AllocationId=eip["AllocationId"],
)["NatGateway"]["NatGatewayId"]
ec2_client.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_gateway_id])
print(f"NAT Gateway created: {nat_gateway_id}")

print("Creating Private Route Table")
route_table_private = vpc.create_route_table()
# Default route for the private subnets goes to the NAT Gateway
route_table_private.create_route(DestinationCidrBlock="0.0.0.0/0", NatGatewayId=nat_gateway_id)
route_table_private.associate_with_subnet(SubnetId=subnet_private1.id)
route_table_private.associate_with_subnet(SubnetId=subnet_private2.id)
print(f"Private Route Table created: {route_table_private.id}")

# Custom network ACLs deny all traffic until rules are added, so they are
# created here but not associated with any subnet yet
print("Creating Network ACLs")
acl_public = vpc.create_network_acl()
acl_private = vpc.create_network_acl()
print(f"Public Network ACL created: {acl_public.id}")
print(f"Private Network ACL created: {acl_private.id}")

print("Creating Security Group")
sg_web = vpc.create_security_group(GroupName="WebServerSG", Description="Allow HTTP/HTTPS and SSH from the VPC")
sg_web.authorize_ingress(IpProtocol="tcp", CidrIp="0.0.0.0/0", FromPort=80, ToPort=80)
sg_web.authorize_ingress(IpProtocol="tcp", CidrIp="0.0.0.0/0", FromPort=443, ToPort=443)
# SSH from inside the VPC so the EC2 Instance Connect Endpoint can reach the instances
sg_web.authorize_ingress(IpProtocol="tcp", CidrIp="172.17.0.0/16", FromPort=22, ToPort=22)
print(f"Security Group created: {sg_web.id}")

print("Creating Application Load Balancer")
load_balancer = elbv2.create_load_balancer(
    Name="MyWebAppLoadBalancer",
    Subnets=[subnet_public1.id, subnet_public2.id],
    SecurityGroups=[sg_web.id],
    Scheme="internet-facing",
    Type="application",
)
load_balancer_arn = load_balancer["LoadBalancers"][0]["LoadBalancerArn"]
print(f"Load Balancer created: {load_balancer_arn}")

# User data script that installs and starts Apache. Boto3 base64-encodes
# UserData for RunInstances automatically, so pass the plain text
# (encoding it yourself would double-encode it)
user_data_script = """#!/bin/bash
yum update -y
yum install -y httpd aws-ec2-instance-connect-plugin
systemctl enable httpd
systemctl start httpd
"""

print("Launching EC2 instances")
instances = ec2_resource.create_instances(
    ImageId="ami-051f8a213df8bc089",  # AMI IDs are Region-specific
    MinCount=2,
    MaxCount=2,
    InstanceType="t2.micro",
    KeyName="my-demo-environment",  # must be an existing key pair in your account
    UserData=user_data_script,
    NetworkInterfaces=[
        {
            "AssociatePublicIpAddress": False,
            "DeviceIndex": 0,
            "SubnetId": subnet_private1.id,
            "Groups": [sg_web.id],
        }
    ],
)

# Wait for the instances to be running: poll every 15 seconds, up to 20 times (~5 minutes)
instance_ids = [instance.id for instance in instances]
waiter = ec2_client.get_waiter("instance_running")
try:
    waiter.wait(InstanceIds=instance_ids, WaiterConfig={"Delay": 15, "MaxAttempts": 20})
    print("EC2 instances are running.")
except WaiterError:
    print("Instances did not reach the running state after 5 minutes. Still waiting...")
    for instance in instances:
        instance.wait_until_running()

print("Creating Target Group")
target_group = elbv2.create_target_group(
    Name="MyWebAppTargetGroup",
    Protocol="HTTP",
    Port=80,
    VpcId=vpc.id,
    HealthCheckPath="/",
    TargetType="instance",
)["TargetGroups"][0]
print(f"Target Group created: {target_group['TargetGroupArn']}")

# Register both instances as targets on port 80
elbv2.register_targets(
    TargetGroupArn=target_group["TargetGroupArn"],
    Targets=[{"Id": instance.id, "Port": 80} for instance in instances],
)

# Request an SSL/TLS certificate from ACM (replace with a domain you control)
domain_name = "example.brandonjcarroll.com"
cert_arn = None

print(f"Checking if certificate for {domain_name} already exists")
for certificate in acm.list_certificates()["CertificateSummaryList"]:
    if certificate["DomainName"] == domain_name:
        cert_arn = certificate["CertificateArn"]
        print(f"Certificate already exists: {cert_arn}")
        break

if not cert_arn:
    print(f"Certificate for {domain_name} does not exist. Requesting a new certificate.")
    cert_arn = acm.request_certificate(DomainName=domain_name, ValidationMethod="DNS")["CertificateArn"]
    print("Please go to the AWS ACM console and complete the CNAME validation to issue the certificate.")

# Wait for the certificate to be issued (blocks until DNS validation completes)
cert_status = acm.describe_certificate(CertificateArn=cert_arn)["Certificate"]["Status"]
while cert_status != "ISSUED":
    if cert_status in ("FAILED", "VALIDATION_TIMED_OUT", "REVOKED", "EXPIRED", "INACTIVE"):
        raise RuntimeError(f"Certificate {cert_arn} cannot be used: {cert_status}")
    time.sleep(10)
    cert_status = acm.describe_certificate(CertificateArn=cert_arn)["Certificate"]["Status"]
print(f"Certificate issued: {cert_arn}")

print("Creating HTTPS Listener")
https_listener = elbv2.create_listener(
    LoadBalancerArn=load_balancer_arn,
    Protocol="HTTPS",
    Port=443,
    Certificates=[{"CertificateArn": cert_arn}],
    DefaultActions=[{"Type": "forward", "TargetGroupArn": target_group["TargetGroupArn"]}],
)
print(f"HTTPS Listener created: {https_listener['Listeners'][0]['ListenerArn']}")

print("Creating HTTP Listener")
# Redirect all HTTP requests to HTTPS, keeping the host, path, and query
http_listener = elbv2.create_listener(
    LoadBalancerArn=load_balancer_arn,
    Protocol="HTTP",
    Port=80,
    DefaultActions=[
        {
            "Type": "redirect",
            "RedirectConfig": {
                "Protocol": "HTTPS",
                "Port": "443",
                "Host": "#{host}",
                "Path": "/#{path}",
                "Query": "#{query}",
                "StatusCode": "HTTP_301",
            },
        }
    ],
)
print(f"HTTP Listener created: {http_listener['Listeners'][0]['ListenerArn']}")

# Create one EC2 Instance Connect Endpoint in the private subnet; a single
# endpoint can reach instances throughout the VPC
print("Creating EC2 Instance Connect Endpoint")
response = ec2_client.create_instance_connect_endpoint(
    SubnetId=subnet_private1.id,
    # With PreserveClientIp=False, connections arrive from the endpoint's
    # private IP, which the port 22 rule for 172.17.0.0/16 above allows
    PreserveClientIp=False,
)
endpoint_id = response["InstanceConnectEndpoint"]["InstanceConnectEndpointId"]
print(f"EC2 Instance Connect Endpoint created: {endpoint_id}")

# Wait for the endpoint to finish creating
print("Waiting for EC2 Instance Connect Endpoint to be available...")
while True:
    response = ec2_client.describe_instance_connect_endpoints(InstanceConnectEndpointIds=[endpoint_id])
    endpoint_state = response["InstanceConnectEndpoints"][0]["State"]
    if endpoint_state == "create-complete":
        print(f"EC2 Instance Connect Endpoint {endpoint_id} is available.")
        break
    if endpoint_state == "create-failed":
        print(f"EC2 Instance Connect Endpoint {endpoint_id} creation failed.")
        break
    time.sleep(10)

resources = {
    "VPC": vpc.id,
    "Internet Gateway": ig.id,
    "NAT Gateway": nat_gateway_id,
    "Elastic IP (allocation)": eip["AllocationId"],
    "Public Route Table": route_table.id,
    "Private Route Table": route_table_private.id,
    "Public Subnet 1": subnet_public1.id,
    "Public Subnet 2": subnet_public2.id,
    "Private Subnet 1": subnet_private1.id,
    "Private Subnet 2": subnet_private2.id,
    "Public Network ACL": acl_public.id,
    "Private Network ACL": acl_private.id,
    "Security Group": sg_web.id,
    "EC2 Instances": ", ".join(instance_ids),
    "Load Balancer": load_balancer_arn,
    "Target Group": target_group["TargetGroupArn"],
    "HTTPS Listener": https_listener["Listeners"][0]["ListenerArn"],
    "HTTP Listener": http_listener["Listeners"][0]["ListenerArn"],
    "EC2 Instance Connect Endpoint": endpoint_id,
}

print("\nYour resources have been created as follows:")
for resource, resource_id in resources.items():
    print(f"{resource}: {resource_id}")
```
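The script carves the 172.17.0.0/16 VPC into four /24 subnets. If you change those ranges, it can help to sanity-check the plan offline before deploying. The following is a small sketch using only Python's standard `ipaddress` module; the `SUBNETS` dictionary simply mirrors the CIDR blocks hard-coded in the script, and `check_subnet_plan` and `usable_hosts` are hypothetical helpers, not part of Boto3.

```python
from ipaddress import ip_network
from itertools import combinations

VPC_CIDR = "172.17.0.0/16"
# Mirrors the CIDR blocks used in the deployment script
SUBNETS = {
    "Public Subnet 1": "172.17.1.0/24",
    "Public Subnet 2": "172.17.2.0/24",
    "Private Subnet 1": "172.17.3.0/24",
    "Private Subnet 2": "172.17.5.0/24",
}


def check_subnet_plan(vpc_cidr, subnets):
    """Return a list of problems; an empty list means the plan looks consistent."""
    vpc = ip_network(vpc_cidr)
    problems = []
    for name, cidr in subnets.items():
        if not ip_network(cidr).subnet_of(vpc):
            problems.append(f"{name} ({cidr}) is outside {vpc_cidr}")
    for (name_a, cidr_a), (name_b, cidr_b) in combinations(subnets.items(), 2):
        if ip_network(cidr_a).overlaps(ip_network(cidr_b)):
            problems.append(f"{name_a} overlaps {name_b}")
    return problems


def usable_hosts(cidr):
    # AWS reserves 5 addresses in every subnet (the first four and the last one)
    return ip_network(cidr).num_addresses - 5


if __name__ == "__main__":
    print(check_subnet_plan(VPC_CIDR, SUBNETS))  # []
    for name, cidr in SUBNETS.items():
        print(f"{name}: {cidr} -> {usable_hosts(cidr)} usable addresses")
```

With the script's values, the check returns an empty list and each /24 subnet has 251 usable addresses.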
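To delete the resources created by the above script, delete the load balancer (and its listeners), the target group, the EC2 instances, the EC2 Instance Connect Endpoint, and the NAT Gateway; release the Elastic IP address used by the NAT Gateway (deleting the NAT Gateway does not release it); then delete the VPC and its remaining components, and possibly the certificate if you requested one.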
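Because these resources depend on one another (for example, a subnet cannot be deleted while an instance or NAT Gateway still uses it), deletion has to happen roughly in the reverse of creation order. The sketch below models that with Python's standard `graphlib` module (Python 3.9+). The `DEPENDS_ON` map is a simplified, hand-written assumption based on the script above, not something queried from AWS, and the code only computes an order; the actual deletion still happens through the console or the corresponding Boto3 calls.

```python
from graphlib import TopologicalSorter

# Simplified, hypothetical map: each resource -> the resources it depends on
DEPENDS_ON = {
    "VPC": set(),
    "Internet Gateway": {"VPC"},
    "Subnets": {"VPC"},
    "Route Tables": {"VPC"},
    "Security Group": {"VPC"},
    "Network ACLs": {"VPC"},
    "Elastic IP": set(),
    "NAT Gateway": {"Subnets", "Elastic IP", "Internet Gateway"},
    "EC2 Instances": {"Subnets", "Security Group"},
    "EC2 Instance Connect Endpoint": {"Subnets"},
    "Load Balancer": {"Subnets", "Security Group", "Internet Gateway"},
    "Target Group": {"VPC"},
    "Certificate": set(),
    "Listeners": {"Load Balancer", "Target Group", "Certificate"},
}


def deletion_order(depends_on):
    """Dependencies come first when creating, so delete in the reverse order."""
    creation_order = list(TopologicalSorter(depends_on).static_order())
    return list(reversed(creation_order))


if __name__ == "__main__":
    for step, resource in enumerate(deletion_order(DEPENDS_ON), start=1):
        print(step, resource)
```

Only the relative order of dependent resources is guaranteed; independent items such as the certificate can appear at different positions.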
To deploy this architecture, you can run the script locally. The only element missing here is credentials for your AWS account; see the AWS Boto3 documentation for examples of how to configure them. The script also assumes the us-east-1 Region (its Availability Zones and AMI ID are hard-coded) and an existing EC2 key pair named my-demo-environment. Once you've deployed this part of the code, you will have an architecture like the one shown in Figure 1.
Figure 1
As you can see, we've built the baseline for implementing our threat detection capabilities. You can test the environment by browsing to the DNS name of the load balancer. The load balancer redirects HTTP to HTTPS and forwards HTTPS requests to the two EC2 instances in our private subnet. Because the certificate is issued for your domain rather than for the load balancer's DNS name, your browser will likely show a certificate warning that you need to accept. The EC2 instances can be accessed through the EC2 Instance Connect Endpoint, and they have outbound connectivity through the NAT Gateway in the public subnet. This is our base architecture.
Now with our base architecture we can move to the next article in this series, Securing Your Web Application with AWS WAF and AWS Shield.
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
