
Analyze CloudTrail logs with Amazon Bedrock batch inference
This sample is a baseline for anyone looking to enhance their security posture by analyzing CloudTrail logs, and it can be reused for other use cases as well.
- Prerequisites: Before diving into the analysis, ensure that you have the necessary permissions and access to the required resources: AWS credentials, access to the Claude 3 Haiku model in the `us-east-1` region, and permissions to create the needed AWS resources (one S3 bucket, one IAM role, and one SNS topic).
- Setup and Configuration: The notebook starts by installing `boto3` and setting up the necessary AWS resources. This includes creating an S3 bucket to store intermediate data, defining an IAM role that grants Bedrock access to these resources, and creating an SNS topic to send the final analysis to relevant users.
- Processing CloudTrail Events: Using the `lookup_events` API, the notebook demonstrates how to retrieve CloudTrail events, format them appropriately, and prepare them for batch inference (a sketch follows this list). The results of the inference are then processed and analyzed to detect any anomalies or security concerns.
- Batch Inference with Amazon Bedrock: The core of the sample is setting up a batch inference job with the Claude 3 Haiku model. This includes code snippets to create the batch inference `jsonl` input file, where each line is a JSON object containing the `modelInput` (see the following section for more details).
- Output and Analysis: Finally, results from the batch inference job are interpreted and summarized to provide insights into potential security threats detected in the CloudTrail logs.
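To make the event-processing step concrete, here is a minimal sketch of retrieving events with `lookup_events` and turning them into a prompt. The 2000-event cap and the prompt wording are assumptions for illustration, not the notebook's exact values:

import json
import boto3

cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")

# Pull recent management events; the paginator handles the NextToken
# bookkeeping behind the lookup_events API.
events = []
paginator = cloudtrail.get_paginator("lookup_events")
for page in paginator.paginate(PaginationConfig={"MaxItems": 2000}):
    for event in page["Events"]:
        # CloudTrailEvent is a JSON string holding the full event payload.
        events.append(json.loads(event["CloudTrailEvent"]))

# Hypothetical prompt wording; the notebook's actual prompt may differ.
prompt = (
    "Summarize the following CloudTrail events and flag any anomalies "
    "or security concerns:\n" + json.dumps(events[:50])
)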
To run a batch inference job, a `jsonl` file needs to be created. Each line contains the prompt for the needed completion. Here is an example:
# `prompt` and `max_tokens_in_summary` are defined earlier in the notebook.
bedrock_batch_json = {
    "modelInput": {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens_in_summary,
        "temperature": 0.5,
        "top_p": 0.9,
        "stop_sequences": [],
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": prompt
                    }
                ]
            }
        ]
    }
}
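With one such JSON object per line, the input file can be assembled as follows. This is a minimal sketch: the `prompts` list and the file name are assumptions, and `recordId` is optional (Bedrock generates one if it is omitted):

import json

batch_inference_input_file = "cloudtrail_batch_input.jsonl"  # assumed name

with open(batch_inference_input_file, "w") as f:
    for i, prompt in enumerate(prompts):  # prompts: a list of prompt strings
        record = {
            "recordId": f"RECORD{i:07d}",  # optional; Bedrock adds one if omitted
            "modelInput": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": max_tokens_in_summary,
                "temperature": 0.5,
                "top_p": 0.9,
                "stop_sequences": [],
                "messages": [
                    {"role": "user", "content": [{"type": "text", "text": prompt}]}
                ],
            },
        }
        f.write(json.dumps(record) + "\n")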
Once you have the `jsonl` file, it can be uploaded to an S3 bucket and the batch job can be created with the `create_model_invocation_job` function on the Bedrock client:
job_name = f"cloudtrail-summary-{int(time.time())}"
s3.upload_file(batch_inference_input_file, s3_bucket_name, batch_inference_input_file)
print(f"Uploaded {batch_inference_input_file} to {s3_input_uri}")
response = bedrock.create_model_invocation_job(
    modelId=model_id,
    roleArn=role_arn,
    jobName=job_name,
    inputDataConfig={
        "s3InputDataConfig": {
            "s3Uri": s3_input_uri
        }
    },
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": s3_output_uri
        }
    }
)
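Note that `s3_input_uri` and `s3_output_uri` are not defined in the snippet above. A plausible way to construct them, assuming the bucket created during setup, is:

s3_input_uri = f"s3://{s3_bucket_name}/{batch_inference_input_file}"
s3_output_uri = f"s3://{s3_bucket_name}/output/"

Also note that `role_arn` must reference a service role that Bedrock can assume, with read and write access to these S3 locations.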
The `jobArn` can be retrieved from the response to check the progress of the batch job during processing:
job_arn = response.get('jobArn')
The job status can then be polled with the `get_model_invocation_job` function:
print(f"Waiting for job {job_arn} to complete")
response = bedrock.get_model_invocation_job(jobIdentifier=job_arn)
if response['status'] == 'Completed':
print(f"Done")
break
elif response['status'] == 'Failed':
raise Exception(f"Batch inference job failed: {response['failureReason']}")
The `outputDataConfig` points to a folder in the S3 bucket where a sub-folder named after the job ID is created, containing a `.out` file (e.g. `input_file_name.jsonl.out`) with the completion results. Each line of the `.out` file contains a JSON object with the `modelInput` and `modelOutput`. The completion can be retrieved by iterating over each JSON object and looking into the content:
summary = data['modelOutput']["content"][0]["text"]
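Putting that together, here is a minimal sketch for fetching and parsing the output file; the object key layout is an assumption based on the sub-folder naming described above:

import json

# The job ID is the last component of the job ARN.
job_id = job_arn.split("/")[-1]
output_key = f"output/{job_id}/{batch_inference_input_file}.out"  # assumed key layout

obj = s3.get_object(Bucket=s3_bucket_name, Key=output_key)
for line in obj["Body"].iter_lines():
    data = json.loads(line)
    summary = data["modelOutput"]["content"][0]["text"]
    print(summary)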
A `manifest.json.out` file is also generated and includes statistics on the batch job, e.g. input and output token counts:
{
    "totalRecordCount": 2000,
    "processedRecordCount": 2000,
    "successRecordCount": 2000,
    "errorRecordCount": 0,
    "inputTokenCount": 13044546,
    "outputTokenCount": 752764
}
Batch inference is an asynchronous alternative to calling `invoke_model` once per prompt. If a large job cannot complete in time, you can set `timeoutDurationInHours` to be greater than the default 24 hours, or you can try a different model.
Possible improvements to this sample include (see the filtering sketch after this list):
- filter out unwanted services or users to limit the scope of analysis and cost
- break down the CloudTrail events by service or user to build summaries at a finer granularity
- implement period-based analysis of recent or historical events
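As an example of the first improvement, events could be filtered before the prompts are built. This is a sketch only; the excluded services and user names are arbitrary assumptions:

# Hypothetical noise filter: drop chatty services and known automation users.
EXCLUDED_SOURCES = {"cloudtrail.amazonaws.com", "monitoring.amazonaws.com"}
EXCLUDED_USERS = {"ci-automation"}

def keep_event(event: dict) -> bool:
    if event.get("eventSource") in EXCLUDED_SOURCES:
        return False
    user = event.get("userIdentity", {}).get("userName")
    return user not in EXCLUDED_USERS

filtered_events = [e for e in events if keep_event(e)]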
The same pattern extends beyond CloudTrail:
- Application Logs: For applications running on AWS, logs can be ingested into Bedrock for batch processing, allowing you to analyze performance, detect errors, and even predict future issues based on historical data.
- Operational Data: Beyond security, Bedrock batch inference can be used to analyze operational data, such as EC2 instance metrics or S3 access logs, to optimize resource usage and reduce costs.
We encourage you to explore this sample, adapt it to your specific needs, and share your experiences with the community.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.