
Analyze CloudTrail logs with Amazon Bedrock batch inference
This sample is a baseline for anyone looking to enhance their security posture by analyzing CloudTrail logs, and it can be reused for other use cases as well.
- Prerequisites: Before diving into the analysis, ensure that you have the necessary permissions and access to the required resources: AWS credentials, access to the Claude 3 Haiku model in the `us-east-1` region, and permissions to create the needed AWS resources (one S3 bucket, one IAM role, and one SNS topic).
- Setup and Configuration: The notebook starts by installing `boto3` and setting up the necessary AWS resources. This includes creating an S3 bucket to store intermediate data, defining an IAM role that grants Bedrock access to these resources, and creating an SNS topic to send the final analysis to relevant users.
- Processing CloudTrail Events: Using the `lookup_events` API, the notebook demonstrates how to retrieve CloudTrail events, format them appropriately, and prepare them for batch inference (a sketch follows this list). The results of the inference are then processed and analyzed to detect any anomalies or security concerns.
- Batch Inference with Amazon Bedrock: The core of the sample is setting up a batch inference job with the Claude 3 Haiku model. This includes code snippets to create the batch inference `jsonl` input file, where each line is a JSON object containing the `modelInput` (see the following section for more details).
- Output and Analysis: Finally, results from the batch inference job are interpreted and summarized to provide insights into potential security threats detected in the CloudTrail logs.
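To make the event-processing step concrete, here is a minimal sketch of retrieving events with `lookup_events` and turning them into a prompt. The 2000-event cap and the prompt wording are assumptions for illustration, not the notebook's exact values:

import json
import boto3

cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")

# Pull recent management events; the paginator handles the NextToken
# bookkeeping behind the lookup_events API.
events = []
paginator = cloudtrail.get_paginator("lookup_events")
for page in paginator.paginate(PaginationConfig={"MaxItems": 2000}):
    for event in page["Events"]:
        # CloudTrailEvent is a JSON string holding the full event payload.
        events.append(json.loads(event["CloudTrailEvent"]))

# Hypothetical prompt wording; the notebook's actual prompt may differ.
prompt = (
    "Summarize the following CloudTrail events and flag any anomalies "
    "or security concerns:\n" + json.dumps(events[:50])
)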
To run a batch inference job, a `jsonl` file needs to be created. Each line contains the prompt for the needed completion. Here is an example:
# `prompt` and `max_tokens_in_summary` are defined earlier in the notebook.
bedrock_batch_json = {
    "modelInput": {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens_in_summary,
        "temperature": 0.5,
        "top_p": 0.9,
        "stop_sequences": [],
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": prompt
                    }
                ]
            }
        ]
    }
}
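With one such JSON object per line, the input file can be assembled as follows. This is a minimal sketch: the `prompts` list and the file name are assumptions, and `recordId` is optional (Bedrock generates one if it is omitted):

import json

batch_inference_input_file = "cloudtrail_batch_input.jsonl"  # assumed name

with open(batch_inference_input_file, "w") as f:
    for i, prompt in enumerate(prompts):  # prompts: a list of prompt strings
        record = {
            "recordId": f"RECORD{i:07d}",  # optional; Bedrock adds one if omitted
            "modelInput": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": max_tokens_in_summary,
                "temperature": 0.5,
                "top_p": 0.9,
                "stop_sequences": [],
                "messages": [
                    {"role": "user", "content": [{"type": "text", "text": prompt}]}
                ],
            },
        }
        f.write(json.dumps(record) + "\n")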
Once you have the `jsonl` file, it can be uploaded to an S3 bucket and the batch job can be created with the `create_model_invocation_job` function on the Bedrock client:
job_name = f"cloudtrail-summary-{int(time.time())}"
s3.upload_file(batch_inference_input_file, s3_bucket_name, batch_inference_input_file)
print(f"Uploaded {batch_inference_input_file} to {s3_input_uri}")
response = bedrock.create_model_invocation_job(
    modelId=model_id,
    roleArn=role_arn,
    jobName=job_name,
    inputDataConfig={
        "s3InputDataConfig": {
            "s3Uri": s3_input_uri
        }
    },
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": s3_output_uri
        }
    }
)
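Note that `s3_input_uri` and `s3_output_uri` are not defined in the snippet above. A plausible way to construct them, assuming the bucket created during setup, is:

s3_input_uri = f"s3://{s3_bucket_name}/{batch_inference_input_file}"
s3_output_uri = f"s3://{s3_bucket_name}/output/"

Also note that `role_arn` must reference a service role that Bedrock can assume, with read and write access to these S3 locations.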
The `jobArn` can be retrieved from the response to check the progress of the batch job during processing:
job_arn = response.get('jobArn')
The job status can then be polled with the `get_model_invocation_job` function:
print(f"Waiting for job {job_arn} to complete")
response = bedrock.get_model_invocation_job(jobIdentifier=job_arn)
if response['status'] == 'Completed':
print(f"Done")
break
elif response['status'] == 'Failed':
raise Exception(f"Batch inference job failed: {response['failureReason']}")
The `outputDataConfig` points to a folder in the S3 bucket where a sub-folder named after the job ID is created, containing a `.out` file (e.g. `input_file_name.jsonl.out`) with the completion results. Each line of the `.out` file contains a JSON object with the `modelInput` and `modelOutput`. The completion can be retrieved by iterating over each JSON object and looking into the content:
summary = data['modelOutput']["content"][0]["text"]
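Putting that together, here is a minimal sketch for fetching and parsing the output file; the object key layout is an assumption based on the sub-folder naming described above:

import json

# The job ID is the last component of the job ARN.
job_id = job_arn.split("/")[-1]
output_key = f"output/{job_id}/{batch_inference_input_file}.out"  # assumed key layout

obj = s3.get_object(Bucket=s3_bucket_name, Key=output_key)
for line in obj["Body"].iter_lines():
    data = json.loads(line)
    summary = data["modelOutput"]["content"][0]["text"]
    print(summary)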
A `manifest.json.out` file is also generated and includes statistics on the batch job, e.g. input and output token counts:
{
    "totalRecordCount": 2000,
    "processedRecordCount": 2000,
    "successRecordCount": 2000,
    "errorRecordCount": 0,
    "inputTokenCount": 13044546,
    "outputTokenCount": 752764
}
Batch inference is an asynchronous alternative to calling `invoke_model` once per prompt. If a large job cannot complete in time, you can set `timeoutDurationInHours` to be greater than the default 24 hours, or you can try a different model.
Possible improvements to this sample include (see the filtering sketch after this list):
- filter out unwanted services or users to limit the scope of analysis and cost
- break down the CloudTrail events by service or user to build summaries at a finer granularity
- implement period-based analysis of recent or historical events
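As an example of the first improvement, events could be filtered before the prompts are built. This is a sketch only; the excluded services and user names are arbitrary assumptions:

# Hypothetical noise filter: drop chatty services and known automation users.
EXCLUDED_SOURCES = {"cloudtrail.amazonaws.com", "monitoring.amazonaws.com"}
EXCLUDED_USERS = {"ci-automation"}

def keep_event(event: dict) -> bool:
    if event.get("eventSource") in EXCLUDED_SOURCES:
        return False
    user = event.get("userIdentity", {}).get("userName")
    return user not in EXCLUDED_USERS

filtered_events = [e for e in events if keep_event(e)]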
The same pattern extends beyond CloudTrail:
- Application Logs: For applications running on AWS, logs can be ingested into Bedrock for batch processing, allowing you to analyze performance, detect errors, and even predict future issues based on historical data.
- Operational Data: Beyond security, Bedrock batch inference can be used to analyze operational data, such as EC2 instance metrics or S3 access logs, to optimize resource usage and reduce costs.
We encourage you to explore this sample, adapt it to your specific needs, and share your experiences with the community.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.