Analyze CloudTrail logs with Amazon Bedrock batch inference
This sample is a baseline for anyone looking to enhance their security posture by analyzing CloudTrail logs, and it is reusable for other use cases too.
Vivien de Saint pern
Amazon Employee
Published Sep 9, 2024
In the ever-evolving landscape of cloud computing, securing and auditing cloud resources is paramount. AWS CloudTrail provides a comprehensive record of actions taken within your AWS environment, making it an invaluable tool for monitoring and securing cloud operations. However, the vast amount of data generated by CloudTrail can be overwhelming, making it difficult to identify potential security threats in real time.
Imagine a small but rapidly growing startup that has recently launched a cloud-based service. With limited resources, the team is focused on maintaining the security and stability of their AWS infrastructure. However, as the company scales, the volume of CloudTrail logs starts to overwhelm their ability to monitor and analyze them effectively. The team knows that staying on top of security threats is crucial, but they're already stretched thin, lacking the time and budget to implement large-scale, complex solutions.

The startup considers several options. They could deploy open-source tools like ELK (Elasticsearch, Logstash, Kibana) for log analysis, but setting up and managing such a system requires significant technical expertise and ongoing maintenance, something their small team simply can't afford. Another option might be subscribing to a managed security service, but the high costs are prohibitive, especially for a growing company with a tight budget. Custom scripts to parse and analyze logs are also considered, but it quickly becomes clear that this approach is too basic and won't scale as the company continues to grow.
This is where Amazon Bedrock emerges as a potential solution. By leveraging Amazon Bedrock batch inference, the company can apply advanced AI models to analyze their CloudTrail logs without needing to invest in complex infrastructure or hire specialized personnel. This article and the associated notebook showcase how to use Amazon Bedrock batch inference to analyze CloudTrail logs through the following steps: retrieve CloudTrail events, create a file for batch inference, launch the batch job, summarize the output, and send the final summary to relevant users.
"CloudTrail Analysis with Amazon Bedrock Batch Inference" is structured to guide users through the process of setting up and executing batch inference on CloudTrail logs. Below are the key sections:
- Prerequisites:
  - Before diving into the analysis, ensure that you have the necessary permissions and access to the required resources, such as AWS credentials, access to the Claude 3 Haiku model in the `us-east-1` region, and permissions to create the needed AWS resources: 1 S3 bucket, 1 IAM role, and 1 SNS topic.
- Setup and Configuration:
  - The notebook starts by installing `boto3` and setting up the necessary AWS resources. This includes creating an S3 bucket to store intermediate data, defining an IAM role that grants Bedrock access to these resources, and creating the SNS topic used to send the final analysis to relevant users.
- Processing CloudTrail Events:
  - Using the `lookup_events` API, the notebook demonstrates how to retrieve CloudTrail events, format them appropriately, and prepare them for batch inference (a short sketch of this retrieval step follows the list). The results of the inference are then processed and analyzed to detect any anomalies or security concerns.
- Batch Inference with Amazon Bedrock:
  - The core of the sample is centered around setting up a batch inference job using the Claude 3 Haiku model. This includes code snippets to create the batch inference `jsonl` input file, where each line represents a JSON object containing the `modelInput` (see the following section for more details).
- Output and Analysis:
  - Finally, results from the batch inference job are interpreted and summarized to provide insights into potential security threats detected in the CloudTrail logs.
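The notebook's exact retrieval code is not reproduced in this article, but a minimal sketch of the CloudTrail retrieval step with `boto3` could look like the following. The field selection and the region are assumptions for illustration, not the notebook's exact code.

```python
import json
import boto3

# Sketch: retrieve recent CloudTrail management events and keep only the
# fields that are useful in an analysis prompt.
cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")

events = []
paginator = cloudtrail.get_paginator("lookup_events")
for page in paginator.paginate(PaginationConfig={"MaxItems": 1000}):
    for record in page["Events"]:
        detail = json.loads(record["CloudTrailEvent"])
        events.append({
            "eventTime": detail.get("eventTime"),
            "eventName": detail.get("eventName"),
            "eventSource": detail.get("eventSource"),
            "userIdentity": detail.get("userIdentity", {}).get("arn"),
            "sourceIPAddress": detail.get("sourceIPAddress"),
        })
```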
To create a batch inference job, a `jsonl` file needs to be created, where each line contains the prompt for the required completion. Here is an example:
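The sketch below shows what building one such line could look like, assuming the Anthropic Messages API format that Claude 3 models use on Bedrock and reusing the `events` list from the previous snippet; the `recordId`, prompt text, and file name are illustrative rather than the notebook's exact code.

```python
import json

# One batch record: a unique recordId plus the modelInput expected by Claude 3 Haiku.
record = {
    "recordId": "event-0001",
    "modelInput": {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{
            "role": "user",
            "content": "Analyze the following CloudTrail event and flag any security anomaly: "
                       + json.dumps(events[0]),
        }],
    },
}

# Append the record as one line of the jsonl input file.
with open("cloudtrail_batch_input.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```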
Once all the lines are added to the `jsonl` file, it can be uploaded to an S3 bucket, and the batch job can be created with the `create_model_invocation_job` function on the Bedrock client:
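A sketch of the job creation call is shown below; the bucket, IAM role, and job name are placeholders, and the input file is the one built above.

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Placeholder bucket and IAM role; the notebook creates its own resources during setup.
response = bedrock.create_model_invocation_job(
    jobName="cloudtrail-analysis-batch-job",
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    roleArn="arn:aws:iam::123456789012:role/bedrock-batch-inference-role",
    inputDataConfig={
        "s3InputDataConfig": {
            "s3Uri": "s3://my-cloudtrail-analysis-bucket/input/cloudtrail_batch_input.jsonl"
        }
    },
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": "s3://my-cloudtrail-analysis-bucket/output/"
        }
    },
)
job_arn = response["jobArn"]
```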
Then the job starts. The `jobArn` can be retrieved to check the progression of the batch job during processing. To check the status of the job, you can query the `get_model_invocation_job` function:
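A simple polling loop, reusing the `bedrock` client and `job_arn` from the previous snippet, could look like this:

```python
import time

# Poll until the batch job reaches a terminal state.
while True:
    job = bedrock.get_model_invocation_job(jobIdentifier=job_arn)
    status = job["status"]
    print(f"Batch job status: {status}")
    if status in ("Completed", "PartiallyCompleted", "Failed", "Stopped", "Expired"):
        break
    time.sleep(60)
```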
The batch inference job's output data configuration (`outputDataConfig`) points to a folder in the S3 bucket where a `job-id` sub-folder is created containing a `.out` file (e.g. `input_file_name.jsonl.out`) with the completion results. Each item processed by the batch inference job is a JSON object in the `.out` file containing `modelInput` and `modelOutput`. The completion can be retrieved by iterating over each JSON object and looking into the content:
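A sketch of that parsing step is below. The bucket and key are placeholders (the real key includes the job-id sub-folder), and the `modelOutput` structure assumed here is the Anthropic Messages response format.

```python
import json
import boto3

s3 = boto3.client("s3")

# Placeholder output location; replace <job-id> with the actual job id sub-folder.
bucket = "my-cloudtrail-analysis-bucket"
key = "output/<job-id>/cloudtrail_batch_input.jsonl.out"

obj = s3.get_object(Bucket=bucket, Key=key)
completions = []
for line in obj["Body"].read().decode("utf-8").splitlines():
    if not line.strip():
        continue
    item = json.loads(line)
    # Each record echoes the modelInput and adds the modelOutput produced by the model;
    # the completion text lives in the content blocks of the Anthropic response.
    for block in item.get("modelOutput", {}).get("content", []):
        if block.get("type") == "text":
            completions.append(block["text"])
```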
NOTE: a `manifest.json.out` file is also generated and includes statistics on the batch job, e.g. input tokens and output tokens.

Finally, this sample generates a summary of all the batch items produced by Amazon Bedrock batch inference by calling the Bedrock `invoke_model` API directly:
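A sketch of that final step, reusing the `completions` list built above, could look like the following; the summarization prompt and the SNS topic ARN are placeholders rather than the notebook's exact values.

```python
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Ask Claude 3 Haiku for a single consolidated summary of the per-event analyses.
summary_prompt = (
    "Summarize the following CloudTrail analysis results and highlight any "
    "security anomalies that need attention:\n\n" + "\n".join(completions)
)
response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": summary_prompt}],
    }),
)
summary = json.loads(response["body"].read())["content"][0]["text"]

# Publish the final summary to the SNS topic created during setup (ARN is a placeholder).
sns = boto3.client("sns", region_name="us-east-1")
sns.publish(
    TopicArn="arn:aws:sns:us-east-1:123456789012:cloudtrail-analysis-summary",
    Subject="CloudTrail analysis summary",
    Message=summary,
)
```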
This approach to analyzing CloudTrail logs can help identify security anomalies that might be missed by conventional rule-based systems. By leveraging AI, you can uncover patterns and potential threats that require deeper analysis. In addition, by using the Claude 3 Haiku model, this sample strikes a balance between cost and quality, allowing users to perform sophisticated analysis without incurring prohibitive costs.
Estimated pricing for 20k CloudTrail events (~500 tokens each) with Claude 3 Haiku on Amazon Bedrock in us-east-1 at the time of publication:
- Input tokens: 20k * 500 tokens for batch inference, plus 2k * 500 tokens and prompt tokens for the final summary: ~12M tokens, around $3
- Output tokens: 2k * 500 tokens for batch inference, plus 20 requests of 500 tokens for the final summary: ~1M tokens, around $1.25
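For reference, the arithmetic behind the estimate can be checked as follows; the per-million-token rates are assumed to be the on-demand Claude 3 Haiku prices published at the time.

```python
# Rough cost check for the figures above.
input_tokens = 20_000 * 500 + 2_000 * 500   # batch prompts + summary input (~11M, ~12M with prompt overhead)
output_tokens = 2_000 * 500 + 20 * 500      # batch completions + summary output (~1M)

input_cost = 12_000_000 / 1_000_000 * 0.25  # assumed $0.25 per 1M input tokens -> ~$3
output_cost = 1_000_000 / 1_000_000 * 1.25  # assumed $1.25 per 1M output tokens -> ~$1.25
```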
Next steps could be:
- filter out unwanted services or users to limit the scope of analysis and cost
- break down the CloudTrail events by service or user to build smaller-granularity summaries
- implement period-based processing to allow analysis of recent or historical events
While this example focuses on CloudTrail logs, the principles and techniques demonstrated can be extended to other types of logs and data sources within AWS.
Here are two potential use cases:
- Application Logs: For applications running on AWS, logs can be ingested into Bedrock for batch processing, allowing you to analyze performance, detect errors, and even predict future issues based on historical data.
- Operational Data: Beyond security, Bedrock batch inference can be used to analyze operational data, such as EC2 instance metrics or S3 access logs, to optimize resource usage and reduce costs.
This sample is a baseline for anyone looking to enhance their security posture by analyzing CloudTrail logs using Amazon Bedrock. By following the steps outlined in the notebook, you can quickly set up and execute batch inference jobs that help you identify potential security threats in your AWS environment. Furthermore, the techniques demonstrated here can be applied to a wide range of other AWS services and use cases, making it a versatile addition to your cloud security toolkit.
We encourage you to explore its capabilities, adapt it to your specific needs, and share your experiences with the community.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.