Perform Unsupported S3 Storage Class Transitions with Batch Operations


This post details a solution that uses S3 Batch Operations to transition your objects across S3 storage classes. This approach works best for storage class transitions that are unsupported by S3 Lifecycle Rules, such as S3 Standard-Infrequent Access (S3 Standard-IA) to S3 Standard, or S3 One Zone-Infrequent Access (One Zone-IA) to S3 Intelligent-Tiering. By the end, you'll have the knowledge to confidently migrate your S3 data to the optimal storage class for your needs.

Ankit Patel
Amazon Employee
Published Sep 27, 2024
Last Modified Oct 18, 2024
If you believe you or your organization could benefit from switching the storage class of your Amazon S3 data (for example, from S3 One Zone-IA to S3 Intelligent-Tiering), then this blog is for you. When setting up your workload, you may have initially believed that S3 One Zone-IA was the best choice of storage class. But over time, your objects’ access patterns may have changed, to the point where it may be more economical to keep your existing objects in a different storage class.
In this blog, I will use the example of transitioning objects from S3 One Zone-IA to S3 Intelligent-Tiering. If your use case requires a different transition that is unsupported by S3 Lifecycle Rules (e.g., S3 Intelligent-Tiering to S3 Standard, or S3 Glacier to S3 Standard-IA), then skip to the section titled “Solution Context”.

S3 Intelligent-Tiering Basics:

In S3 Intelligent-Tiering, an object is placed in a tier based on the amount of time elapsed since it was last accessed. For the first 30 days after an object is uploaded or accessed, the object is placed in the Frequent Access Tier. If the object is not accessed within those 30 days, it is moved to the Infrequent Access Tier. If an object in the Infrequent Access Tier is not accessed for a further 60 days (i.e., 90 consecutive days without access in total), it is moved to the Archive Instant Access Tier. If an object outside the Frequent Access Tier is accessed, it moves back up to the Frequent Access Tier. If an object in the Frequent Access Tier is accessed, its 30-day timer resets. Keep this information in mind; we’ll reference it throughout this post.

Benefits of S3 Intelligent-Tiering:

We would like to highlight two benefits customers can hope to achieve by moving their S3 data from One Zone-IA to S3 Intelligent-Tiering.
The first benefit is higher availability. S3 One Zone-IA offers 99.5% availability, whereas S3 Intelligent-Tiering offers 99.9%. Thus, a transition from S3 One Zone-IA to S3 Intelligent-Tiering reduces expected unavailability from 0.5% to 0.1%, an 80% reduction!
The second benefit is cost-related. Whether S3 Intelligent-Tiering is less costly than One Zone-IA depends on your use case; there are situations where keeping your objects in S3 One Zone-IA is more cost-effective than keeping them in S3 Intelligent-Tiering, and vice-versa. For example, if a large portion of your data is rarely accessed, then S3 Intelligent-Tiering is likely the lower-cost option. Inversely, for more frequently accessed data, S3 One Zone-IA may be less costly than S3 Intelligent-Tiering (though depending on the access frequency of your data, other S3 storage classes such as S3 Standard may be more cost-effective). A further benefit comes from the S3 Intelligent-Tiering archive tiers, whose storage is significantly less expensive than S3 One Zone-IA, making them well-suited for rarely accessed objects.
Let’s revisit the storage tiers within S3 Intelligent-Tiering that I described earlier, using pricing in the us-east-1 region as an example. The Frequent Access Tier costs $0.023 per GB-month, the Infrequent Access Tier costs $0.0125 per GB-month, and the Archive Instant Access Tier costs $0.004 per GB-month. By comparison, S3 One Zone-IA costs $0.01 per GB-month. As you can see, for storage, the first two tiers of S3 Intelligent-Tiering have a higher rate than S3 One Zone-IA. However, the Archive Instant Access Tier offers a storage rate 60% cheaper than S3 One Zone-IA! Thus, you can likely obtain considerable cost savings by moving to S3 Intelligent-Tiering even if not all of your data is rarely accessed.
You might be wondering exactly what portion of your data would need to be seldom accessed for the math to work out. Using the AWS Pricing Calculator (see image below), we can determine that S3 Intelligent-Tiering becomes less costly than S3 One Zone-IA once about 69% of your data sits in the Archive Instant Access Tier; the remaining 31% can afford to be in either the Frequent or Infrequent Access Tiers. The lower the percentage of your data in the Frequent and Infrequent Access Tiers, the more favorable the pricing. In other words, the higher the portion of data in an archive tier (i.e., Archive Instant Access, Archive Access, or Deep Archive Access), the more cost-efficient the transition.
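To sanity-check that 69% figure, here is the back-of-the-envelope arithmetic, assuming (conservatively) that all non-archived data sits in the Frequent Access Tier and ignoring the small per-object monitoring fee. Let x be the fraction of data in the Archive Instant Access Tier; the break-even point against S3 One Zone-IA’s $0.01 per GB-month is:

0.023 × (1 − x) + 0.004 × x = 0.010
0.023 − 0.019x = 0.010
x = 0.013 / 0.019 ≈ 0.684, i.e., about 69%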
Here is a screenshot of a simple example. For reference, the first row provides the S3 costs if the data were stored in the S3 Standard class, the second row provides the cost estimate for the same usage in S3 One Zone-IA, and the third row provides the cost for the same usage in S3 Intelligent-Tiering. To keep the estimate conservative, I folded the Infrequent Access Tier usage into the Frequent Access Tier. Additional assumptions: 1 TB of data, with an average object size of 25 MB. Of course, your expected costs will vary depending on your workload; the point of this example is to demonstrate how a transition from S3 One Zone-IA to S3 Intelligent-Tiering could be the more cost-optimal option. Please run the numbers for your own setup to determine whether migrating S3 storage classes is right for you, using the AWS Pricing Calculator to create your own estimate!

Solution Context:

Typically, if a customer wants to change the storage class of their existing objects in bulk, they would use S3 Lifecycle Rules. You can configure Lifecycle Rules to automatically change the storage class of your objects on a customizable timeline you specify–a process highlighted in AWS Documentation. However, to complicate matters, transitioning objects “upwards” (for example, from the S3 One Zone-IA storage class to S3 Intelligent-Tiering) is not supported by S3 Lifecycle Rules.
The diagram shows that S3 Lifecycle Rules support “downwards” transitions, e.g., S3 Intelligent-Tiering to S3 One Zone-IA, or S3 Standard to S3 Glacier Instant Retrieval. “Upwards” transitions such as S3 One Zone-IA to S3 Intelligent-Tiering are not supported.
Due to this limitation, we need a custom solution to transition objects from S3 One Zone-IA to S3 Intelligent-Tiering. The remainder of this article walks through one such solution.

Solution Architecture:

Architecture Diagram

Solution Overview:

  • Generate S3 Inventory Report
    • (optional) for advanced use cases: generate filtered manifest files
  • Set up S3 Batch Operations
  • Validate results

Solution Step-by-Step Guide:

Before you get started with transitioning your S3 objects from One Zone-IA to S3 Intelligent-Tiering, note that all the objects you want to transition must be in the same S3 bucket. If the S3 One Zone-IA objects you want to transition are stored across multiple S3 buckets, then the following solution will need to be repeated, once for each bucket containing S3 One Zone-IA objects.
Step 1: Generate S3 Inventory Report
First, generate an Amazon S3 Inventory Report. An Inventory Report simply lists all of the objects in an S3 bucket and is used as an input to our S3 Batch Operations job.
To get an S3 inventory report for your bucket, first go to your S3 bucket on the AWS Management Console.
Click your bucket name and go to its “Management” tab.
Scroll down on the page until you see “Inventory Configurations”.
Click “Create inventory configuration” and fill out the form as per your setup. Below is a template for some sample configurations. Finally, click “Create”.
Note for advanced use cases: if you have a use case where you want to apply the One Zone-IA to S3 Intelligent-Tiering transition only to a subset of objects based on their characteristics, then you would need to select the relevant metadata fields when setting up your Inventory Report. For example, if you want to transition objects that are older than a certain date, then be sure to select the “last_modified” field. After the inventory report is generated, you will have to filter rows based on the “last_modified” field and keep only the ones you need.
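If you prefer to script this step rather than click through the console, here is a minimal sketch using the AWS CLI. The bucket names and the configuration ID are placeholders for your own setup, and the destination bucket must also carry a bucket policy allowing S3 to write inventory files to it:

# Create an inventory configuration equivalent to the console form above.
# SOURCE_BUCKET, DEST_BUCKET, and the "onezone-to-int" ID are placeholders.
aws s3api put-bucket-inventory-configuration \
  --bucket SOURCE_BUCKET \
  --id onezone-to-int \
  --inventory-configuration '{
    "Id": "onezone-to-int",
    "IsEnabled": true,
    "IncludedObjectVersions": "Current",
    "Destination": {"S3BucketDestination": {
      "Bucket": "arn:aws:s3:::DEST_BUCKET",
      "Format": "CSV"}},
    "Schedule": {"Frequency": "Daily"},
    "OptionalFields": ["LastModifiedDate", "StorageClass"]
  }'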
Once you create the inventory report configuration, your screen should look like this:
 
Note that the first inventory report may take up to 48 hours to be delivered to your destination bucket.
You can check the status of your inventory report by going to your destination bucket. If you have a file named “manifest.json” at the path “s3://{DESTINATION_BUCKET}/{SOURCE_BUCKET_NAME}/{INVENTORY_CONFIGURATION_NAME}/20XX-XX-XXTXX-XXZ/manifest.json”, then your report is ready.
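You can also poll for the manifest from the command line; a quick sketch using the same placeholder names as the path above:

# The report is ready once manifest.json appears under the inventory prefix.
aws s3 ls "s3://DESTINATION_BUCKET/SOURCE_BUCKET_NAME/INVENTORY_CONFIGURATION_NAME/" \
  --recursive | grep manifest.json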
Step 1.2: Preview the manifest file and inventory report
Select and download manifest.json. Open it in a JSON viewer. It will look something like this:
In the “files” section, you will have a “key” containing the S3 path to the actual inventory. Go to that path in the S3 console and download the .csv.gz file. Unzip it and open it in a CSV viewer. Below is an example. You can see how all of the items that you specified in your inventory configuration are listed in column B. Here, column C contains the last_modified metadata field, and column D contains the storage_class metadata field.
Note for advanced use cases: take this CSV file and filter out the objects whose storage class you do not want to change. You can perform this filtering directly within the CSV file, or you can import it into Amazon Athena and use SQL queries to remove the unneeded rows based on column attributes (one command-line approach is sketched below). Once you have removed the unneeded rows, save the file as a CSV and upload it to an Amazon S3 bucket. You will use this CSV file when setting up your Batch Operations job.
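For illustration, here is one hedged way to do that filtering from a shell instead of a spreadsheet or Athena. It assumes the inventory columns are bucket, key, last_modified, storage_class in that order, with a cutoff date of 2024-01-01; check your own report’s header and adjust, since column order follows the fields you selected:

# Download the inventory data file referenced by manifest.json and unzip it.
# The exact .csv.gz path comes from the "key" field you saw in manifest.json.
aws s3 cp "s3://DESTINATION_BUCKET/path-from-manifest/data.csv.gz" .
gunzip data.csv.gz

# Keep only rows whose last_modified (column 3) is before the cutoff date;
# emit the bucket and key columns unchanged so the result stays a valid
# two-column CSV manifest (ISO-8601 timestamps sort lexicographically).
awk -F',' -v OFS=',' '{d = $3; gsub(/"/, "", d)} d < "2024-01-01" {print $1, $2}' \
  data.csv > filtered-manifest.csv

# Upload the filtered manifest for the Batch Operations job to use.
aws s3 cp filtered-manifest.csv "s3://DESTINATION_BUCKET/filtered-manifest.csv"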

Step 2: Set up S3 Batch Operations
Next, let’s set up our S3 Batch Operations job. An S3 Batch Operations job is the basic unit of work for the S3 Batch Operations feature in Amazon S3. A job contains all the information necessary to run a specified operation on a list of objects. To create an S3 Batch Operations job, you can provide the manifest file (from our inventory report configuration) and specify the operation you want to perform on your objects.
From the Amazon S3 Console page, go to your source bucket’s inventory configuration (i.e., click on your bucket name, go to the Management tab, and scroll to the bottom of the page). Select your inventory and click “Create job from manifest”.
The job details should be pre-populated to use your S3 Inventory Report and it should look similar to the image below. Once you’ve verified it, click “Next”.
Note for advanced use cases: if you applied custom filtering to remove unnecessary rows from your CSV report (as described at the end of step 1.2), then instead of using the S3 Inventory report, choose the “CSV” option for “Manifest format” and enter the path to the CSV file containing the list of objects you want the ‘change storage class’ operation applied to.
Now, you will have to describe the operation you want to perform on your objects. Use the following configurations:
Note: Your destination bucket can be the same as your source bucket. In this case, the Batch Operations job performs an in-place copy of objects. If versioning is disabled on your bucket, the S3 One Zone-IA objects will be overwritten. If versioning is enabled, the S3 One Zone-IA objects will become noncurrent versions, with the S3 Intelligent-Tiering copies becoming the current/latest versions.
Note: If your objects are greater than 5GB, then you will have to select the “Invoke with AWS Lambda function” Operation type. Refer to this AWS Blog post for additional details.
Note: If your objects are in an S3 Glacier storage class, then you may need to create two Batch Operations jobs. The first job performs a "restore" operation to S3 Standard; for details on how to restore objects through an S3 Batch Operation, visit AWS Documentation. The second job performs the "copy" operation to your desired storage class. However, this second job may be unnecessary: once your objects have been restored to S3 Standard (e.g., via the first job), you can simply use S3 Lifecycle Rules to perform the storage class change (to any S3 class). For more information on how to use S3 Lifecycle Rules to transition S3 storage classes, please refer to AWS Documentation.
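Alternatively, if you would rather script the whole job than use the console, here is a minimal, hedged sketch using aws s3control create-job. The account ID, bucket ARNs, and role name are placeholders; the IAM role from the permissions steps below must already exist, and the ETag must match your manifest object (you can read it with aws s3api head-object):

# Create a Batch Operations job that copies each object in the manifest
# in place, writing the copies in the INTELLIGENT_TIERING storage class.
aws s3control create-job \
  --account-id 111122223333 \
  --operation '{"S3PutObjectCopy": {
      "TargetResource": "arn:aws:s3:::SOURCE_BUCKET",
      "StorageClass": "INTELLIGENT_TIERING"}}' \
  --manifest '{"Spec": {"Format": "S3BatchOperations_CSV_20180820",
                        "Fields": ["Bucket", "Key"]},
               "Location": {"ObjectArn": "arn:aws:s3:::DESTINATION_BUCKET/filtered-manifest.csv",
                            "ETag": "MANIFEST_ETAG"}}' \
  --report '{"Bucket": "arn:aws:s3:::DESTINATION_BUCKET",
             "Format": "Report_CSV_20180820",
             "Enabled": true, "ReportScope": "AllTasks"}' \
  --priority 10 \
  --role-arn arn:aws:iam::111122223333:role/S3BatchOperationsRole

If you continue with the console flow, the remaining steps below cover the same configuration.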
In the “Configure additional options” page, provide a destination bucket for “Completion report destination”. You can use the same destination bucket as before.
For the permissions section, expand the “View IAM role policy template and IAM trust policy” dropdown. You will be presented with an IAM role policy and an IAM trust policy. You must create an IAM role that uses these policies; this is required to allow the S3 Batch Operations service to perform the batch job. In a new tab, open the AWS IAM console page.
From the IAM console page, go to “Policies” and click “Create policy”.
Switch the Policy Editor view to “JSON”.
Copy the “IAM role policy template” from the S3 Batch Operations setup window and paste it into the JSON editor in the IAM window.
In line 23 of the JSON editor, replace the value of {{SourceBucket}} with the name of the S3 bucket which contains your objects.
Click “next”, provide a name for your policy (eg: S3BatchOperationsPolicy), and click “Create policy”.
You will receive a success notification like this:
Next, go to “Roles” and “Create role”.
For the trust entity settings, enter the following:
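The key detail is that the trust policy must allow the S3 Batch Operations service principal (batchoperations.s3.amazonaws.com) to assume the role. If you are scripting instead of using the console, here is a sketch that covers this and the next two steps; the role name and account ID are placeholders, and the policy name matches the example above:

# Create the role with a trust policy that lets S3 Batch Operations assume it.
aws iam create-role \
  --role-name S3BatchOperationsRole \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "batchoperations.s3.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }]
  }'

# Attach the permissions policy created earlier.
aws iam attach-role-policy \
  --role-name S3BatchOperationsRole \
  --policy-arn arn:aws:iam::111122223333:policy/S3BatchOperationsPolicy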
Click “next”. In the policy search bar, enter the name of the policy that you just created. Select it. Click Next.
Provide the role a name and click “Create role”.
You will see a confirmation which looks like this:
Now, go back to the window of the S3 Batch Operations job setup page. Click the circular refresh button next to “IAM role”, then select the IAM role that you just created. Then, click “next”.
Afterwards, review the job configuration, scroll all the way down, then click “Create job”. You will see a green success notification as below:
Now, select the job and click “Run job”.
Now, wait for the batch operation job to finish. The amount of time it takes depends on the size of the job’s manifest file and the job’s priority. Once the job is completed, its “Status” will say “Completed”.
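You can also watch the job’s progress from the command line; a sketch, where the account ID and job ID are placeholders (the job ID is shown in the console and in the create-job output):

# Poll the job until its Status reads "Complete"
# (the console displays this as "Completed").
aws s3control describe-job \
  --account-id 111122223333 \
  --job-id JOB_ID \
  --query 'Job.{Status: Status, Succeeded: ProgressSummary.NumberOfTasksSucceeded, Failed: ProgressSummary.NumberOfTasksFailed}'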
Congratulations! You have successfully run an S3 Batch Operations job which created a copy of your objects in the S3 Intelligent-Tiering storage class!

Step 3: Validate results
Finally, let’s validate the job. For validation, we will make sure all of the objects were copied (completeness) and that all of those new copies have the S3 Intelligent-Tiering storage class (correctness).
Let’s highlight a few techniques to validate the correctness and completeness of your results.
Method 1: You can check the report of your S3 Batch Operation job. To do this, head over to the S3 bucket you specified in the “Completion report destination” configuration for your Batch Operation job (as described in a previous step). The bucket will contain a folder whose name is the S3 Batch Operation job id (see image below).
Click on that folder name. Then click on the “results/” folder. Within the results folder, there will be a single csv file. Select it and download it.
Open the CSV. It should look something like the image below, though your source bucket name (column A) and object names (column B) will differ. The number of rows in this file should equal the number of objects you intended to move to S3 Intelligent-Tiering, which should also equal the number of rows in the S3 Inventory Report CSV that you viewed in step 1.2. All cells in column D should be “succeeded”, all cells in column E should have a status code of “200”, and all cells in column G should equal “Successful”. Any row missing these values indicates an object that was not copied over correctly. If that happens, address the error and run the Batch Operations job again; though if only a few objects failed, it may be quicker to copy them over manually instead of re-running the job.
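If you’d rather not eyeball the spreadsheet, you can scan the completion report from a shell. This sketch assumes the column layout described above; verify the positions in your own file:

# Print any report rows that are not fully successful
# (result in column D, HTTP status code in column E, message in column G).
awk -F',' '$4 !~ /succeeded/ || $5 !~ /200/ || $7 !~ /Successful/' report.csv

# The row count should match the object count in your inventory manifest.
wc -l < report.csv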
Method 2: You can use the AWS CLI (Command Line Interface) to verify the expected behavior of your Batch Operations job. Specifically, we can use the aws s3api list-objects-v2 command to retrieve a list of objects from our bucket, filter the ones that are in the S3 Intelligent-Tiering class, and format and write the output to a CSV file. I’ve provided the command template below for convenience. Just replace [YOUR_DESTINATION_BUCKET_NAME] with the name of your destination bucket and run the command in a bash shell.
aws s3api list-objects-v2 --bucket [YOUR_DESTINATION_BUCKET_NAME] --query "Contents[?StorageClass=='INTELLIGENT_TIERING'].[Key, LastModified, Size, StorageClass]" --output text | awk '{print $1","$2","$3","$4}' > intelligent-tiering-objects.csv
The output of the command is directed to a file named ‘intelligent-tiering-objects.csv’, so you won't see anything in your terminal window. Instead, open that CSV file. It will look similar to the image below.
This csv file lists all of the S3 Intelligent-Tiering objects in your S3 bucket. Confirm that the number of rows in the CSV equals the number of objects in your inventory report csv file.
Note: If your destination bucket contained pre-existing S3 Intelligent-Tiering objects prior to running the Batch Operations job, then I recommend applying a filter in the spreadsheet to remove the rows for objects whose last_modified (column B) is before the date and time at which you ran the Batch Operations job.
Note: The list-objects-v2 command returns up to 1,000 objects per call. If your Batch Operations job covered more than 1,000 objects, then you will need to use the --starting-token parameter to fetch all of them (see the sketch below). Refer to the official list-objects-v2 documentation for sample code on how you can programmatically script this.
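Here is one hedged way to loop through all pages with --max-items and --starting-token; it requires jq, and the bucket name is a placeholder:

# Collect every INTELLIGENT_TIERING object, paging 1,000 keys at a time.
BUCKET="YOUR_DESTINATION_BUCKET_NAME"
TOKEN=""
: > intelligent-tiering-objects.csv
while :; do
  RESP=$(aws s3api list-objects-v2 --bucket "$BUCKET" --max-items 1000 \
    ${TOKEN:+--starting-token "$TOKEN"} --output json)
  echo "$RESP" | jq -r '.Contents[]?
    | select(.StorageClass == "INTELLIGENT_TIERING")
    | [.Key, .LastModified, .Size, .StorageClass] | @csv' \
    >> intelligent-tiering-objects.csv
  # The CLI emits NextToken only when more pages remain.
  TOKEN=$(echo "$RESP" | jq -r '.NextToken // empty')
  [ -z "$TOKEN" ] && break
done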
You have now successfully verified your Batch Operation job’s output!
Post-transition considerations:
  • If you chose to copy your objects into a different S3 bucket (i.e., your source and destination buckets differ), then after the verification step you will have to swap out the S3 endpoints in your downstream applications with the endpoints of the destination bucket. Once you have updated your endpoints and tested your applications, you can delete the S3 One Zone-IA objects from your source bucket. You can delete your S3 One Zone-IA objects quite effortlessly via S3 Lifecycle rules (see the sketch after this list).
  • After you perform the storage class change, to determine your cost savings you will have to wait until your objects have settled in the appropriate S3 Intelligent-Tiering storage tier. This should take about 90 days–the amount of time it takes for unaccessed objects to be transitioned from the Frequent Access Tier to the Archive Instant Access Tier.
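As referenced in the first bullet above, here is a hedged sketch of a lifecycle rule that expires the leftover One Zone-IA objects. It assumes a versioned bucket where the in-place copy turned the One Zone-IA objects into noncurrent versions; the bucket name and the 7-day window are placeholders, and an unversioned or separate source bucket would need an Expiration rule scoped by prefix instead:

# Expire the replaced One Zone-IA versions 7 days after they become noncurrent.
aws s3api put-bucket-lifecycle-configuration \
  --bucket SOURCE_BUCKET \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-replaced-onezone-ia-versions",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "NoncurrentVersionExpiration": {"NoncurrentDays": 7}
    }]
  }'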

Conclusion:

This blog post has explored the potential instances where you may want to migrate your Amazon S3 storage from One Zone-Infrequent Access to S3 Intelligent-Tiering. We highlighted how S3 Intelligent-Tiering offers higher availability and can potentially lead to cost savings, especially if a significant portion of your data is rarely accessed.
To determine if an S3 Intelligent-Tiering migration is right for your use case, we encourage you to perform your own pricing estimate using the AWS Pricing Calculator. Once you input your specific workload details, you can evaluate whether the cost savings from S3 Intelligent-Tiering Archive Instant Access Tier would outweigh any increased costs from the Frequent or Infrequent Access Tiers. If the numbers work out in your favor, this blog post has provided a step-by-step guide to manually migrate your S3 One Zone-IA objects to S3 Intelligent-Tiering using S3 Batch Operations. Be sure to thoroughly validate the results to ensure a successful transition.

About the Author:

Ankit Patel is a Solutions Architect at AWS based in the NYC area. As part of the Prototyping and Customer Engineering (PACE) team, he helps customers bring their innovative ideas to life by rapid prototyping, using the AWS platform to build, orchestrate, and manage custom applications. Feel free to follow Ankit on LinkedIn.
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
