
Migrating Datasets Between S3 Buckets Across AWS Accounts
Explore efficient strategies for migrating data between S3 buckets across AWS accounts
Dwaragha
Amazon Employee
Published Feb 26, 2025
Organizations often need to transfer large datasets between Amazon S3 buckets in different AWS accounts. This article explores two powerful options for accomplishing this task: AWS DataSync and S3 Cross-Account Replication. We'll dive into the setup process, pros, cons, and important considerations for each method, helping you make an informed decision for your data migration needs.
Option 1: AWS DataSync
AWS DataSync is a managed data transfer service that simplifies, automates, and accelerates moving data between on-premises storage systems and AWS storage services, or between AWS storage services.
Console setup:
- Create a DataSync agent (if needed)
- Note: For S3 to S3 transfers, you can skip this step as AWS-managed agents are available.
- Create source and destination locations
- Navigate to the DataSync console and click "Create location"
- Choose "Amazon S3" as the location type
- Select the source S3 bucket and create an IAM role with necessary permissions
- Repeat the process for the destination bucket
- Configure a DataSync task
- In the DataSync console, click "Create task"
- Choose the source and destination locations you created
- Configure task settings (e.g., verification mode, overwrite options, logging)
- Run the task
- Select the created task and click "Start"
CLI setup:
- Create source and destination locations (run once per bucket):
aws datasync create-location-s3 \
  --s3-bucket-arn arn:aws:s3:::source-bucket \
  --s3-config '{"BucketAccessRoleArn":"arn:aws:iam::account-id:role/role-name"}' \
  --region us-east-1
- Create a DataSync task:
aws datasync create-task \
  --source-location-arn "arn:aws:datasync:us-east-1:account-id:location/source-location-id" \
  --destination-location-arn "arn:aws:datasync:us-east-1:account-id:location/destination-location-id" \
  --name "my-migration-task" \
  --options '{"VerifyMode":"ONLY_FILES_TRANSFERRED","OverwriteMode":"ALWAYS"}' \
  --region us-east-1
- Start the task:
aws datasync start-task-execution \
  --task-arn "arn:aws:datasync:us-east-1:account-id:task/task-id" \
  --region us-east-1
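Once started, a task execution can be polled from the CLI. The snippet below extracts fields from a `describe-task-execution` response; the JSON here is a captured sample with made-up values, shown so the snippet runs without AWS credentials. Status values include LAUNCHING, TRANSFERRING, VERIFYING, SUCCESS, and ERROR.

```shell
#!/bin/sh
# Live usage (the task-execution ARN is returned by start-task-execution):
#   aws datasync describe-task-execution \
#     --task-execution-arn "arn:aws:datasync:us-east-1:account-id:task/task-id/execution/exec-id" \
#     --region us-east-1
# Parse a sample response instead of calling AWS:
response='{"Status":"SUCCESS","BytesTransferred":1073741824,"FilesTransferred":250}'

status=$(printf '%s' "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["Status"])')
files=$(printf '%s' "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["FilesTransferred"])')

echo "status=$status files_transferred=$files"
```

Re-running this in a loop until the status leaves TRANSFERRING/VERIFYING is a simple way to script completion checks alongside the CloudWatch monitoring described later.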
Pros:
- Optimized for one-time large-scale transfers
- Efficiently copies existing objects without additional configuration
- Designed to fully utilize network bandwidth
- Simplified setup compared to cross-account replication
- Built-in data integrity checks during and after transfer
- Supports transferring data between various AWS storage services
Cons:
- Not ideal for continuous replication scenarios
- May not preserve exact object metadata like creation time
- Additional cost for using the DataSync service
Considerations:
- Ensure proper IAM permissions are set up in both source and destination accounts
- Monitor DataSync metrics in CloudWatch for transfer progress and potential issues
- Consider using S3 Batch Operations for post-migration verification of large datasets
Option 2: S3 Cross-Account Replication
S3 Cross-Account Replication allows you to automatically copy objects from a source bucket to a destination bucket in a different AWS account.
Console setup:
- Enable versioning on both source and destination buckets
- In S3 console, select bucket > Properties > Bucket Versioning > Edit > Enable
- Create an IAM role in the source account for replication
- In IAM console, create role > Choose S3 service > Attach necessary permissions
- Update the destination bucket policy
- In S3 console, select destination bucket > Permissions > Bucket policy > Edit
- Configure replication
- In source bucket > Management > Replication rules > Create replication rule
- Follow the wizard to set up cross-account replication
CLI setup:
- Enable versioning:
aws s3api put-bucket-versioning --bucket my-bucket --versioning-configuration Status=Enabled
- Create IAM role (use AWS CLI or console)
- Update destination bucket policy:
aws s3api put-bucket-policy --bucket destination-bucket --policy file://bucket-policy.json
- Create replication configuration:
aws s3api put-bucket-replication --bucket source-bucket --replication-configuration file://replication.json
Sample replication.json:
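The exact rule details depend on your requirements; a minimal sketch is shown below, where the bucket names, role name, and account ID are placeholders for your own values:

```json
{
  "Role": "arn:aws:iam::source-account-id:role/s3-replication-role",
  "Rules": [
    {
      "ID": "cross-account-rule",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {},
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Destination": {
        "Bucket": "arn:aws:s3:::destination-bucket",
        "Account": "destination-account-id"
      }
    }
  ]
}
```

The `Role` must be the replication role created in the earlier step, and versioning must already be enabled on both buckets or the `put-bucket-replication` call will fail.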
Pros:
- Ideal for continuous replication scenarios
- Preserves object metadata, including creation time
- Automatically replicates new objects after setup
- Integrated with existing S3 infrastructure
Cons:
- More complex setup compared to DataSync
- May require additional configuration for existing objects
- Limited to S3-to-S3 transfers only
Considerations:
- Replication only applies to new objects by default; use S3 Batch Replication for existing objects
- Ensure proper IAM roles and bucket policies are configured in both accounts
- Be aware of potential issues with large objects (e.g., 20GB+ files) during multipart uploads
- Monitor replication metrics like ReplicationLatency and BytesPendingReplication
- Object ownership: review the section below on changing replica ownership
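The considerations above mention the destination bucket policy referenced as bucket-policy.json in the CLI steps. A minimal policy granting the source account's replication role the replicate permissions might look like this (the role name is illustrative):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReplicationFromSourceAccount",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::source-account-id:role/s3-replication-role"
      },
      "Action": [
        "s3:ReplicateObject",
        "s3:ReplicateDelete",
        "s3:ReplicateTags"
      ],
      "Resource": "arn:aws:s3:::destination-bucket/*"
    }
  ]
}
```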
By default, when replicating objects across S3 buckets in different AWS accounts, the owner of the source object also owns the replica. However, you can change the ownership to the target account using the following methods:
- Owner Override Option:
- Add the owner override option to the replication configuration.
- Grant Amazon S3 the `s3:ObjectOwnerOverrideToBucketOwner` permission in the IAM role used for replication.
- Add the `s3:ObjectOwnerOverrideToBucketOwner` permission in the destination bucket policy.
- Bucket Owner Enforced Setting:
- Use the bucket owner enforced setting for Object Ownership in the destination bucket.
- This setting automatically changes replica ownership to the AWS account that owns the destination bucket.
- It disables object ACLs and doesn't require the `s3:ObjectOwnerOverrideToBucketOwner` permission.
- Update Replication Configuration:
- In the replication rule, select the checkbox to change object ownership when setting up replication.
When using these methods, consider the following:
- Only use the owner override option when source and destination buckets are owned by different AWS accounts.
- The bucket owner enforced setting is simpler to implement and mimics the owner override behavior.
- Changing ownership affects ACL replication and subsequent ACL changes on source objects.
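For the owner override option, the replication rule's `Destination` block gains an `AccessControlTranslation` element. A fragment of the rule might look like this (identifiers are placeholders):

```json
{
  "Destination": {
    "Bucket": "arn:aws:s3:::destination-bucket",
    "Account": "destination-account-id",
    "AccessControlTranslation": { "Owner": "Destination" }
  }
}
```

Note that `Account` is required whenever `AccessControlTranslation` is set. For the bucket owner enforced route, the destination bucket's Object Ownership can be set with `aws s3api put-bucket-ownership-controls --bucket destination-bucket --ownership-controls 'Rules=[{ObjectOwnership=BucketOwnerEnforced}]'`.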
Monitoring the migration:
- Use CloudWatch metrics for both DataSync and S3 Replication
- Key metrics: BytesTransferred, ObjectsTransferred, ReplicationLatency
- Set up CloudWatch alarms for abnormal values or failures
- For DataSync, monitor task progress in the DataSync console
- For S3 Replication, use S3 Batch Operations to track progress of existing object replication
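ReplicationLatency can be polled with `aws cloudwatch get-metric-statistics`. The snippet below parses a captured sample response (the datapoint values are made up; field names follow the CloudWatch API), so it runs without AWS credentials:

```shell
#!/bin/sh
# Live usage (dimensions shown are illustrative; S3 replication metrics are
# published per source bucket, destination bucket, and replication rule):
#   aws cloudwatch get-metric-statistics --namespace AWS/S3 \
#     --metric-name ReplicationLatency --statistics Maximum --period 300 \
#     --start-time 2025-02-26T00:00:00Z --end-time 2025-02-26T01:00:00Z \
#     --dimensions Name=SourceBucket,Value=source-bucket \
#                  Name=DestinationBucket,Value=destination-bucket \
#                  Name=RuleId,Value=cross-account-rule
# Parse a sample response instead of calling AWS:
response='{"Datapoints":[{"Maximum":180.0,"Unit":"Seconds"},{"Maximum":420.0,"Unit":"Seconds"}]}'

worst=$(printf '%s' "$response" | python3 -c '
import json, sys
points = json.load(sys.stdin)["Datapoints"]
print(max(p["Maximum"] for p in points))')

echo "worst replication latency in window: ${worst}s"
```

A sustained rise in this value (or in BytesPendingReplication) is the signal to investigate, which is what the CloudWatch alarms mentioned above should watch for.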
Verifying data integrity:
- Enable S3 Inventory for both source and destination buckets
- Compare inventory reports to verify object counts and sizes
- Use the AWS CLI to perform sample checks on object counts and sizes:
aws s3 ls s3://source-bucket --recursive --summarize
aws s3 ls s3://destination-bucket --recursive --summarize
- Implement custom scripts using AWS SDKs to perform detailed reconciliation
- Use S3 Batch Operations to run a final verification job comparing checksums
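A simple reconciliation script can compare the summary lines of the two `aws s3 ls --summarize` commands above. The `summarize_count` helper below is a hypothetical name; it extracts the "Total Objects" figure, demonstrated here on captured sample output so the script runs without AWS access:

```shell
#!/bin/sh
# Extract the object count from `aws s3 ls --summarize` output.
summarize_count() {
  awk '/Total Objects:/ {print $3}'
}

# In practice you would pipe real command output:
#   aws s3 ls s3://source-bucket --recursive --summarize | summarize_count
# Captured sample output stands in for the live commands here:
sample_src="Total Objects: 1500
   Total Size: 73400320"
sample_dst="Total Objects: 1500
   Total Size: 73400320"

src=$(printf '%s\n' "$sample_src" | summarize_count)
dst=$(printf '%s\n' "$sample_dst" | summarize_count)

if [ "$src" = "$dst" ]; then
  echo "object counts match: $src"
else
  echo "MISMATCH: source=$src destination=$dst" >&2
fi
```

Matching counts and sizes are a good first-pass check; the checksum comparison via S3 Batch Operations mentioned above remains the stronger verification.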
Comparison:

| Aspect | AWS DataSync | S3 Cross-Account Replication |
|---|---|---|
| Ideal Use Case | One-time large-scale transfers | Continuous replication |
| Setup Complexity | Simpler | More complex |
| Existing Object Transfer | Built-in | Requires additional configuration |
| Metadata Preservation | Limited | Full preservation |
| Transfer Speed | Optimized for high bandwidth | Dependent on S3 performance |
| Supported Services | Various AWS storage services | S3 to S3 only |
| Cost | Additional service cost | Uses existing S3 infrastructure |
| Monitoring | DataSync console and CloudWatch | CloudWatch metrics |
| Data Integrity Checks | Built-in verification | Requires additional setup |
Both AWS DataSync and S3 Cross-Account Replication offer robust solutions for migrating datasets between S3 buckets across AWS accounts. Choose DataSync for one-time large transfers with simplified setup, or opt for S3 Cross-Account Replication when continuous replication and full metadata preservation are crucial. Always consider your specific use case, data volume, and long-term replication needs when making your decision. By carefully evaluating these options and following the detailed setup instructions, you can ensure a smooth and efficient data migration process across your AWS accounts.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.