Migrating Datasets Between S3 Buckets Across AWS Accounts

Explore efficient strategies for migrating data between S3 buckets across AWS accounts

Dwaragha
Amazon Employee
Published Feb 26, 2025
Organizations often need to transfer large datasets between Amazon S3 buckets in different AWS accounts. This article explores two options for that task: AWS DataSync and S3 Cross-Account Replication. For each method, we cover the setup process, pros, cons, and key considerations, so you can make an informed decision for your data migration needs.

Option 1: AWS DataSync

AWS DataSync is a managed data transfer service that simplifies, automates, and accelerates moving data between on-premises storage systems and AWS storage services, or between AWS storage services.

Setting Up AWS DataSync

Console Setup:

  1. Create a DataSync agent (if needed)
    • Note: For S3 to S3 transfers, you can skip this step, because DataSync does not require an agent for transfers between AWS storage services.
  2. Create source and destination locations
    • Navigate to the DataSync console and click "Create location"
    • Choose "Amazon S3" as the location type
    • Select the source S3 bucket and create an IAM role with necessary permissions
    • Repeat the process for the destination bucket
  3. Configure a DataSync task
    • In the DataSync console, click "Create task"
    • Choose the source and destination locations you created
    • Configure task settings (e.g., verification mode, overwrite options, logging)
  4. Run the task
    • Select the created task and click "Start"

AWS CLI Setup:

  1. Create source and destination locations (run the command once per bucket, substituting the destination bucket ARN and its access role the second time):
    aws datasync create-location-s3 \
      --s3-bucket-arn arn:aws:s3:::source-bucket \
      --s3-config '{"BucketAccessRoleArn":"arn:aws:iam::account-id:role/role-name"}' \
      --region us-east-1
  2. Create a DataSync task:
    aws datasync create-task \
      --source-location-arn "arn:aws:datasync:us-east-1:account-id:location/source-location-id" \
      --destination-location-arn "arn:aws:datasync:us-east-1:account-id:location/destination-location-id" \
      --name "my-migration-task" \
      --options '{"VerifyMode":"ONLY_FILES_TRANSFERRED","OverwriteMode":"ALWAYS"}' \
      --region us-east-1
  3. Start the task:
    aws datasync start-task-execution \
      --task-arn "arn:aws:datasync:us-east-1:account-id:task/task-id" \
      --region us-east-1
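
Once a task execution has started, it can be useful to wait for it to finish before moving on to validation. The following is a minimal sketch, not part of the original setup: `wait_for_execution` is a hypothetical helper name, and it assumes the AWS CLI is configured with credentials that may call `datasync describe-task-execution`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a DataSync task execution until it reaches
# a terminal state (SUCCESS or ERROR).
# Usage: wait_for_execution <task-execution-arn> [poll-seconds]
wait_for_execution() {
  local execution_arn="$1"
  local interval="${2:-30}"
  local status
  while true; do
    # Ask DataSync for the current status of this execution.
    status=$(aws datasync describe-task-execution \
      --task-execution-arn "$execution_arn" \
      --query 'Status' --output text) || return 2
    echo "status: $status"
    case "$status" in
      SUCCESS) return 0 ;;
      ERROR)   return 1 ;;
      *)       sleep "$interval" ;;  # still queued, transferring, or verifying
    esac
  done
}

# Example (placeholder ARNs, not run here):
# execution_arn=$(aws datasync start-task-execution --task-arn "$TASK_ARN" \
#   --query 'TaskExecutionArn' --output text)
# wait_for_execution "$execution_arn" 60
```

The function returns 0 on success, 1 when the execution ends in ERROR, and 2 when the CLI call itself fails, so it can gate a later validation step in a script.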

Pros of AWS DataSync

  1. Optimized for one-time large-scale transfers
  2. Efficiently copies existing objects without additional configuration
  3. Designed to fully utilize network bandwidth
  4. Simplified setup compared to cross-account replication
  5. Built-in data integrity checks during and after transfer
  6. Supports transferring data between various AWS storage services

Cons of AWS DataSync

  1. Not ideal for continuous replication scenarios
  2. May not preserve exact object metadata like creation time
  3. Additional cost for using the DataSync service

Key Considerations

  1. Ensure proper IAM permissions are set up in both source and destination accounts
  2. Monitor DataSync metrics in CloudWatch for transfer progress and potential issues
  3. Consider using S3 Batch Operations for post-migration verification of large datasets
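
To illustrate the first consideration, here is a sketch of a destination bucket policy that lets a DataSync role from another account write into the bucket. It assumes the DataSync task and its IAM role live in the source account; `datasync_destination_policy`, the role ARN, and the bucket name are hypothetical placeholders, and the action list should be checked against the current DataSync documentation before use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a destination bucket policy that grants a
# cross-account DataSync role access to the bucket and its objects.
# Usage: datasync_destination_policy <datasync-role-arn> <destination-bucket>
datasync_destination_policy() {
  local role_arn="$1" bucket="$2"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DataSyncBucketAccess",
      "Effect": "Allow",
      "Principal": { "AWS": "${role_arn}" },
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads"
      ],
      "Resource": "arn:aws:s3:::${bucket}"
    },
    {
      "Sid": "DataSyncObjectAccess",
      "Effect": "Allow",
      "Principal": { "AWS": "${role_arn}" },
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:ListMultipartUploadParts",
        "s3:PutObject",
        "s3:GetObjectTagging",
        "s3:PutObjectTagging"
      ],
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}

# Example (placeholder values, run from the destination account):
# datasync_destination_policy \
#   "arn:aws:iam::111111111111:role/datasync-role" "destination-bucket" > bucket-policy.json
# aws s3api put-bucket-policy --bucket destination-bucket --policy file://bucket-policy.json
```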

Option 2: S3 Cross-Account Replication

S3 Cross-Account Replication allows you to automatically copy objects from a source bucket to a destination bucket in a different AWS account.

Setting Up S3 Cross-Account Replication

Console Setup:

  1. Enable versioning on both source and destination buckets
    • In S3 console, select bucket > Properties > Bucket Versioning > Edit > Enable
  2. Create an IAM role in the source account for replication
    • In IAM console, create role > Choose S3 service > Attach necessary permissions
  3. Update the destination bucket policy
    • In S3 console, select destination bucket > Permissions > Bucket policy > Edit
  4. Configure replication
    • In source bucket > Management > Replication rules > Create replication rule
    • Follow the wizard to set up cross-account replication

AWS CLI Setup:

  1. Enable versioning:
    aws s3api put-bucket-versioning --bucket my-bucket --versioning-configuration Status=Enabled
  2. Create IAM role (use AWS CLI or console)
  3. Update destination bucket policy:
    aws s3api put-bucket-policy --bucket destination-bucket --policy file://bucket-policy.json
  4. Create replication configuration:
    aws s3api put-bucket-replication --bucket source-bucket --replication-configuration file://replication.json
Sample replication.json:
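
A minimal sketch of what replication.json might contain for a single rule that replicates every object to a bucket in another account and hands ownership to the destination account. The helper name `replication_config`, the rule ID, and all ARNs and account IDs are hypothetical placeholders; adjust the filter and delete-marker behavior to your needs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a cross-account replication configuration.
# Usage: replication_config <replication-role-arn> <destination-bucket> <destination-account-id>
replication_config() {
  local role_arn="$1" dest_bucket="$2" dest_account="$3"
  cat <<EOF
{
  "Role": "${role_arn}",
  "Rules": [
    {
      "ID": "cross-account-rule",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {},
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Destination": {
        "Bucket": "arn:aws:s3:::${dest_bucket}",
        "Account": "${dest_account}",
        "AccessControlTranslation": { "Owner": "Destination" }
      }
    }
  ]
}
EOF
}

# Example (placeholder values, run from the source account):
# replication_config \
#   "arn:aws:iam::111111111111:role/replication-role" \
#   "destination-bucket" "222222222222" > replication.json
# aws s3api put-bucket-replication --bucket source-bucket \
#   --replication-configuration file://replication.json
```

The empty `Filter` applies the rule to all objects. The `AccessControlTranslation` block is the owner override option discussed under Object ownership and can be dropped if the destination bucket uses the bucket owner enforced setting.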

Pros of S3 Cross-Account Replication

  1. Ideal for continuous replication scenarios
  2. Preserves object metadata, including creation time
  3. Automatically replicates new objects after setup
  4. Integrated with existing S3 infrastructure

Cons of S3 Cross-Account Replication

  1. More complex setup compared to DataSync
  2. May require additional configuration for existing objects
  3. Limited to S3-to-S3 transfers only

Key Considerations

  1. Replication only applies to new objects by default; use S3 Batch Replication for existing objects
  2. Ensure proper IAM roles and bucket policies are configured in both accounts
  3. Be aware of potential issues with large objects (e.g., 20GB+ files) during multipart uploads
  4. Monitor replication metrics like ReplicationLatency and BytesPendingReplication
  5. Object ownership: decide which account should own the replicas (see the Object ownership section below)

Object ownership

By default, when replicating objects across S3 buckets in different AWS accounts, the owner of the source object also owns the replica. However, you can change the ownership to the target account using the following methods:
  1. Owner Override Option:
    • Add the owner override option to the replication configuration.
    • Grant Amazon S3 the s3:ObjectOwnerOverrideToBucketOwner permission in the IAM role used for replication.
    • Add the s3:ObjectOwnerOverrideToBucketOwner permission in the destination bucket policy.
  2. Bucket Owner Enforced Setting:
    • Use the bucket owner enforced setting for Object Ownership in the destination bucket.
    • This setting automatically changes replica ownership to the AWS account that owns the destination bucket.
    • It disables object ACLs and doesn't require the s3:ObjectOwnerOverrideToBucketOwner permission.
  3. Update Replication Configuration (console):
    • When setting up the replication rule in the S3 console, select the checkbox to change object ownership to the destination bucket owner.
When using these methods, consider the following:
  • Only use the owner override option when source and destination buckets are owned by different AWS accounts.
  • The bucket owner enforced setting is simpler to implement and mimics the owner override behavior.
  • Changing ownership affects ACL replication and subsequent ACL changes on source objects.
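
As an illustration of the owner override option, the sketch below prints a destination bucket policy that allows a replication role from the source account to write replicas and take part in the ownership change. The helper name `replication_destination_policy`, the role ARN, and the bucket name are hypothetical placeholders, not values from this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a destination bucket policy for cross-account
# replication with the owner override option.
# Usage: replication_destination_policy <replication-role-arn> <destination-bucket>
replication_destination_policy() {
  local role_arn="$1" bucket="$2"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReplicationWrites",
      "Effect": "Allow",
      "Principal": { "AWS": "${role_arn}" },
      "Action": [
        "s3:ReplicateObject",
        "s3:ReplicateDelete",
        "s3:ObjectOwnerOverrideToBucketOwner"
      ],
      "Resource": "arn:aws:s3:::${bucket}/*"
    },
    {
      "Sid": "AllowVersioningChecks",
      "Effect": "Allow",
      "Principal": { "AWS": "${role_arn}" },
      "Action": [
        "s3:GetBucketVersioning",
        "s3:PutBucketVersioning"
      ],
      "Resource": "arn:aws:s3:::${bucket}"
    }
  ]
}
EOF
}

# Example (placeholder values, run from the destination account):
# replication_destination_policy \
#   "arn:aws:iam::111111111111:role/replication-role" "destination-bucket" > bucket-policy.json
# aws s3api put-bucket-policy --bucket destination-bucket --policy file://bucket-policy.json
```

If the destination bucket uses the bucket owner enforced setting instead, the `s3:ObjectOwnerOverrideToBucketOwner` action can be left out of the first statement.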

Measuring Successful Migration

Real-time Monitoring

  1. Use CloudWatch metrics for both DataSync and S3 Replication
    • Key metrics: BytesTransferred and FilesTransferred (DataSync); ReplicationLatency and BytesPendingReplication (S3 Replication)
  2. Set up CloudWatch alarms for abnormal values or failures
  3. For DataSync, monitor task progress in the DataSync console
  4. For S3 Replication, use S3 Batch Operations to track progress of existing object replication
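
For the replication side, the metrics can also be queried from the CLI. The following sketch assumes replication metrics are enabled on the rule and that the caller supplies ISO 8601 start and end times; `max_replication_latency` and all argument values are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the maximum ReplicationLatency (in seconds)
# reported for one replication rule over a time window.
# Usage: max_replication_latency <source-bucket> <destination-bucket> <rule-id> <start> <end>
max_replication_latency() {
  local source_bucket="$1" dest_bucket="$2" rule_id="$3" start="$4" end="$5"
  aws cloudwatch get-metric-statistics \
    --namespace AWS/S3 \
    --metric-name ReplicationLatency \
    --dimensions \
      "Name=SourceBucket,Value=${source_bucket}" \
      "Name=DestinationBucket,Value=${dest_bucket}" \
      "Name=RuleId,Value=${rule_id}" \
    --start-time "$start" --end-time "$end" \
    --period 300 --statistics Maximum \
    --query 'max(Datapoints[].Maximum)' --output text
}

# Example (placeholder values, not run here):
# max_replication_latency source-bucket destination-bucket cross-account-rule \
#   2025-02-26T00:00:00Z 2025-02-26T06:00:00Z
```

When no datapoints exist in the window, the query evaluates to null, which usually indicates that metrics are not enabled on the rule or that nothing was replicated during that period.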

Post-Migration Validation

  1. Enable S3 Inventory for both source and destination buckets
    • Compare inventory reports to verify object counts and sizes
  2. Use AWS CLI to perform sample checks, comparing object listings, counts, and total sizes between the two buckets:
    aws s3 ls s3://source-bucket --recursive --summarize
    aws s3 ls s3://destination-bucket --recursive --summarize
  3. Implement custom scripts using AWS SDK to perform detailed reconciliation
  4. Use S3 Batch Operations to run a final verification job comparing checksums
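
The CLI check in step 2 can be scripted so the two summaries are compared automatically. This is a rough sketch with hypothetical helper names (`summarize_listing`, `summaries_match`). It compares only object count and total bytes, so it complements rather than replaces a checksum-based verification.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce `aws s3 ls --recursive --summarize` output
# (read from stdin) to "<object-count> <total-bytes>".
summarize_listing() {
  awk '
    /^Total Objects:/ { objects = $3 }
    /^ *Total Size:/  { bytes = $3 }
    END { if (objects != "" && bytes != "") print objects, bytes }
  '
}

# Hypothetical helper: succeed only when both summaries are non-empty and equal.
summaries_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example (placeholder bucket names, not run here):
# src=$(aws s3 ls s3://source-bucket --recursive --summarize | summarize_listing)
# dst=$(aws s3 ls s3://destination-bucket --recursive --summarize | summarize_listing)
# if summaries_match "$src" "$dst"; then
#   echo "match: $src"
# else
#   echo "mismatch: source=[$src] destination=[$dst]"
# fi
```

For very large buckets a full recursive listing can be slow, which is one reason to prefer the S3 Inventory comparison from step 1.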

Summary Comparison Table

Criterion                | AWS DataSync                   | S3 Cross-Account Replication
-------------------------|--------------------------------|----------------------------------
Ideal Use Case           | One-time large-scale transfers | Continuous replication
Setup Complexity         | Simpler                        | Complex
Existing Object Transfer | Built-in                       | Requires additional configuration
Metadata Preservation    | Limited                        | Full preservation
Transfer Speed           | Optimized for high bandwidth   | Dependent on S3 performance
Supported Services       | Various AWS storage services   | S3 to S3 only
Cost                     | Additional service cost        | Uses existing S3 infrastructure
Monitoring               | DataSync console and CloudWatch | CloudWatch metrics
Data Integrity Checks    | Built-in verification          | Requires additional setup

Conclusion

Both AWS DataSync and S3 Cross-Account Replication offer robust solutions for migrating datasets between S3 buckets across AWS accounts. Choose DataSync for one-time large transfers with simplified setup, or opt for S3 Cross-Account Replication when continuous replication and full metadata preservation are crucial. Always consider your specific use case, data volume, and long-term replication needs when making your decision. By carefully evaluating these options and following the detailed setup instructions, you can ensure a smooth and efficient data migration process across your AWS accounts.
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
