Migrating Media Assets Workflow with AWS DataSync


Published Jan 6, 2025

Audience profile:

This article is intended for administrators planning to migrate data to AWS. It assumes familiarity with NAS storage, including how to mount and manage it, as well as proficiency in Linux and CLI commands. Additionally, knowledge of IAM (Identity and Access Management), AWS S3, and AWS CloudWatch for monitoring migration task logs is required to fully understand the migration process.

Introduction:

One of our customers has reached out with a request to migrate media assets stored on their old platform’s on-premise NAS storage to a new platform hosted on AWS cloud infrastructure. They are now using AWS S3 as the storage solution for their new platform. In this article, we will outline a solution using the AWS DataSync service [1] to transfer the data from the on-premise NAS storage to an AWS S3 bucket. The process involves creating a shared media space on the NAS, mounting it to the server using the SMB protocol, setting up the AWS DataSync service, creating an agent, configuring locations, initiating the migration by creating tasks, and finally, reviewing the task logs to ensure a successful transfer.

Solution Overview

The diagram illustrates the overall solution architecture. On the left, we see the legacy storage system, a NAS device housing a vast collection of media assets. On the right, we have the new platform, hosted on AWS Cloud. Currently, the new platform utilizes a single S3 bucket to store all media assets. A lifecycle policy is in place to automatically transition assets to nearline and archive tiers over time, a topic we’ll explore in a future article.
To migrate media assets from the on-premises NAS storage to an S3 bucket, we'll leverage a Windows server equipped with VMware Workstation. This server will host a virtual machine (VM) running the AWS DataSync agent. The VM will synchronize with the public AWS DataSync service endpoint. This synchronization process will facilitate the transfer of media assets from the NAS storage to a designated watch folder in the S3 bucket.
Figure 1. Solution Overview
This is considered a hybrid infrastructure due to the connection between on-premises and cloud environments. To set up this infrastructure, we must select a suitable connection type. Several options are available:
  1. Internet Connection: The data transfer will occur over the public internet, with the agent connecting to the AWS public service endpoint (AWS DataSync service). This allows connectivity to any DataSync service endpoint.
  2. VPN Connection: This option involves connecting the on-premise NAS storage to AWS via a site-to-site VPN.
  3. AWS Direct Connect: In this case, a Direct Connect (DX) connection must be established between the on-premise environment and AWS in advance. While this method offers a dedicated, low-latency connection, it is considered expensive and is generally recommended for transferring large volumes of data.
For simplicity, we will use a public internet connection for the data transfer between our on-premises environment and AWS.

Overview of the tasks:

On Prem Side:
1- AWS DataSync deployment:
1–1–0- Go to the AWS DataSync service in the AWS console, and choose “Between on-premises storage and AWS” as the data transfer task.
1–1–1- Download the agent [2]. In my case, I will download the VMware ESXi agent from the DataSync console and deploy it in my storage environment (a server mounted to the NAS storage):
Figure 2. Hypervisor agent- AWS console (AWS Side)
Then, open the VMware software, and open the downloaded image as the below screenshot shows:
Figure 3. AWS DataSync agent- VM (Agent Side)
For more information [3].
Note: the default username is “admin” and the password is “password”.
2- Network configuration:
2–1–1- VM side:
Before starting to work with the agent, it’s crucial to ensure that the network configurations are correct, as this is the most important step in the deployment process.
First, navigate to the VM’s network settings and configure the network adapter to use NAT. This will allow the VM to connect to the internet and share the host server’s IP and MAC addresses.
After making this change, you should see that the VM has received a new dynamic IP address from the virtual DHCP server [4].
Figure 4. AWS DataSync- New Dynamic IP address (Agent Side)
Note: There are certain requirements for deploying the agent onto the VM. If any of these requirements are not met, you will see a warning in line 4 of the list above. For more information [5].
2–1–2- Local network side:
You need to ensure that traffic to and from the VM/agent deployed in your local network is allowed. In my case, this was handled at the firewall level, so I had to open specific ports to enable communication between the agent and AWS DataSync. After that, these open ports will allow data to be uploaded and downloaded between the two locations.
For more information, please check [5].
3- Test and Choose the AWS service endpoint:
To test and activate the agent, navigate to the AWS Console, go to the DataSync service, and click on “Create Agent.” Choose “Public Service Endpoint” as the endpoint type. The region will automatically be set to the region currently selected in the AWS Console.
Figure 5- AWS DataSync- Public Service Endpoint (AWS Side)
Then, from the agent’s menu, choose option 2 by typing the number 2:
Figure 6- AWS Agents’ list (Agent Side)
Then type number 1 to select the “public endpoints” option for AWS DataSync, in the region where your DataSync service is deployed.
Figure 7- AWS DataSync agent- Public service endpoint (Agent Side)
At this point, the agent will begin testing the connectivity to the AWS DataSync public service endpoint.
Figure 8- AWS DataSync agent- Network connectivity checks (Agent Side)
Make sure all the checks pass at this stage. Here, in the above screenshot, we are verifying that the necessary ports are open between the agent deployed on your on-premises server and the AWS DataSync public service endpoint in your AWS Console account.
Note: We will cover the preparations and troubleshooting steps in more detail in a separate section once the deployment is complete.
4- Activate AWS DataSync agent:
Once you’ve confirmed that the connectivity between the on-premises agent and the public endpoint is set up, it’s time to activate the agent and obtain the activation key, which will link the agent to your AWS account.
Before proceeding with activation, however, you’ll need to create a new agent, assign it a unique name, and add tags if you have multiple agents for easier identification.
Figure 9- AWS DataSync service- Create a new agent (AWS Side)
To activate the agent [6], go to the AWS Console, navigate to the DataSync service, and click on “Create Agent.” In the “Activation Key” section, choose the automatic method, then enter the agent’s private IP address in the “Agent Address” textbox.
Figure 10- AWS DataSync service- Activation key (AWS Side)
Next, click on “Get Key.” After a few seconds, you’ll receive the activation key along with the public IP of the AWS DataSync service endpoint.
To check the agent’s status, go to the “Agents” section in the left menu. You should see that the status is displayed as “Online,” as shown in the screenshot below.
Figure 11- AWS DataSync service- Agents’ information (Agent Side)
At this point, if the activation process was successful, it indicates that the network configuration and requirements are correct. If the activation didn’t complete successfully, you’ll need to revisit and troubleshoot the issues (refer to steps 1 and 6 from the troubleshooting list). In my case, I encountered several issues, which I’ve detailed later in this article.
Once the activation is complete, every time you log in to the VM/agent, you’ll see three new text indicators showing that your agent is activated and successfully synced with your AWS account, as shown in the screenshot below.
Figure 12- AWS DataSync service- Activation confirmation (Agent Side)
Note: You can verify this by comparing the activation keys in the AWS Console with those displayed on the agent VM. They should match exactly.
What’s next? You might be wondering who will take care of your agent after deployment and configuration. Here’s the good news: once you activate an AWS DataSync agent, AWS manages the virtual machine (VM) appliance for you. This means you don’t need to worry about ongoing maintenance or monitoring; AWS handles the management and upkeep of the agent.
5- Network connection verification:
To make it easier for you, especially if you don’t have a networking background, there are two sides you need to check for network connections. These two sides are as follows:
1–5–1- Between DataSync agent and the on premise storage:
  • Check the connectivity and open ports using the “Test Connectivity to Self-Managed Storage” option, or type number 3 in the list in the screenshot above. This will help you verify whether the connection between your on-prem agent and your local storage is properly established.
1–5–2- Between DataSync agent and AWS DataSync public endpoint service:
  • Check the connectivity and open ports using the “Test Network Connectivity” option, or type number 2 in the list in the screenshot above. This will help you verify whether the connection between your on-prem agent and the AWS public service endpoints is properly established. Alternatively, you can check connectivity with your AWS console/account directly using the “command prompt” option (type number 6), then type the below command:
Open support channel
Well, you’ve just completed the most challenging part of the deployment! The next steps will take place in your AWS account. Now, let’s jump into the AWS Console and continue our journey together. We’ll start by creating the locations (both on-premises and cloud S3 storage), setting up the tasks, and finally, begin syncing and transferring data between the two locations.
AWS Side:
6- Prerequisites:
6–1- IAM Role: DataSync needs access to the S3 bucket that you’re transferring to or from. To do this, you must create an AWS Identity and Access Management (IAM) role that DataSync assumes with the permissions required to access the bucket [7].
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AWSDataSyncS3BucketPermissions",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads"
      ],
      "Resource": "arn:aws:s3:::on-prem-khatib-s3",
      "Condition": {
        "StringEquals": {
          "aws:ResourceAccount": "AWS_Account_ID"
        }
      }
    },
    {
      "Sid": "AWSDataSyncS3ObjectPermissions",
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:GetObjectTagging",
        "s3:GetObjectVersion",
        "s3:GetObjectVersionTagging",
        "s3:ListMultipartUploadParts",
        "s3:PutObject",
        "s3:PutObjectTagging"
      ],
      "Resource": "arn:aws:s3:::on-prem-khatib-s3/*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceAccount": "AWS_Account_ID"
        }
      }
    }
  ]
}
6-2- IAM Role: DataSync also needs access to the S3 bucket in order to write the task report. You must therefore create an AWS Identity and Access Management (IAM) role that DataSync assumes with the permissions required to write to the bucket you select, in any region [7].
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::on-prem-khatib-s3/*",
      "Condition": {
        "StringEquals": {
          "s3:ResourceAccount": "AWS_Account_ID"
        }
      }
    }
  ]
}
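If you prefer to generate these policy documents programmatically rather than pasting JSON into the console, a small sketch is shown below. It rebuilds the bucket-access policy from section 6-1; the bucket name matches the article, but the account ID is a placeholder you must replace with your own.

```python
import json

# Placeholder values -- substitute your own bucket name and AWS account ID.
BUCKET = "on-prem-khatib-s3"
ACCOUNT_ID = "123456789012"

def datasync_s3_access_policy(bucket: str, account_id: str) -> dict:
    """Build the bucket-access policy from section 6-1 (bucket-level + object-level)."""
    condition = {"StringEquals": {"aws:ResourceAccount": account_id}}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AWSDataSyncS3BucketPermissions",
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:ListBucket",
                    "s3:ListBucketMultipartUploads",
                ],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": condition,
            },
            {
                "Sid": "AWSDataSyncS3ObjectPermissions",
                "Effect": "Allow",
                "Action": [
                    "s3:AbortMultipartUpload",
                    "s3:DeleteObject",
                    "s3:GetObject",
                    "s3:GetObjectTagging",
                    "s3:GetObjectVersion",
                    "s3:GetObjectVersionTagging",
                    "s3:ListMultipartUploadParts",
                    "s3:PutObject",
                    "s3:PutObjectTagging",
                ],
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": condition,
            },
        ],
    }

policy = datasync_s3_access_policy(BUCKET, ACCOUNT_ID)
policy_json = json.dumps(policy, indent=2)  # paste into the IAM role's inline policy
```

Generating the document this way keeps the bucket ARN and account-ID condition in sync if you reuse the role across environments.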
6-3- AWS S3 Bucket: Create an S3 bucket that will serve as the destination for your data transfer (in my case, I’m using S3 as the storage location; for you, it could be a different AWS storage service, like EFS or FSx).
6-4- On-Prem Shared Folder: Create a shared folder on your on-prem storage and mount it to your server using the SMB protocol. If you’re using a different protocol (e.g., NFS), the process will be similar but with the respective protocol.
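For the shared-folder step, if your server is Linux, the SMB share is typically attached with a cifs mount. The sketch below only builds the command line; the NAS IP, share name, mount point, and username are all hypothetical values for illustration, and the command itself would be run with sudo on the server.

```python
import shlex

# Hypothetical values: NAS private IP, share name, local mount point, NAS user.
NAS_IP = "192.168.1.50"
SHARE = "media"
MOUNT_POINT = "/mnt/media"

# Build the cifs mount command a Linux server would run (with sudo) to attach
# the NAS shared folder over SMB before pointing DataSync at it.
mount_cmd = [
    "mount", "-t", "cifs",
    f"//{NAS_IP}/{SHARE}", MOUNT_POINT,
    "-o", "username=nas_admin,vers=3.0",
]
print(shlex.join(mount_cmd))
```

On Windows the equivalent is mapping the share as a network drive; either way, the agent later reaches the share directly by its SMB server IP, not through this mount.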
7- Create locations:
First, it’s important to understand that by “location,” we mean the source and destination where you’re copying data to and from. There are several options available for data transfer, which you can explore in the DataSync documentation, where you’ll find more information about transferring data between on-prem systems and various AWS storage services like S3, EFS, and FSx [8] [9].
Now, let’s focus on our use case: transferring data between our on-prem storage and an AWS S3 bucket in the Frankfurt region. In other words, we will be transferring data between your SMB storage/server and AWS S3. To do this, we need to create two locations: one for the on-premises storage and the other for the AWS S3 bucket. To create the source and destination locations, follow these steps:
7–1 For the AWS S3 Bucket Location:
  1. In the AWS DataSync console, on the left navigation pane, select Locations.
  2. On the Locations page, click Create Location.
  3. Choose S3 as the location type.
  4. Enter the S3 URL and select the folders inside the bucket. In our case, we’ve created a watch-folder within the bucket.
  5. Select the appropriate storage class for your data.
  6. Finally, choose the IAM role that should be created to allow DataSync to access the S3 bucket.
Once you’ve completed these steps, your S3 bucket location will be ready for use.
Figure 13- AWS DataSync service- Locations (AWS side)
7–2 For the On-Prem Storage Location:
  1. Location Type: Create a new location and select SMB as the location type, since we will be using the SMB protocol between the on-premises server and the NAS storage.
  2. Agent: Select the agent that we previously created on the on-prem server, which is already synchronized with the DataSync service in your AWS account.
  3. On-Prem Storage IP: Enter the SMB server IP. In our case, this will be the private IP of the on-prem NAS storage where the shared folder is hosted.
  4. On-Prem Shared Folder (Mounted Folder): Enter the name of the shared folder. This is the folder that has been mounted and will be used for data transfer.
Once you’ve completed these steps, your on-prem storage location will be set up and ready for syncing with the AWS S3 bucket.
Figure 14- AWS DataSync service- Locations (On premises side)
  5. User: Enter the admin username/account for the NAS storage.
  6. Password: Enter the password of the admin account.
Figure 15- AWS DataSync service- Locations (On premises side)
After completing the creation of both locations (on-prem storage and AWS S3 bucket), navigate to the Locations page in the AWS DataSync console. You should now see two locations listed:
A. On-prem storage location — This is the location representing your on-premises server (or NAS storage) that you’ve set up using the SMB protocol.
B. AWS S3 bucket location — This is the location pointing to your S3 bucket, where the data from your on-premises storage will be transferred.
These locations indicate the source (on-prem storage) and the destination (S3 bucket) for your DataSync tasks.
Figure 16- AWS DataSync service- Locations
In the screenshot above, each location has the following details:
  1. Path: Shows the location path. For S3, it’s the bucket name; for on-prem, it’s the shared folder path.
  2. Host: Displays the host info. For S3, it’s the bucket; for on-prem, it’s the private IP of the server.
  3. Location Type: Indicates whether the location is S3 or SMB (for on-prem storage).
  4. Tasks: Shows the number of tasks (upload/download) assigned to each location.
This page provides an overview of your locations, their settings, and active tasks.
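The console steps in 7-1 and 7-2 map directly onto the DataSync API. Below is a minimal sketch of the parameter sets you would pass to a boto3 DataSync client's create_location_s3 and create_location_smb calls; every ARN, IP address, folder name, and credential here is a placeholder, not a value from a real account.

```python
# Parameter sets mirroring console steps 7-1 and 7-2. All ARNs, IPs, and
# names are placeholders. With boto3 you would pass them as, e.g.:
#   boto3.client("datasync").create_location_s3(**s3_location)
#   boto3.client("datasync").create_location_smb(**smb_location)
s3_location = {
    "S3BucketArn": "arn:aws:s3:::on-prem-khatib-s3",
    "Subdirectory": "/watch-folder",   # folder inside the bucket (step 7-1.4)
    "S3StorageClass": "STANDARD",      # target storage class (step 7-1.5)
    "S3Config": {
        # IAM role from section 6-1 that DataSync assumes (step 7-1.6)
        "BucketAccessRoleArn": "arn:aws:iam::123456789012:role/datasync-s3-access",
    },
}

smb_location = {
    "ServerHostname": "192.168.1.50",  # private IP of the NAS / SMB server (step 7-2.3)
    "Subdirectory": "/media",          # the shared, mounted folder (step 7-2.4)
    "User": "nas_admin",               # NAS admin account (steps 5 and 6)
    "Password": "REDACTED",
    # The agent activated in section 4:
    "AgentArns": ["arn:aws:datasync:eu-central-1:123456789012:agent/agent-0123456789abcdef0"],
}
```

Scripting the locations this way is handy when you need to recreate them in several regions, since the console wizard must otherwise be repeated by hand.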
Note: AWS DataSync allows you to sync data directly to the different S3 storage classes, so you don’t need to first drop your data into the Standard/Online class and then wait 30 days to move it to another class. You can choose the desired S3 storage class during the transfer setup.
8- Create task:
Note: First, it’s important to understand that by task, we mean a data transfer operation between two created locations. For each task, you need to select a source and a destination. Please note that one location can serve as the source in one task and as the destination in another task. This flexibility is essential for efficiently managing data transfer between multiple locations [10].
Now, let’s start creating tasks to test our workflows:
8-1- Workflow/task 1: Transfer the files from the on-prem NAS storage to the S3 watch folder.
8–1–1- Configure source location:
Figure 17- AWS DataSync service- Locations (source)
8–1–2- Configure destination location:
Figure 18- AWS DataSync service- Locations (destination)
8–1–3- Configure settings: Add the task’s name:
Figure 19- AWS DataSync service- Task
8–1–4- Transfer options:
Figure 20- AWS DataSync service- Task
8–1–5- Schedule: This is an important consideration if you need to migrate data at a specific time or over a period of time. Scheduling tasks allows you to control when the data transfer starts and how often it runs, making it ideal for periodic migrations or low-traffic hours [12].
Figure 21- AWS DataSync service- Task
8–1–6- Task report: Enable this if you want to receive a report once the task finishes; the report will be stored in a specific bucket [11].
Figure 22- AWS DataSync service- Task Report
8–1–7- Logging: If you want to receive and store the logs for all tasks, you need to create a CloudWatch log group.
Figure 23- AWS DataSync service- Logging
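The wizard steps 8-1-1 through 8-1-7 correspond to a single create_task API call. Here is a hedged sketch of the equivalent boto3 parameters; the location ARNs, task name, schedule, and log-group ARN are all illustrative placeholders, and the option values shown are just one reasonable combination, not the article's exact console choices.

```python
# One create_task call covering steps 8-1-1 .. 8-1-7. All ARNs are
# placeholders. With boto3: boto3.client("datasync").create_task(**task)
task = {
    "Name": "nas-to-s3-watch-folder",                       # step 8-1-3
    "SourceLocationArn": "arn:aws:datasync:eu-central-1:123456789012:location/loc-smb",  # 8-1-1
    "DestinationLocationArn": "arn:aws:datasync:eu-central-1:123456789012:location/loc-s3",  # 8-1-2
    "Options": {                                            # step 8-1-4
        "VerifyMode": "ONLY_FILES_TRANSFERRED",  # integrity-check transferred files
        "OverwriteMode": "ALWAYS",               # overwrite changed files at destination
        "PreserveDeletedFiles": "PRESERVE",      # keep destination files deleted at source
    },
    "Schedule": {"ScheduleExpression": "cron(0 2 * * ? *)"},  # 8-1-5: nightly at 02:00 UTC
    "CloudWatchLogGroupArn":                                  # 8-1-7
        "arn:aws:logs:eu-central-1:123456789012:log-group:/aws/datasync",
}
```

Because one location can be a source in one task and a destination in another, the same two location ARNs can be swapped to build Workflow 2 later.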
8–1–8- The task has been created and is now available, as shown in the screenshot below:
Figure 24- AWS DataSync service- Data Transferring
8–1–9- Start the task:
To initiate the migration task, click on the ‘Start’ button:
Figure 25- AWS DataSync service- Data Transferring
8–1–10- Launch the task: The task has started.
Figure 26- AWS DataSync service- Data Transferring
8–1–11- Transferring: Now the transfer process has started, and data is being uploaded from the on-prem storage to the S3 watch folder.
Figure 27- AWS DataSync service- Data Transferring
8–1–12- Task completed: The task is complete. Let’s go to the S3 watch folder and verify if both locations have been successfully synced.
Figure 28- AWS DataSync service- Data Transferring
The screenshot below shows that the source and destination locations are synchronized and contain the same files, indicating that our workflow is functioning correctly.
Figure 29- AWS DataSync service- Data Transferring
8–1–13- Check the task report:
I’ve downloaded the report, and it shows the following information, confirming that 7 files have been successfully transferred to the S3 watch folder.
For more information [13].
Figure 30- AWS DataSync service- Task Report Summary
8–1–14- Check the logs: We’ve already created an AWS CloudWatch log group (/aws/datasync) to send all task logs to CloudWatch. To check the logs:
  1. Go to the CloudWatch page in the AWS Console.
  2. Navigate to the Logs section and select the Log Groups pane.
  3. The screenshot below shows the log group we created, where you can review the logs for the task.
Figure 31- AWS DataSync service- Data Transfers’ logs
I have exported this data to an Excel file for better clarity. As you can see, it shows the 7 files that have been successfully uploaded to the S3 watch folder.
Figure 32- AWS DataSync service- Data Transfers exported logs
8–2- Workflow/task 2: Transfer the files from the S3 watch folder back to the on-prem NAS storage.
This workflow follows the same steps as Workflow 1, but with the source and destination locations swapped: the source will be the S3 watch folder, and the destination will be the on-prem NAS shared folder.

Networking requirements:

In order for the AWS DataSync service to connect to the on-prem agent VM, there are some network requirements that the network engineer needs to consider. Below, I’m listing the required ports that need to be opened, along with a diagram that explains the connections in detail [5].
  • HTTP 80 (activation key).
  • HTTPS 443 (agent bootstrap, activation, DataSync public endpoint, agent updates).
  • DNS 53.
  • SSH 22 (allows AWS Support to access your agent).
  • NTP 123 (to sync the VM with the host server’s time).
These ports are necessary to ensure that the DataSync agent can communicate with both the AWS service and your on-prem storage.
Figure 33- AWS DataSync service- Network requirements
This should help the network engineer ensure that all the necessary ports are open and traffic flows properly between your on-prem environment and AWS.
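As a quick sanity check before involving the firewall team, TCP reachability can be probed from any host on the agent's network. The sketch below is a minimal helper, demonstrated offline against a local listener; note that DNS 53 and NTP 123 normally use UDP, so a TCP probe like this only meaningfully covers ports 22, 80, and 443.

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

# Self-check against a local listener so the function can be demonstrated offline.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))            # the OS picks a free port
listener.listen(1)
_, probe_port = listener.getsockname()
reachable = port_open("127.0.0.1", probe_port)
listener.close()

# On a real deployment you would probe the service endpoint from the agent's
# network, e.g. port_open("datasync.eu-central-1.amazonaws.com", 443).
```

The agent's built-in "Test Network Connectivity" menu option remains the authoritative check; this is only a convenience for isolating which hop is blocking traffic.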

Troubleshooting:

I’ve added this section because I encountered several issues during my deployment before writing this article, and I want to share the lessons learned with you. By outlining these network requirements and common pitfalls, I hope to help you avoid similar issues if you need to work with AWS DataSync services in the future [15].
A. The VM/AWS DataSync agent cannot speak to the outside world:
Solutions:
  • Check the VM’s Network Card: Ensure that the network card for the VM is set to NAT in the VM settings. This allows the VM to access the internet through the host machine’s network connection.
  • Check the DNS Settings on Your VM: Verify the DNS settings on the VM. Use the recommended DNS server (number 1 from the network configuration list) to ensure proper name resolution and connectivity to AWS DataSync services, as the below screenshot shows:
Figure 34- AWS DataSync service- DNS settings
  • Then, try to ping 8.8.8.8 (Google’s public DNS) to verify external network connectivity.
Figure 35- AWS DataSync service- Ping command
B. The VM/AWS DataSync agent cannot sync/connect to the AWS DataSync public service endpoints:
Solution: Check that your firewall is allowing the required ports.
C. The VM/AWS DataSync agent cannot speak to the internal/local storage:
Solution:
  • Check whether the required ports are open between your VM agent and the local NAS storage.
  • Then, type the number 3 from the list; below I am testing the SMB connection:
Figure 36- AWS DataSync service- SMB connectivity
D. The Timing on the VM/AWS DataSync agent is not correct.
Solution: Type number 5 of the list (NTP) and check the timing:
Figure 37- AWS DataSync service- NTP configuration
E. The SSH connection with AWS account is not working.
Solution:
  • Type number 6 (command prompt) and type the following command:
Open support channel
  • If there is an active SSH connection, you should see the following output:
Figure 38- AWS DataSync service- SSH connectivity

DataSync deletion:

To delete the agent used by AWS DataSync, you must first delete the locations. If you try to delete the agent before removing the locations, you will receive the error message shown in the screenshot below.
Figure 39- AWS DataSync service- Deletion

Cost considerations:

DataSync offers simple, predictable, pay-as-you-go pricing. You pay a flat, per-gigabyte fee for the amount of data transferred between your storage locations. The total cost depends on your workflow, which storage services you are using, and the connectivity method between your on-premises storage and the AWS storage service (or between two AWS regions). However, the pricing for the DataSync service itself is based only on the amount of data transferred: you pay only for the data that is actually moved between locations. For more information about cost, check [14].
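Since billing is a flat per-gigabyte fee on moved data, estimating a migration budget is simple arithmetic. The rate below is an assumed illustrative figure, not an official price; check the current DataSync pricing page for your region.

```python
# Illustrative cost estimate for a DataSync migration. The per-GB rate is an
# assumed example value, NOT an official AWS price -- consult the pricing
# page [14] for the real rate in your region.
PER_GB_RATE_USD = 0.0125

def datasync_transfer_cost(gigabytes: float, rate: float = PER_GB_RATE_USD) -> float:
    """Cost of a one-way transfer; only bytes actually moved are billed."""
    return round(gigabytes * rate, 2)

print(datasync_transfer_cost(5000))  # e.g. a 5 TB media library
```

Note that S3 request and storage charges, and any VPN or Direct Connect costs, are billed separately from the DataSync per-GB fee.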

Conclusion:

In this comprehensive article, we walked through the process of using the AWS DataSync service to migrate your media assets and data from your on-premise NAS storage to an AWS S3 bucket. The steps included setting up the DataSync service, downloading and configuring the agent, and verifying network connectivity between the on-premise mounted NAS storage (via SMB protocol) and the DataSync public endpoint IP. We then created locations for both the on-premise mounted folder and the S3 bucket, followed by creating a migration task to transfer the data to S3. Finally, we reviewed the summary report and logs using the AWS CloudWatch service to monitor the progress and ensure a successful migration.

Next steps:

This article is part of a series that covers various workflows on a hybrid infrastructure, where we process data in multiple stages. The next step involves running a security scan on the data uploaded to the AWS S3 bucket. This ensures that all uploaded assets are secure and safe for later download by on-premises or end users.

Resources:
