Back Up On-Premises Data Incrementally to Amazon S3 over SFTP using AWS Transfer Family
Businesses often want to securely back up on-premises data to the cloud using the familiar SFTP protocol, with a static IP address for the SFTP server. This blog demonstrates how to use AWS Transfer Family to create an SFTP endpoint exposed via a static IP, enabling incremental backups from your local servers to Amazon S3.
Prerequisites
- Allocate an Elastic IP. You can follow the steps mentioned in Allocate an Elastic IP address.
- Create an Amazon S3 bucket to store your data. You can follow the steps mentioned in Creating a bucket.
- Create an IAM role for Amazon S3 access. You can follow the steps mentioned in Create an IAM Role and Policy.
- For this demonstration, I will use an Amazon EC2 instance as the source server from which I will upload data to Amazon S3. If you already have a server, you can use it to send your data to Amazon S3; otherwise, you can follow the steps mentioned in the tutorial to launch an EC2 instance.
- Follow the steps mentioned in Generate SSH keys (for example, with ssh-keygen) to get a private and public key pair, which will be used in later steps. A Python alternative is sketched below.
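If you'd rather stay in Python than run ssh-keygen, here is a minimal sketch that generates an equivalent ECDSA key pair with paramiko (the same library the backup script uses later). The file name is a placeholder:

```python
# A minimal sketch: generate an ECDSA key pair with paramiko instead of
# ssh-keygen. The file name is a placeholder; keep the private key safe.
import paramiko

key = paramiko.ECDSAKey.generate()             # NIST P-256 curve by default
key.write_private_key_file('private-key.pem')  # private key for the backup script

# Public key in OpenSSH format: paste this line into the
# "SSH public keys" field when adding the Transfer Family user
print(f"{key.get_name()} {key.get_base64()}")
```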
- Open the AWS Transfer Family console.
- In the navigation pane, click Servers, then click Create server.
- On the next page, choose SFTP (SSH File Transfer Protocol) - file transfer over Secure Shell, then click Next.
- On the next page, for Identity provider for SFTP, FTPS, or FTP, choose Service managed.
- For Endpoint configuration, configure the following:
  - Choose VPC hosted as the Endpoint type.
  - Custom hostname: None. You can also use an Amazon Route 53 DNS alias or other DNS if required.
  - Choose a VPC that spans the Availability Zone selected for the Elastic IP.
- Choose the Availability Zone and the Elastic IP address created earlier. Then, click Next.
- Choose Amazon S3 as the Domain.
- On the next page, keep the default parameters and click Next.
- Review the details and click Create server.
- Wait for the server status to change to Online. If you prefer to script these server-creation steps, see the boto3 sketch below.
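A rough boto3 equivalent of the console steps above might look like the following. All resource IDs are placeholders for the VPC, subnet, and Elastic IP allocation from the prerequisites:

```python
# A minimal sketch of the server-creation steps using boto3; all IDs
# below are placeholders for the resources created in the prerequisites.
import boto3

transfer = boto3.client('transfer')

response = transfer.create_server(
    Protocols=['SFTP'],                      # file transfer over Secure Shell
    IdentityProviderType='SERVICE_MANAGED',  # users managed by Transfer Family
    Domain='S3',                             # store files in Amazon S3
    EndpointType='VPC',
    EndpointDetails={
        'VpcId': 'vpc-0123456789abcdef0',
        'SubnetIds': ['subnet-0123456789abcdef0'],
        'AddressAllocationIds': ['eipalloc-0123456789abcdef0'],  # the Elastic IP
    },
)
print('ServerId:', response['ServerId'])
```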
- Select the created server and click Add user.
- Enter a Username (this example uses 'username').
- For the IAM role, select the role created in the prerequisites.
- For Home directory, choose the S3 bucket that you created to store your data.
- For SSH public keys, paste the contents of the public key generated in the prerequisites.
- Click Add. A scripted equivalent of this step is sketched below.
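For completeness, here is a hedged boto3 sketch of the add-user step. The server ID, role ARN, bucket name, and key path are placeholders for the resources from earlier steps:

```python
# A minimal sketch of the add-user step using boto3; the server ID, role
# ARN, bucket name, and key path are placeholders from earlier steps.
import boto3

transfer = boto3.client('transfer')

with open('/path/to/your/public-key.pub') as f:
    public_key = f.read().strip()

transfer.create_user(
    ServerId='s-0123456789abcdef0',
    UserName='username',
    Role='arn:aws:iam::111122223333:role/transfer-s3-access',  # IAM role from the prerequisites
    HomeDirectory='/your-backup-bucket',  # S3 bucket created earlier
    SshPublicKeyBody=public_key,
)
```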
- Connect to your on-premises server or Amazon EC2 instance. You can follow the steps in Connect to your Amazon EC2 instance.
- I've created a folder named app-logs, which will be backed up to Amazon S3 every day. The app-logs folder and the SSH keys are stored under the same directory.
- Create a Python script that does the following:
  - Establishes an SSH connection with the SFTP server and opens an SFTP session.
  - Identifies files that were changed in the last 24 hours.
  - Transfers the modified files over SFTP.
  - Ensures the SFTP session and SSH connection are closed properly.
- Sample script for the above actions:
```python
import os
import paramiko
from datetime import datetime, timedelta

SFTP_HOST = 'your-transfer-family-endpoint or IP address'
SFTP_USERNAME = 'your-sftp-username'
SFTP_PRIVATE_KEY = '/path/to/your/private-key.pem'
LOCAL_DIR = '/path/to/local/data'
REMOTE_DIR = 'backup/directory'

# Files modified after this threshold are considered changed
TIME_THRESHOLD = datetime.now() - timedelta(hours=24)


def incremental_backup():
    # Set up the SSH client
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())

    # Load the ECDSA private key
    key = paramiko.ECDSAKey.from_private_key_file(SFTP_PRIVATE_KEY)

    # Connect to the SFTP server
    ssh.connect(SFTP_HOST, username=SFTP_USERNAME, pkey=key)

    # Open an SFTP session
    sftp = ssh.open_sftp()

    try:
        # Walk the local directory tree and upload files modified in the
        # last 24 hours, preserving each file's relative path
        for root, dirs, files in os.walk(LOCAL_DIR):
            for file in files:
                local_path = os.path.join(root, file)
                remote_path = os.path.join(
                    REMOTE_DIR, os.path.relpath(local_path, LOCAL_DIR))
                if os.stat(local_path).st_mtime > TIME_THRESHOLD.timestamp():
                    print(f"Uploading {local_path} to {remote_path}")
                    sftp.put(local_path, remote_path)
    finally:
        # Ensure the SFTP session and SSH connection are closed properly
        sftp.close()
        ssh.close()


if __name__ == "__main__":
    incremental_backup()
```
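After a run, you can optionally confirm that the modified files landed in the bucket. A small sketch using boto3 (the bucket name is a placeholder; the prefix matches REMOTE_DIR from the script):

```python
# Optional sanity check with boto3: list the objects uploaded under the
# backup prefix. The bucket name is a placeholder for the bucket you created.
import boto3

s3 = boto3.client('s3')
resp = s3.list_objects_v2(Bucket='your-backup-bucket', Prefix='backup/directory/')
for obj in resp.get('Contents', []):
    print(obj['Key'], obj['Size'], obj['LastModified'])
```

To keep the backup incremental in practice, schedule the script to run once every 24 hours (for example, with cron), so that each run uploads only the files modified since the previous run.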
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.