Moving Data with Mage on AWS ECS
In this blog post, I will review how to deploy a Mage Docker container with Amazon Elastic Container Service (ECS) on AWS Fargate, using Amazon Elastic Container Registry (ECR) to store the image.
Published Dec 29, 2023
According to mage.ai: "Mage is an open-source data pipeline tool for transforming and integrating data. Mage is a modern replacement for Airflow." My first blog post, 'Airflow is not an ETL tool…', brought additional questions about data orchestration and data orchestration tools, such as: is there such a thing as a pure data orchestrator that uses "different tools and technologies together to move data between systems"? Are data orchestrators essentially connectors that tie different tools together to extract, transform, and load data? Why do people use the terms data pipeline and data orchestration interchangeably? Are they the same?
While browsing the web researching the topic, I came across the article 'Introducing SQLake: Data Pipelines Without Manual Orchestration' from Upsolver that, in my opinion, merges these two concepts beautifully: "Simply put, every data pipeline is composed of two parts: Transformation code and orchestration. If you run daily batches, orchestration is relatively simple and no one cares if a batch takes hours to run since you can schedule it for the middle of the night. However, delivering data every hour or minute means you have many more batches. Suddenly auto-healing and performance become crucial, forcing data engineers to dedicate most of their time to manually building Directed Acyclic Graphs (DAGs), in tools like Apache Airflow, with dozens to hundreds of steps that map all success and failure modes across multiple data stores, address dependencies, and maintain temporary data copies that are required for processing." This paragraph helped me understand why Mage uses the terms 'data pipeline' AND 'the modern replacement for Airflow' in the same sentence.
Containerize everything
As I was just getting started with Mage, I ran Docker on my computer, but the next step for me was to bring Mage and my pipelines to the cloud since, at the time of writing, Mage doesn't have a cloud edition. You have several options to configure and deploy Mage: (1) deploy an AWS ECS cluster with Terraform; (2) run it on an AWS EC2 instance; (3) deploy an AWS ECS cluster manually. As this was a side project to try out the deployment of the Mage application, I opted for the manual creation of the ECS resources.
1. Docker Desktop. In my case, I used the Windows Installation following this guide from Docker.
2. An AWS IAM user for both console and programmatic access with the 'AmazonECS_FullAccess' and 'AdministratorAccess' policies attached.
2.1 Open the AWS Identity and Access Management (IAM) page and find 'Users' on the left-hand side >> Create User.
2.2 Specify a user name >> Next.
2.3 Under 'Permissions options', choose 'Attach policies directly', find 'AmazonECS_FullAccess' and 'AdministratorAccess' in the list >> Create user.
2.4 Select the user that you've just created and open 'Security credentials' >> 'Create access key' and save Access key ID and Secret access key somewhere.
3. AWS CLI on Windows using this guide from AWS.
3.1 Once you run the AWS CLI MSI installer, open the command prompt and confirm the installation with aws --version. Then, run aws configure and enter the Access key ID and Secret access key you noted in the previous step.
4. An IAM role that permits AWS ECS to make API calls on your behalf - more here.
4.1 Go to AWS IAM and find 'Roles' on the left panel >> 'Create role'.
4.2 Select 'Custom trust policy' in the 'Trusted entity type' section and paste the following policy:
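For reference, the standard trust policy that allows Amazon ECS tasks to assume a role looks like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}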
4.3 Then, in the Permissions policies, find the AWS-managed policy 'AmazonECSTaskExecutionRolePolicy' >> Next.
4.4 Add a name e.g. ecsExecutionRole >> 'Create role'.
Step 1. Prepare a Dockerfile, a file with the instructions to build an image to run a Docker container with the Mage code. To create the Dockerfile - see my version here - I used Mage's Dockerfile as a foundation, changing the [project_name] placeholder in ARG PROJECT_NAME to my project name: ARG PROJECT_NAME=demo_mage.
If you've already played with Mage on your local machine, you know that authentication is not enabled by default. To secure my Mage portal, I added the following lines to the Dockerfile to enable the default username (admin@admin.com) and password (admin) for the initial login, with the token valid for 1440 seconds - read more on other auth methods here.
ENV REQUIRE_USER_AUTHENTICATION=1
ENV MAGE_ACCESS_TOKEN_EXPIRY_TIME=1440
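For orientation, below is a minimal sketch of what such a Dockerfile can look like. It assumes the official mageai/mageai base image and the /home/src code path used in Mage's Docker setup; the actual Dockerfile linked above may differ in the details.
FROM mageai/mageai:latest

# Project name used throughout this post (adjust to yours).
ARG PROJECT_NAME=demo_mage

# Code location assumed by Mage's Docker setup.
WORKDIR /home/src

# Copy the local Mage project into the image.
COPY ${PROJECT_NAME} ${PROJECT_NAME}

# Enable the default admin login and set the access-token expiry.
ENV REQUIRE_USER_AUTHENTICATION=1
ENV MAGE_ACCESS_TOKEN_EXPIRY_TIME=1440
ENV PROJECT_NAME=${PROJECT_NAME}

# Mage's web UI listens on port 6789.
EXPOSE 6789

# Start the Mage server for the project.
CMD mage start ${PROJECT_NAME}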
Start Docker Desktop, open the command prompt in the directory where the Dockerfile is stored, and build a Docker image from it:
docker build -t demo_mage .
Once you have it, run the newly created Docker image as a container:
docker run -it -p 6789:6789 demo_mage
To view the Mage front end, go to http://localhost:6789, which redirects you to the login page. Success! It is time to move from running Mage on my local machine to running it remotely.
Step 2. To run a Docker image remotely, I need to store it in a Docker registry. For simplicity, I will use Amazon Elastic Container Registry (ECR), a managed container registry on AWS.
Create a private repository on AWS ECR and name it, for example, 'demo-mage'. To simplify your life, select the repository >> View push commands and run the commands to authenticate Docker to the registry as an AWS user, then build, tag, and push your image to AWS ECR. A comment here, though: I was unable to make the commands work in Windows PowerShell, so I ran a different format of the commands in the AWS CLI, along the lines of the sketch below.
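A plain-CLI version of the push flow looks roughly like this; replace <aws_account_id> and <region> with your own values. The first command authenticates Docker to ECR, and the rest tag the image built earlier and push it to the 'demo-mage' repository:
# Authenticate Docker to your private ECR registry
aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <aws_account_id>.dkr.ecr.<region>.amazonaws.com
# Tag the image built earlier and push it to the 'demo-mage' repository
docker tag demo_mage:latest <aws_account_id>.dkr.ecr.<region>.amazonaws.com/demo-mage:latest
docker push <aws_account_id>.dkr.ecr.<region>.amazonaws.com/demo-mage:latest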
Step 3. To launch Docker containers on AWS, you need to create an AWS ECS cluster on Fargate, AWS's serverless container orchestration service, and an ECS task that pulls and runs the Docker image from ECR.
Create an ECS Fargate cluster named 'demo-mageai' by visiting the ECS console >> Create cluster >> 'AWS Fargate (serverless)'.
From the navigation panel, select 'Task definitions' >> 'Create new task definition with JSON'. In the editor window, add the following specification - please adjust the parts in <> as per your needs:
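A minimal Fargate task definition for this setup looks roughly like the sketch below. The family name, container name, and CPU/memory values are illustrative and can be changed; the image URI should point at the ECR repository from Step 2, and the execution role is the one created earlier:
{
  "family": "demo-mageai-task",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "1024",
  "memory": "2048",
  "executionRoleArn": "arn:aws:iam::<aws_account_id>:role/ecsExecutionRole",
  "containerDefinitions": [
    {
      "name": "demo-mage",
      "image": "<aws_account_id>.dkr.ecr.<region>.amazonaws.com/demo-mage:latest",
      "essential": true,
      "portMappings": [
        {
          "containerPort": 6789,
          "protocol": "tcp"
        }
      ]
    }
  ]
}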
To run the ECS task for the first time, from the same task definition window, click Deploy >> Run task >> Create. The container will start provisioning. Once it is in the Running state, click the task name >> Network >> Open address to log in to the Mage portal.
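If you prefer the AWS CLI to the console, the rough equivalent of creating the cluster and running the task looks like this; the subnet and security group IDs are placeholders, the task definition family matches the sketch above, and the security group must allow inbound traffic on port 6789:
# Create the Fargate cluster
aws ecs create-cluster --cluster-name demo-mageai
# Run the task on Fargate with a public IP so the Mage UI is reachable
aws ecs run-task --cluster demo-mageai --launch-type FARGATE --task-definition demo-mageai-task --network-configuration "awsvpcConfiguration={subnets=[<subnet_id>],securityGroups=[<security_group_id>],assignPublicIp=ENABLED}"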
Step 4. Once you log in to the Mage front end using the default username and password, change the password in Settings >> Users by selecting the 'admin' user and entering a new password. You might also want to sync with a GitHub repository to store your pipeline code.
I have made my peace with the orchestration topic. There are some remaining pieces and blank areas, but most of the 'pipes' fit together now.