
Using Finch to run Apache Airflow using mwaa-local-runner

I show you how you can use Finch to run Apache Airflow using the mwaa-local-runner tool, and how you can do the same for your own applications

Ricardo Sueiras
Amazon Employee
Published Feb 12, 2024
Last Modified Feb 18, 2024
As some of you may know, I have been creating content on Apache Airflow for a few years now. One of the open source projects that AWS has produced to make it easier for developers to get started with Apache Airflow is mwaa-local-runner. If you have seen me at an event, you have probably seen my live coding demos, where I use this project. It is awesome!
The thing is, I have just got myself a new Mac (an M1, since you asked so nicely), and as I was installing the software I needed, I decided that rather than reinstall Docker and Docker Compose, I would make time to get to know Finch. What is Finch, I can hear you all asking? Finch is another open source project from AWS that provides a command line client for building, running, and publishing Linux containers. As I prepare for my talk later this week at a local developer conference, I thought I would put together a quick post on how I updated mwaa-local-runner to use Finch rather than Docker.

Installing Finch

Installing Finch was very straightforward thanks to the excellent documentation that the community has put together.
On my previous laptop, when I was using Docker, I had to start the Docker daemon before I could run the various Docker commands. With Finch you do something similar, because Finch needs a virtual machine in which to run your containers. This is outlined very nicely in the docs, and I had to run the following commands:
  • finch vm init
  • finch vm start
Here is the output of the second command, which tells me that we are ready for action.

Finch compatibility

mwaa-local-runner uses a bash script to manage all of its activities (building, starting, and cleaning container images), so my first port of call was to review the script to see which docker commands it ran, and then check these against the Finch command line reference guide. Luckily, Finch's compatibility with Docker's command line switches makes it super easy to migrate your containerised build scripts.
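A quick way to start that review is to let grep list every line that invokes the Docker CLI. This is a minimal sketch: `list_docker_calls` is a hypothetical helper, not part of mwaa-local-runner, and the pattern only catches `docker` or `docker-compose` used as a whole word followed by whitespace, so calls assembled inside variables would be missed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list every line of a script that invokes docker or
# docker-compose, printed as "line-number:text" for checking against the
# Finch CLI reference.
list_docker_calls() {
  local script="$1"
  # Match "docker" / "docker-compose" as a whole word, followed by
  # whitespace or end of line (so "mydocker" or "docker.io" are ignored).
  grep -nE '(^|[^[:alnum:]_-])docker(-compose)?([[:space:]]|$)' "$script"
}
```

For example, `list_docker_calls ./mwaa-local-env` would print the candidate lines (the script name here is an assumption; point it at whichever script your checkout uses).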
Going through the script, there were a few commands that it looked like I needed to change.
  • docker run = finch run - the way we run container images in Finch just swaps docker for finch, easy peasy!
  • docker build = finch build - wow, this is too easy
  • docker compose up = finch compose up - a very straightforward swap. This was the one I was most concerned about, but I should not have worried, as the project docs had me covered
Containerized applications composed of multiple services are often defined in Docker Compose files. Finch offers a CLI that is compatible with the Docker Compose CLI, so commands that you have used previously, like docker compose up, translate to finch compose up.
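Because the mapping is mostly a matter of swapping the binary name, the translation can be sketched as a tiny function. This is a hypothetical helper, not part of mwaa-local-runner or Finch; it assumes each command is passed as plain words, and mapping the older standalone `docker-compose` binary to `finch compose` is an assumption about how you would want it handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate a docker / docker-compose command line into
# the equivalent finch command by swapping only the leading binary name.
to_finch() {
  printf '%s\n' "$*" | sed -E \
    -e 's/^docker-compose([[:space:]]|$)/finch compose\1/' \
    -e 's/^docker([[:space:]]|$)/finch\1/'
}
```

For example, `to_finch docker compose up` prints `finch compose up`. It only swaps the binary name and does not check that the switches are supported by Finch.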
As I looked at the commands used by the script, I noticed that many of them used command line switches, so I wanted to make sure that Finch supported these too. As it turned out, one of the switches used was not supported by Finch.
  • --compress - this Docker command line option was not supported in the version of Finch that I was using, so I removed it. The build worked fine without it. That does raise the question: what does --compress do, and should I be concerned? Reading this document, it looks like --compress compresses the build context using gzip, which can help build performance. I am not too worried, as I am not planning on building these images frequently, so I think I can live without this option.
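If you would rather keep one script that can drive either CLI, one option is to drop unsupported flags just before calling Finch. A minimal sketch, assuming bash: `filter_build_args` and `FILTERED_ARGS` are hypothetical names, and only `--compress` is filtered because that is the one flag that caused trouble here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy "$@" into the global array FILTERED_ARGS,
# skipping --compress, which the Finch version used here did not accept.
filter_build_args() {
  FILTERED_ARGS=()
  local arg
  for arg in "$@"; do
    if [ "$arg" != "--compress" ]; then
      FILTERED_ARGS+=("$arg")
    fi
  done
}
```

For example, `filter_build_args build --compress -t my-image ./docker` followed by `finch "${FILTERED_ARGS[@]}"` runs the build without the flag (the image name and path are placeholders). A global array is used rather than printing one argument per line so that arguments containing spaces survive intact, and it also works with the bash 3.2 that ships with macOS.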

Running my updated script

Rather than modify the existing script, I created a new one (finch-mwaa-local-runner) and made my changes there. Before I kicked it off, though, I went through the specific commands within the script and ran them in a terminal to make sure they worked.
Building the container image worked a treat, and I had no errors when running this.
The next test was to actually kick off and run the containers.
Eventually, the start process failed with the following:
I was getting hundreds of permission errors. Oh no, I knew this was going too well. Looking at the current issues in the Finch GitHub repo, I found one that I thought would help resolve this problem. Based on it, I created a new Docker Compose file that took some of the comments into account, and added a new step.
First, I needed to create a volume with the Finch CLI.
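A small guard makes that step safe to re-run from the start script. This is a minimal sketch, assuming the `volume inspect` and `volume create` subcommands that Finch shares with the Docker CLI; `ensure_volume`, `CONTAINER_CLI` and the volume name in the usage line are hypothetical.

```bash
#!/usr/bin/env bash
# CLI to drive; defaults to finch but can be overridden (e.g. CONTAINER_CLI=docker).
CONTAINER_CLI="${CONTAINER_CLI:-finch}"

# Hypothetical helper: create a named volume only if it does not already
# exist, so the script can be run repeatedly without errors.
ensure_volume() {
  local name="$1"
  if "$CONTAINER_CLI" volume inspect "$name" >/dev/null 2>&1; then
    echo "volume $name already exists"
  elif "$CONTAINER_CLI" volume create "$name" >/dev/null; then
    echo "created volume $name"
  else
    echo "failed to create volume $name" >&2
    return 1
  fi
}
```

For example, `ensure_volume airflow-postgres-data` (a placeholder name; it would need to match whatever volume name your compose file refers to).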
Then I modified the Docker Compose file. I used this opportunity to rename my configuration files, creating finch-local.yml and finch-resetdb.yml, and this is what they looked like:

mwaa-local-runner now runs on Finch

Retrying this, everything started up. I was greeted with the familiar Airflow ASCII art, which showed me that I was good to go, and testing in a local browser confirmed that I was now running mwaa-local-runner using Finch.

Fixing the bootstrap.sh

As is always the way, just when you think you have cracked it, a problem appears. When I went to test a simple DAG (one that calls the AWS CLI to run aws sts get-caller-identity), the task failed with the following error:
/lib64/ld-linux-x86-64.so.2: No such file or directory
When I first looked at this error, something seemed off: I am running on an aarch64 processor, not amd64. Searching for possible answers took me down several rabbit holes and wasted a lot of time before I realised what was going on. The bootstrap.sh script that is used when building the Airflow container contained the following entry:
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o $zip_file
So it was always trying to install the x86_64 (amd64) binaries, even though I was building on aarch64. To fix this, I modified the script as follows:
Now, whether I am using an Intel-based or an Arm-based system, it picks up the right AWS CLI to install.
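One way to express that kind of change is to pick the installer from `uname -m`. This is a minimal sketch rather than the exact edit made to bootstrap.sh: `awscli_url` is a hypothetical name, and the aarch64 URL follows the naming AWS uses for its Linux Arm installer of AWS CLI v2.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a machine architecture (default: uname -m) to the
# matching AWS CLI v2 Linux installer URL.
awscli_url() {
  local arch="${1:-$(uname -m)}"
  case "$arch" in
    x86_64|amd64)  echo "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" ;;
    aarch64|arm64) echo "https://awscli.amazonaws.com/awscli-exe-linux-aarch64.zip" ;;
    *)
      echo "unsupported architecture: $arch" >&2
      return 1
      ;;
  esac
}
```

Used in place of the hard-coded line, it would look like `url="$(awscli_url)" && curl "$url" -o "$zip_file"`, where `$zip_file` is the variable the original script already uses. Failing loudly on an unknown architecture is preferable to silently downloading the wrong binary, which is what produced the confusing ld-linux error above.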

MySQL Provider

The next issue I bumped into was that when trying to run a task using the MySQL Operator, I encountered the following error:
No module named 'MySQLdb'
This time, searching proved more helpful, sort of. I found out that using the MySQL Operator on my local aarch64-based Mac was not going to work. There was probably some work I could do to get around this, but it seemed a better approach to switch to PostgreSQL instead.
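The MySQLdb module is provided by the mysqlclient package, so a quick import check tells you whether it is actually installed in the Python environment Airflow uses. A minimal sketch: `py_module_available` is a hypothetical name, and it assumes `python3` is on the PATH wherever you run it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the named Python module can be imported.
# The module name is passed as an argument rather than pasted into the
# Python source, so odd input cannot inject code.
py_module_available() {
  python3 -c 'import importlib, sys; importlib.import_module(sys.argv[1])' "$1" >/dev/null 2>&1
}
```

For example, `py_module_available MySQLdb || echo "MySQLdb (mysqlclient) is not installed"`, run inside the Airflow container (for instance through finch exec), reports the same condition the MySQL Operator ran into.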

Accessing the local host

The final thing that I needed to figure out was how to access services running on my local machine. Docker exposes host.docker.internal, which processes inside your container can use to connect to services running on the host (i.e. my Mac). It took me a while to find this, but with Finch you can do the same thing by using 192.168.5.2.
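To keep one set of connection settings that works under either runtime, the host address can be chosen per CLI and folded into an Airflow connection URI, which Airflow can read from `AIRFLOW_CONN_<CONN_ID>` environment variables. A minimal sketch: the function names, connection ID and credentials are hypothetical, 192.168.5.2 is the address that worked with the Finch version used in this post, and a password containing special characters would need URL-encoding.

```bash
#!/usr/bin/env bash
# Hypothetical helper: address a container uses to reach services on the host.
host_address() {
  case "${1:-finch}" in
    docker) echo "host.docker.internal" ;;
    finch)  echo "192.168.5.2" ;;
    *)
      echo "unknown runtime: $1" >&2
      return 1
      ;;
  esac
}

# Hypothetical helper: build a Postgres connection URI pointing at the host.
# Arguments: runtime user password database [port, default 5432]
postgres_conn_uri() {
  local runtime="$1" user="$2" password="$3" db="$4" port="${5:-5432}"
  local host
  host="$(host_address "$runtime")" || return 1
  echo "postgresql://${user}:${password}@${host}:${port}/${db}"
}
```

For example, `export AIRFLOW_CONN_LOCAL_POSTGRES="$(postgres_conn_uri finch airflow airflow airflow)"` would give DAGs a `local_postgres` connection that reaches PostgreSQL on the Mac; switching the first argument to `docker` is the only change needed under Docker.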

Open Source is awesome!

I hope this post was useful, and that for those of you who are looking to use open source tools like Finch to manage your container development processes, you will get some ideas of how easy it can be. I have created a GitHub repo that shares the configuration files I used to get mwaa-local-runner to work with Finch.
If you found this post useful, please provide me feedback. I use this feedback to help me improve my content, but also to know what content to write about. Thank you so much!
---
Image created using Amazon Bedrock and Stability SDXL using the following prompt "can you create an 80s pixel based image that features a picture of a bird, specifically a Finch. In the background is a serene jungle landscape."
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
