Using Finch to run Apache Airflow with mwaa-local-runner
In this post I show how you can use Finch to run Apache Airflow with the mwaa-local-runner tool, and how you can do the same for your own applications.
- finch vm init
- finch vm start
finch vm start
INFO[0000] Starting existing Finch virtual machine...
INFO[0025] Finch virtual machine started successfully
- docker run = finch run - to run container images in Finch you just swap docker for finch, easy peasy!
- docker build = finch build - wow, this is too easy
- docker compose up = finch compose up - a very straightforward swap. This was the one I was most concerned about, but I need not have worried, as the project docs had me covered
- --compress - this Docker command line option was not supported in the version of Finch I was using, so I removed it. The build worked fine without it. That does raise the question: what does --compress do, and should I be concerned? Reading this document, it looks like --compress compresses the build context with gzip, which can help build performance. I am not too worried, as I am not planning on building these images frequently, so I think I can live without this option.
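The swaps in the list above can be sketched as a small shell helper. This is only an illustration, not part of mwaa-local-runner or Finch: the function name to_finch is hypothetical, and the docker command passed to it is my assumption of what the original build line looked like before the changes. It prints the rewritten command for you to review rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Finch or mwaa-local-runner):
# rewrite a docker command line into its Finch equivalent.
# The function body runs in a subshell so its variables stay local.
to_finch() (
  out=""
  # Swap the leading "docker" for "finch".
  if [ "$1" = "docker" ]; then
    out="finch"
    shift
  fi
  for arg in "$@"; do
    # Drop --compress, which the Finch version used in this post rejected.
    if [ "$arg" = "--compress" ]; then
      continue
    fi
    out="${out:+$out }$arg"
  done
  # Printed for display only; arguments containing spaces are not re-quoted.
  printf '%s\n' "$out"
)

# Assumed original command, shown for illustration.
to_finch docker build --rm --compress -t amazon/mwaa-local:2_7 ./docker
# prints: finch build --rm -t amazon/mwaa-local:2_7 ./docker
```

In practice I just edited the commands by hand, but a helper like this makes the changes easy to see at a glance.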
finch build --rm -t amazon/mwaa-local:2_7 ./docker
[+] Building 440.9s (25/26)
[+] Building 441.1s (26/26) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.73kB 0.0s
=> [internal] load metadata for docker.io/library/amazonlinux:2023 1.4s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [ 1/21] FROM docker.io/library/amazonlinux:2023@sha256:d8323b3ea56d286d65f9a7469359bb29519c636d7d009671ac00b5c12 5.6s
=> => resolve docker.io/library/amazonlinux:2023@sha256:d8323b3ea56d286d65f9a7469359bb29519c636d7d009671ac00b5c12dd 0.0s
=> => sha256:d111cbc02b249a552b77e87298e3df2ce29173bc39b7d82aecba5ca8a2ab06d2 51.32MB / 51.32MB 4.5s
=> => extracting sha256:d111cbc02b249a552b77e87298e3df2ce29173bc39b7d82aecba5ca8a2ab06d2 1.0s
=> [internal] load build context 0.0s
..
..
=> [21/21] WORKDIR /usr/local/airflow 0.0s
=> exporting to docker image format 78.7s
=> => exporting layers 61.8s
=> => exporting manifest sha256:76739ac599da52b352158076e802f1331eb61c385fdecf20cc0f36728e753478 0.0s
=> => exporting config sha256:d2afaf67f1dd0022006d153aae0a55d98fbf9fa82e0734386ef613da50d255d2 0.0s
=> => sending tarball 16.9s
Loaded image: docker.io/amazon/mwaa-local:2_7
finch compose -p $PROJECT_NAME -f ./docker/docker-compose-local.yml up
WARN[0000] Ignoring: service local-runner: [EnvFile HealthCheck]
WARN[0000] Ignoring: service local-runner: depends_on: postgres: [Required]
INFO[0000] Ensuring image postgres:11-alpine
INFO[0000] Ensuring image amazon/mwaa-local:2_7
INFO[0000] Re-creating container aws-mwaa-local-runner-2_7-postgres-1
INFO[0000] Re-creating container aws-mwaa-local-runner-2_7-local-runner-1
INFO[0000] Attaching to logs
..
..
postgres-1 |chown: /var/lib/postgresql/data: Permission denied
postgres-1 |chown: /var/lib/postgresql/data/pg_multixact: Permission denied
local-runner-1 |Mon Feb 12 13:08:13 UTC 2024 - postgres:5432 still not reachable, giving up
INFO[0183] Container "aws-mwaa-local-runner-2_7-local-runner-1" exited
INFO[0183] All the containers have exited
INFO[0183] Stopping containers (forcibly)
INFO[0183] Stopping container aws-mwaa-local-runner-2_7-postgres-1
INFO[0183] Stopping container aws-mwaa-local-runner-2_7-local-runner-1
finch volume create pgdata
version: '3.7'
services:
  postgres:
    image: postgres:11-alpine
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
      - PGDATA=/var/lib/postgresql/data
    logging:
      options:
        max-size: 10m
        max-file: "3"
    volumes:
      - pgdata:/var/lib/postgresql/data:rw
  local-runner:
    image: amazon/mwaa-local:2_7
    restart: always
    depends_on:
      - postgres
    environment:
      - LOAD_EX=n
      - EXECUTOR=Local
    logging:
      options:
        max-size: 10m
        max-file: "3"
    volumes:
      - "/Users/ricsue/Projects/airflow-101/workflow/dags:/usr/local/airflow/dags"
      - "/Users/ricsue/Projects/airflow-101/workflow/plugins:/usr/local/airflow/plugins"
      - "/Users/ricsue/Projects/airflow-101/workflow/requirements:/usr/local/airflow/requirements"
      - "${PWD}/startup_script:/usr/local/airflow/startup"
    ports:
      - "8080:8080"
    command: local-runner
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
    env_file:
      - ./config/.env.localrunner
volumes:
  pgdata:
local-runner-1 |Mon Feb 12 13:19:58 UTC 2024 - waiting for Postgres... 1/20
postgres-1 |sh: locale: not found
postgres-1 |2024-02-12 13:19:58.329 UTC [31] WARNING: no usable system locales were found
postgres-1 |The files belonging to this database system will be owned by user "postgres".
postgres-1 |This user must also own the server process.
postgres-1 |
postgres-1 |The database cluster will be initialized with locale "en_US.utf8".
postgres-1 |The default database encoding has accordingly been set to "UTF8".
postgres-1 |The default text search configuration will be set to "english".
postgres-1 |
postgres-1 |Data page checksums are disabled.
postgres-1 |
postgres-1 |fixing permissions on existing directory /var/lib/postgresql/db/pgdata ... ok
postgres-1 |creating subdirectories ... ok
postgres-1 |selecting default max_connections ... 100
postgres-1 |selecting default shared_buffers ... 128MB
postgres-1 |selecting default timezone ... UTC
postgres-1 |selecting dynamic shared memory implementation ... posix
postgres-1 |creating configuration files ... ok
postgres-1 |running bootstrap script ... ok
postgres-1 |performing post-bootstrap initialization ... ok
/lib64/ld-linux-x86-64.so.2: No such file or directory
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o $zip_file
if [[ $(uname -p) == "aarch64" ]]; then
    curl "https://awscli.amazonaws.com/awscli-exe-linux-aarch64.zip" -o $zip_file
else
    curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o $zip_file
fi
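The same architecture check can also be written as a small function, which is a little easier to test on its own. This is a minimal sketch rather than the script that ships with mwaa-local-runner: the function name awscli_url_for_arch is hypothetical, and it uses uname -m instead of uname -p, on the assumption that uname -p may report "unknown" on some Linux distributions. It only prints the URL it would use and downloads nothing.

```shell
#!/bin/sh
# Hypothetical helper: choose the AWS CLI download URL for a CPU architecture.
# With no argument it falls back to the architecture reported by `uname -m`.
# The function body runs in a subshell so its variables stay local.
awscli_url_for_arch() (
  arch="${1:-$(uname -m)}"
  case "$arch" in
    # Linux on Arm reports aarch64; arm64 is accepted as an alias.
    aarch64|arm64)
      echo "https://awscli.amazonaws.com/awscli-exe-linux-aarch64.zip"
      ;;
    # Everything else falls back to x86_64, mirroring the else branch above.
    *)
      echo "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip"
      ;;
  esac
)

echo "Would download: $(awscli_url_for_arch)"
```

You could then pass the result to curl in place of the hard-coded URL.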
No module named 'MySQLdb'
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.