Updating legacy Apache Airflow DAGs with Amazon Q Developer

Find out how to use the Airflow Amazon Provider package with Amazon Q Developer workspace context awareness to update legacy Apache Airflow workflows

Ricardo Sueiras
Amazon Employee
Published Jul 15, 2024
Last Modified Jul 16, 2024
Over the past 18 months I have spent a lot of time working on Apache Airflow. One topic that comes up again and again in conversations with users is the lag in upgrading to the latest versions of Airflow. When I ask what is stopping folks from upgrading, the most common and consistent response has been that newer versions of Airflow break their existing workflows (DAGs). This is not that surprising: development and innovation in the Apache Airflow project means that DAGs that worked on earlier versions are likely to need some work. If we could help address this issue, then maybe it would unblock customers so they can upgrade to newer versions of Airflow.
With that in mind, I wanted to see whether a new capability of Amazon Q Developer might help me update some of my old DAGs that no longer work in the latest versions of Apache Airflow. This capability lets you use local files in your developer IDE, which Amazon Q Developer indexes to provide additional workspace context it can then use to generate better code suggestions.
From Airflow 2.x onwards, Airflow Operators were separated out into discrete provider packages. The Amazon Provider package, for example, has all the Operators you need to interact with AWS services. These packages are updated on a regular basis, and the parameters and configurations used by the Operators can change, so you need to keep up to date with them.
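In practice you pin the provider version rather than rely on whatever ships with Airflow. On MWAA this is done via the environment's requirements.txt; the MWAA documentation recommends pairing the pin with the matching Airflow constraints file. A minimal sketch, assuming Airflow 2.8.1 on Python 3.11 (adjust the constraints URL to your environment's Airflow and Python versions):

```
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.8.1/constraints-3.11.txt"
apache-airflow-providers-amazon==8.24.0
```

The constraints file keeps the provider's transitive dependencies consistent with the Airflow version you are running.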
In this post, I am going to show how I took one of the most recent versions of the Airflow Amazon Provider package (v8.24.0 at the time of writing; v8.25.0 is now out) and updated some of my 'legacy' DAGs that no longer work in later versions of Airflow.
Exploring my current DAG
I have a number of older DAGs that I put together when the Managed Workflows for Apache Airflow (MWAA) service was launched. This is an example of one, which uses a number of AWS services to run a typical workflow. When trying to run this on Apache Airflow 2.8.1, I get errors such as:
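The exact messages vary by DAG, but they are import failures caused by module paths that were removed after Airflow 1.x. A minimal sketch that reproduces the shape of the error; the `contrib` path below is one illustrative example of a module tree that no longer exists in Airflow 2.x:

```python
import importlib

# Airflow 2.x removed the old "airflow.contrib" module tree, so a DAG
# still importing from it fails to parse. The scheduler surfaces this
# ImportError as a broken-DAG error in the UI and logs.
try:
    importlib.import_module("airflow.contrib.operators.aws_athena_operator")
    msg = ""
except ImportError as exc:
    msg = f"Broken DAG: {exc}"

print(msg)
```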
This is not a particularly complex workflow, but I wanted help automating the update so that it works with the newer versions of the Operators that are part of the Amazon Provider package.
Setting up Amazon Q Developer
In a fresh directory, I download the Amazon Provider package files and then uncompress them. This is what it looks like:
I also copy my legacy DAG into a local folder called DAGs, so my workspace looks like this:
I am running the MWAA local runner, a local developer tool that is configuration-compatible with MWAA. I am running version 2.8.1, which allows me to just drop in my DAGs and validate/test them quickly.
After getting all my local files ready, the next thing I need to do is ensure that I am running the latest version of the Amazon Q Developer plugin. As of writing, this is v.1.14.0.
I then have to enable Amazon Q Developer to index my files. You do this by clicking on the cog icon and then selecting Extension Settings, which opens the following screen. If you do not see all the items, then either you are using an older version of the Amazon Q Developer plugin, or you might need to restart your IDE (this happened to me the first time I updated it).
overview of new amazon q developer settings screen
You will need to enable (tick) the option "Amazon Q: Workspace Index"; you can leave the other options as they are for the time being. Once you do this, you can watch the index being built by checking the new option available from the OUTPUT tab in your terminal section, called "Amazon Q Language Server".
Amazon Q Developer Workspace local index being built.
It has indexed 243 files, which is about right: when I run "find . -type f | wc -l" it reports 248, and when I review the files there are five Markdown files, so it has ignored those and indexed just the Python files.
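If you want to reproduce that sanity check on your own workspace, the same pair of `find` commands gives you the total and per-type counts. Here a throwaway directory stands in for the workspace root so the numbers are predictable:

```shell
# Count all files vs Python files, the way I sanity-checked the index size.
# A temporary tree stands in for the workspace root in this sketch.
workspace=$(mktemp -d)
touch "$workspace/one.py" "$workspace/two.py" "$workspace/notes.md"
total=$(find "$workspace" -type f | wc -l | tr -d ' ')
python_files=$(find "$workspace" -type f -name '*.py' | wc -l | tr -d ' ')
echo "total=$total python=$python_files"
rm -rf "$workspace"
```

Run from your real workspace root, the difference between the two counts should match the number of non-Python files the indexer skipped.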
You can also see output in the Amazon Q logs showing that it is working and has started OK.
That is all I needed to do, so now it's on to seeing how this new capability of Amazon Q Developer can help me.
Using Amazon Q Developer Workspace to update my DAG
To invoke this new capability of Amazon Q Developer, we use the "@workspace" command.
example of invoking the workspace feature in vscode
I try the following prompt.
@workspace can you update the legacy-dag.py so that it uses the Amazon Airflow Provider package 8.24.0 to update existing Amazon related Operators. This will be deployed in Airflow 2.8.1, and this current DAG fails as many of the Opertors are out of date.
From the Amazon Q Developer logs, I can see that Amazon Q Developer is referencing the local files:
searching workspace context for query: can you update the legacy-dag.py so that it uses the Amazon Airflow Provider package 8.24.0 to update existing Amazon related Operators. This will be deployed in Airflow 2.8.1, and this current DAG fails as many of the Opertors are out of date.
Fetched context from apache_airflow_providers_amazon-8.24.0/airflow/providers/amazon/get_provider_info.py, apache_airflow_providers_amazon-8.24.0/airflow/providers/amazon/aws/log/cloudwatch_task_handler.py, apache_airflow_providers_amazon-8.24.0/airflow/providers/amazon/aws/hooks/base_aws.py
Query done in 48ms
Amazon Q Developer provides me with a bunch of steps to follow. The first one, and the most critical, is to update the import statements for my DAG.
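The suggested fix boils down to swapping the legacy import paths for their provider-package equivalents. A minimal sketch of that mapping; the two entries shown are illustrative examples, and the operators your own DAG uses may differ:

```python
# Legacy Airflow 1.x import lines on the left, their Amazon provider-package
# equivalents on the right. Illustrative, not exhaustive.
IMPORT_FIXES = {
    "from airflow.contrib.operators.aws_athena_operator import AWSAthenaOperator":
        "from airflow.providers.amazon.aws.operators.athena import AthenaOperator",
    "from airflow.hooks.S3_hook import S3Hook":
        "from airflow.providers.amazon.aws.hooks.s3 import S3Hook",
}

def rewrite_import(line: str) -> str:
    """Return the provider-package import for a known legacy line, else the line unchanged."""
    return IMPORT_FIXES.get(line.strip(), line)

print(rewrite_import("from airflow.hooks.S3_hook import S3Hook"))
```

Note that some class names changed along with the module path (AWSAthenaOperator became AthenaOperator, for example), so references in the DAG body may need updating too, not just the import lines.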
show amazon q developer response
It goes on to suggest updating some of the other tasks within the DAG, but those changes match the existing code, so there is nothing to do. I found this while updating a number of different DAGs: Amazon Q Developer will suggest additional things to do, but sometimes you just need to review and check rather than implement.
I take this updated DAG and deploy it on my local test Airflow 2.8.1 server. This time I get no errors, and the legacy DAG is now up and running correctly in my Apache Airflow 2.8.1 environment.
screenshot of DAG running in mwaa-local-runner 2.8.1
I tried this with a number of other DAGs that I had set up to run in Airflow 1.12 and that failed to load into Airflow 2.8.1, and each time Amazon Q Developer provided the right guidance to make this an easier task.
Did this work before Amazon Q Developer Workspace?
Some of you might be asking: what was the response from Amazon Q Developer before this new feature was available? The short answer is that it did not help or provide the right guidance. As an example, for the same prompt above, I get the following response:
response not using amazon q developer workspace
The output did vary, and one of the things I have noticed since using the Workspace feature of Amazon Q Developer is how it uses the local files to provide better responses. Again, these are sometimes not perfect, so you need to review them, but I found them significantly better for the use case outlined here.
Conclusion and next steps
In this short post I showed you one way to use the new Amazon Q Developer Workspace feature to help you update your Apache Airflow DAGs, removing your upgrade roadblocks and helping you use the latest features that Airflow brings.
Read more about this new feature in the blog post AWS announces workspace context awareness for Amazon Q Developer chat, and keep up to date with all the new features and improvements of Amazon Q Developer by checking out the changelog.
 

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
