Deploy Airflow
Deploy Apache Airflow on the DSRI to run workflows (DAGs) hosted in a Git repository.
Install the chart
You will need Helm installed on your computer to deploy a Helm chart; see the Helm documentation for more details.
Add the Airflow Helm chart repository to be able to deploy Airflow on the DSRI:
helm repo add apache-airflow https://airflow.apache.org
helm repo update
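You can then check that the Airflow chart is available in the repository you just added:
helm search repo apache-airflow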
Deploy Airflow
You can quickly deploy Airflow on the DSRI, with DAGs automatically synchronized with your Git repository.
We use a values.yml file with all default parameters predefined for the DSRI, so you just need to edit the password and Git repository configuration in this command, and run it:
helm install airflow apache-airflow/airflow \
-f https://raw.githubusercontent.com/MaastrichtU-IDS/dsri-documentation/master/applications/airflow/values.yml \
--set webserver.defaultUser.password=yourpassword \
--set dags.gitSync.repo=https://github.com/bio2kg/bio2kg-etl.git \
--set dags.gitSync.branch=main \
--set dags.gitSync.subPath=workflows/dags
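You can then watch the Airflow pods being created in your project:
oc get pods --watch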
If you need to do more configuration, you can download the values.yml file, edit it to fit your settings, and use this local file with -f values.yml.
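For example, a minimal sketch of this approach:
curl -O https://raw.githubusercontent.com/MaastrichtU-IDS/dsri-documentation/master/applications/airflow/values.yml
# Edit values.yml to your settings, then deploy with the local file
helm install airflow apache-airflow/airflow -f values.yml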
A few seconds after Airflow starts installing, you will need to fix the postgresql deployment in a different terminal window (unfortunately, setting the serviceAccount.name of the postgresql subchart does not work, even though it should be possible according to the official Helm docs). Run this command to fix postgresql:
oc patch statefulset/airflow-postgresql --patch '{"spec":{"template":{"spec": {"serviceAccountName": "anyuid"}}}}'
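You can check that the patch has been applied to the statefulset, e.g.:
oc get statefulset/airflow-postgresql -o jsonpath='{.spec.template.spec.serviceAccountName}'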
Once Airflow has finished deploying, you can temporarily access its web interface at http://localhost:8080 by forwarding the webserver port to your machine:
oc port-forward svc/airflow-webserver 8080:8080
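You can then check that the webserver responds, e.g. by querying its health endpoint:
curl http://localhost:8080/health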
Or permanently expose the interface on a URL, accessible when connected to the UM VPN, with HTTPS enabled:
oc expose svc/airflow-webserver
oc patch route/airflow-webserver --patch '{"spec":{"tls": {"termination": "edge", "insecureEdgeTerminationPolicy": "Redirect"}}}'
Finally, get the route to the Airflow web interface, or access it via the DSRI web UI:
oc get routes
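To print only the hostname of the Airflow route, e.g.:
oc get route airflow-webserver -o jsonpath='{.spec.host}'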
Example workflows
You can find example DAGs for the bash operator, the Python operator, and the Kubernetes pod operator here.
Here is an example of a DAG using the Kubernetes pod operator to run tasks as pods. You will need to change the namespace parameter to the DSRI project where Airflow is deployed:
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.dummy import DummyOperator
# Airflow 2 import path for the pod operator (provided by the cncf.kubernetes provider package)
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    # Use a fixed start date in the past: a dynamic value like datetime.utcnow() is discouraged
    'start_date': datetime(2021, 1, 1),
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG(
    'kubernetes_pod_operator',
    default_args=default_args,
    schedule_interval=None
    # schedule_interval=timedelta(minutes=10)
)

start = DummyOperator(task_id='run_this_first', dag=dag)

# Run a python command in a pod started in your DSRI project
passing = KubernetesPodOperator(
    namespace='CHANGEME',
    image="python:3.6",
    cmds=["python", "-c"],
    arguments=["print('hello world')"],
    labels={"app": "airflow"},
    name="passing-test",
    task_id="passing-task",
    get_logs=True,
    dag=dag
)

start >> passing
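You can get your DSRI project name, to use as the namespace parameter, with oc project -q. Once the DAG has been synchronized from your Git repository, you can check and trigger it with the Airflow CLI; a minimal sketch, assuming the chart created a deployment named airflow-scheduler (consistent with the airflow-webserver service used above):
# List the DAGs loaded by the scheduler
oc exec deploy/airflow-scheduler -- airflow dags list
# Manually trigger the example DAG defined above
oc exec deploy/airflow-scheduler -- airflow dags trigger kubernetes_pod_operator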
Delete the chart
helm uninstall airflow
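Note that helm uninstall does not delete the PersistentVolumeClaim created for the postgresql database. A short sketch to clean it up, assuming the default StatefulSet claim name (the exact name depends on the chart version, check the output of oc get pvc first):
# List remaining persistent volume claims
oc get pvc
# Delete the postgresql claim by name (data-airflow-postgresql-0 is an assumption, check the list above)
oc delete pvc data-airflow-postgresql-0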
See also
Here are a few links with more details on the official Airflow Helm chart: the chart documentation at https://airflow.apache.org/docs/helm-chart/ and its source code at https://github.com/apache/airflow/tree/main/chart
Other ways to deploy Airflow on OpenShift: