Introduction to workflows

Work in progress

Running workflows on the DSRI is a work in progress. It usually requires some knowledge about how to orchestrate containers.

Introduction

Multiple technologies are available to run workflows on OpenShift/Kubernetes clusters. Each has its strengths and weaknesses in different areas.

Use-case dependent

Choose the technology based on your use case.

Current solutions on the DSRI

The following solutions can easily be deployed on the DSRI.

GitHub Actions workflows

GitHub Actions allows you to define automated, containerized workflows through a simple YAML file hosted in your GitHub repository.

See the page about GitHub Actions runners for more details, and to deploy runners on the DSRI.

Apache Airflow

Airflow is a platform to programmatically author, schedule and monitor workflows, which it represents as DAGs (directed acyclic graphs).

See the page about Airflow for more details, and to deploy Airflow on the DSRI.
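
To illustrate what a DAG of tasks means in practice, here is a minimal sketch using only the Python standard library (`graphlib`, Python 3.9+). It is not Airflow's API, and the task names are hypothetical. It shows how declaring dependencies between tasks determines a valid execution order, and how a cycle makes the graph invalid as a workflow.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}


def execution_order(deps):
    """Return one valid execution order for the tasks in the graph."""
    return list(TopologicalSorter(deps).static_order())


def is_acyclic(deps):
    """Return True if the dependency graph contains no cycle."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError:
        return False
    return True


order = execution_order(dependencies)
print(order)
print(is_acyclic(dependencies))
```

A workflow engine such as Airflow does much more than this (scheduling, retries, monitoring), but the dependency ordering is the core idea behind the DAG terminology.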

Argo

Argo is a container-native workflow engine for Kubernetes supporting both DAG-based and step-based workflows.

  • Workflows are easy to define using Kubernetes-like YAML files.
  • A good fit if your workflow consists of Docker containers to run with arguments.
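
As a rough sketch of what such a definition looks like, the Python snippet below builds a small Argo `Workflow` manifest as a dictionary and prints it as JSON. The workflow name, task names, container image and commands are placeholders chosen for illustration, and nothing here is specific to the DSRI setup. It assumes the Argo Workflows `argoproj.io/v1alpha1` resource format, with one DAG template that runs two container templates in sequence.

```python
import json


def container_template(name, image, command, args):
    """Build an Argo template that runs one container with arguments."""
    return {
        "name": name,
        "container": {"image": image, "command": command, "args": args},
    }


def build_workflow():
    """Build a two-task DAG workflow manifest (all names are placeholders)."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "example-dag-"},
        "spec": {
            "entrypoint": "main",
            "templates": [
                {
                    "name": "main",
                    "dag": {
                        "tasks": [
                            {"name": "prepare", "template": "prepare"},
                            {
                                "name": "process",
                                "template": "process",
                                # "process" starts only after "prepare" succeeds.
                                "dependencies": ["prepare"],
                            },
                        ]
                    },
                },
                container_template("prepare", "alpine:3.19", ["echo"], ["preparing data"]),
                container_template("process", "alpine:3.19", ["echo"], ["processing data"]),
            ],
        },
    }


manifest = build_workflow()
print(json.dumps(manifest, indent=2))
```

Kubernetes accepts manifests in JSON as well as YAML, so on a cluster where Argo Workflows is installed the printed output could in principle be saved to a file and created with `kubectl create -f`. In practice, most Argo workflows are written directly in YAML.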

Contact us

Contact us if you want to run Argo workflows on the DSRI.

More options

Let us know if you are interested in deploying and using any of the following workflow technologies on the DSRI.

Kubeflow

Optimized for TensorFlow workflows on Kubernetes.

Pipelines written in Python.

Apache Airflow

Define, schedule and run workflows.

Can be deployed with OpenDataHub; see also this deployment for OpenShift.

See also the Airflow on Kubernetes blog and the Kubernetes in Airflow documentation.

Volcano

Run batch pipelines on Kubernetes with Volcano.

  • More of a scheduler than a workflow engine.

  • Volcano can be used to run Spark, Kubeflow or KubeGene workflows.

Nextflow

Nextflow has been developed by the genomic research scientific community and is built to run bioinformatics pipelines.

Define your workflow in a Bash-script-like fashion by providing the inputs, the outputs and the command to run, without needing to create and use Docker containers for Conda pipelines.

CWL

  • Developed by the genomic research scientific community.
  • Good support for provenance description (export as RDF).
  • Support on OpenShift is still in development.
  • Offers a GUI to build workflows: Rabix Composer.

KubeGene

KubeGene is a turn-key genome sequencing workflow management framework.

See the Workflow example, and how to define a tool.

Seldon

Open-source platform for rapidly deploying machine learning models on Kubernetes. Manage, serve and scale models built in any framework on Kubernetes.

Contact us

Feel free to contact us if you have any questions about running workflows on the DSRI or to request support for a new technology.