Dask Cluster
Dask is a parallel computing library for Python that scales analytics workflows. On the DSRI, a Dask cluster can be deployed using Helm and includes a Dask scheduler, workers, and a JupyterLab interface to interact with the cluster.
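For orientation, the sketch below shows roughly what interacting with the cluster from a JupyterLab notebook might look like once the deployment is running. It rests on assumptions not stated on this page: that the Helm release is named dask, so the scheduler service is reachable as dask-scheduler on Dask's default port 8786, and that the environment variable DASK_SCHEDULER_ADDRESS may already be set in the Jupyter pod. The helper names scheduler_address, square, and run_demo are hypothetical.

```python
import os

# Assumption: Helm release named "dask", scheduler on Dask's default port 8786.
DEFAULT_SCHEDULER = "tcp://dask-scheduler:8786"


def scheduler_address(default: str = DEFAULT_SCHEDULER) -> str:
    """Return the scheduler address from the environment, else the assumed default."""
    return os.environ.get("DASK_SCHEDULER_ADDRESS", default)


def square(x: int) -> int:
    """Toy task executed on the workers."""
    return x * x


def run_demo() -> int:
    """Connect to the scheduler and sum the squares of 0..9 on the workers."""
    # Imported lazily so the helpers above work without dask installed.
    from dask.distributed import Client

    with Client(scheduler_address()) as client:
        futures = client.map(square, range(10))
        # Futures inside a list are resolved on the worker before sum() runs.
        return client.submit(sum, futures).result()
```

Calling run_demo() in a notebook cell would run the toy computation on the workers. If the connection fails, check the actual scheduler service name in the Topology view and pass it to Client directly.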
Deploy
Dask is deployed via Helm chart from the DSRI web UI.
- In Developer mode, go to +Add and select Helm Chart.
- Search for Dask, select the Dask chart, and click Create.
- In the YAML configuration, find the jupyter section and set the following:
    jupyter:
      command: ["jupyter", "lab", "--allow-root", "--ip=0.0.0.0", "--port=8888", "--no-browser"]
      servicePort: 8888
- Click Create to deploy the cluster.
Persistent storage
After the cluster is running, add persistent storage to the dask-jupyter pod:
- In the Topology view, click on the dask-jupyter pod.
- Go to Actions > Add Storage and create a new Persistent Volume Claim for the cluster.
Data stored on this volume will survive pod restarts.
Access the cluster
Create a route
- Switch to the Administrator view and go to Networking > Routes.
- Click Create Route, fill in a name, select the dask-jupyter service, set the target port to 8888 (the servicePort set in the Helm configuration), and click Create.
- Use the generated URL to open the JupyterLab interface.
Get the JupyterLab token
The JupyterLab interface is protected by a token. To find it:
- Run oc get pods to find the full pod name of dask-jupyter.
- Run oc logs <pod-name> and copy the token from the output.
Use this token to log in to JupyterLab.
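If the log output is long, the token can be picked out programmatically. The sketch below assumes the log contains a Jupyter login URL with a ?token=<hex> query parameter, which is how Jupyter normally prints it. The helper names extract_token and login_url are hypothetical, and the route URL passed to login_url stands in for whatever URL the route generated.

```python
import re
from typing import Optional

# Jupyter prints login URLs such as http://127.0.0.1:8888/lab?token=<hex>
TOKEN_PATTERN = re.compile(r"[?&]token=([0-9a-f]+)")


def extract_token(log_text: str) -> Optional[str]:
    """Return the first token found in the pod log text, or None if absent."""
    match = TOKEN_PATTERN.search(log_text)
    return match.group(1) if match else None


def login_url(route_url: str, token: str) -> str:
    """Build a JupyterLab URL that carries the token as a query parameter."""
    return f"{route_url.rstrip('/')}/lab?token={token}"
```

One way to use it is to save the output of oc logs <pod-name> to a file, read the file into a string, and pass that string to extract_token. Opening the URL returned by login_url should skip the token prompt, assuming the route forwards to JupyterLab unchanged.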