The Data Science Research Infrastructure is a cluster of servers to deploy workspaces and applications for Data Science.
It works by starting workspaces and applications in Docker containers that are automatically deployed to a powerful server on the cluster using Kubernetes, a container orchestration system. You can then access your workspace or application through an automatically generated URL.
✅ What can be done on the DSRI
The DSRI is particularly useful if you need to:
- Gain access to more computing resources (memory and CPUs), which enables you to load larger amounts of data or use more threads for parallelized tasks
- Run jobs that take a long time to complete
- Deploy any database or service you need, and connect to it from your workspace easily
- Book and start a workspace that uses one of our GPUs
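As a minimal sketch of the "more threads for parallelized tasks" point above, the following Python snippet sizes a thread pool from the number of CPUs the process is allowed to use. The `word_count` task and the function names are hypothetical and only serve as illustration; the CPU count reported by the operating system may also differ from the CPU limit configured for your workspace, so treat the default as a starting point.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def available_cpus() -> int:
    """Return the number of CPUs this process may use (at least 1)."""
    # sched_getaffinity only exists on some platforms (e.g. Linux);
    # fall back to cpu_count elsewhere.
    if hasattr(os, "sched_getaffinity"):
        return max(1, len(os.sched_getaffinity(0)))
    return os.cpu_count() or 1


def word_count(text: str) -> int:
    """Hypothetical per-item task: count whitespace-separated words."""
    return len(text.split())


def count_all(texts, max_workers=None):
    """Apply word_count to every text using a pool of worker threads."""
    workers = max_workers or available_cpus()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(word_count, texts))


if __name__ == "__main__":
    print(count_all(["a b c", "", "hello world"]))
```

For CPU-bound work written in pure Python, threads are limited by the global interpreter lock; `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface and may be a better fit in that case.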
The DSRI offers a number of popular workspaces to work with data:
- Multiple flavors of JupyterLab (scipy, tensorflow, all-spark, and more)
- Visual Studio Code server (also available within the JupyterLab workspaces)
- RStudio, with a complementary Shiny server
- Ubuntu Desktop
You can then install anything you want in your workspace using
The DSRI is a computing infrastructure, built and used to run data science workloads. It stores data persistently, but any data stored on the DSRI can be altered by the workloads you are running, and we cannot guarantee its immutability.
Always keep a safe copy of your data outside the DSRI, and do not rely on the DSRI for long-term storage.
❌ What cannot be done
- Since DSRI can only be accessed when using the UM VPN, deployed services will not be available on the public Internet 🔒
- All activities must have a legal basis. You must closely examine and abide by the terms and conditions of any data, software, or web service that you use as part of your work 📜
- You cannot reach data or servers hosted at Maastricht University from the DSRI by default. You will need to request access in advance here 📬️
- Right now it is not possible to reach the central UM fileservices (MFS)
If you are working at Maastricht University, see this page to request an account, and run your services on the DSRI.
The DSRI architecture
Here is a diagram providing a simplified explanation of how the DSRI works, using popular data science applications as examples (JupyterLab, RStudio, VSCode server).
The DSRI specifications
We use OKD 4.11, the Origin Community Distribution of Kubernetes that powers Red Hat OpenShift. Kubernetes takes care of deploying the Docker containers on the cluster of servers; OKD extends it to improve security and provides a user-friendly web UI to manage your applications.
We use Red Hat Ceph Storage for the distributed storage.
- 16 CPU nodes
| | RAM (GB) | CPU (cores) | Storage (TB) |
|---|---|---|---|
| Node capacity | 512 GB | 64 cores (128 threads) | 120 TB |
| Total capacity | 8 192 GB | 1 024 cores | 1 920 TB |
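The total capacity row follows from multiplying the per-node figures by the 16 CPU nodes. A small Python sketch of that arithmetic, with dictionary keys chosen here purely for illustration:

```python
# Per-node figures and node count taken from the CPU node table.
NODE_COUNT = 16
PER_NODE = {"ram_gb": 512, "cpu_cores": 64, "storage_tb": 120}


def total_capacity(node_count, per_node):
    """Multiply each per-node resource by the number of nodes."""
    return {name: amount * node_count for name, amount in per_node.items()}


totals = total_capacity(NODE_COUNT, PER_NODE)
print(totals)  # {'ram_gb': 8192, 'cpu_cores': 1024, 'storage_tb': 1920}
```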
- 1 GPU node: NVIDIA DGX-1 with 8x Tesla V100 GPUs (32 GB each)
| | GPUs | RAM (GB) | CPU (cores) |
|---|---|---|---|
| GPU node capacity | 8 | 512 GB | 40 cores |
Learn more about DSRI
See the following presentation about the Data Science Research Infrastructure