Online / 5 & 6 February 2022


Using OpenStack to reduce HPC service complexity

... no, that is not an oxymoron!


Why build #4 on the Green500 using OpenStack? Because it makes the system easier to manage. Cambridge University started using OpenStack in 2015, and since mid-2020 all new hardware has been controlled through OpenStack: compute nodes, GPU nodes, Lustre nodes, Ceph nodes, almost everything. OpenStack allows large bare-metal Slurm clusters and dedicated Trusted Research Environments (TREs) to share the same images. Is this a cloud-native supercomputer?

We will explore how OpenStack is used to manage a supercomputer as a shared pool of hardware resources that can be partitioned between the many different platforms required by a diverse group of scientists: TREs, on-demand dedicated AI platforms, dedicated big data platforms, and traditional shared Slurm clusters.
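As a rough sketch of how a shared pool might be partitioned per platform (the cloud, aggregate, and host names here are hypothetical, not the production configuration described in the talk), OpenStack host aggregates can group hypervisors so that flavors target a specific platform:

```python
import openstack

# Connect using credentials from clouds.yaml; the cloud name is hypothetical.
conn = openstack.connect(cloud="hpc-pool")

# Carve a per-platform partition out of the shared pool with a host
# aggregate. Flavors whose extra specs match this metadata will then
# only be scheduled onto hosts in the partition.
aggregate = conn.compute.create_aggregate(name="tre-platform")
conn.compute.set_aggregate_metadata(aggregate, {"platform": "tre"})

# Assign some hypervisors from the shared pool to this partition
# (host names are placeholders).
for host in ("node-001", "node-002"):
    conn.compute.add_host_to_aggregate(aggregate, host)
```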

We will focus on providing a range of services from a single shared hardware pool, allowing the delivery of both on-demand interactive compute platforms for STFC's IRIS e-Infrastructure and Slurm clusters such as Wilkes-3, currently #4 on the Green500: https://www.top500.org/system/179930/

This makes use of OpenStack Ironic for the bare-metal deployments, and of on-demand OpenStack KVM-powered VMs running Cluster API-provisioned Kubernetes, with KubeApps used to deploy JupyterHub.
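As a minimal sketch of the bare-metal path (the flavor, image, and network names are assumptions, not the actual Wilkes-3 configuration), booting a node through Nova's Ironic driver uses the same API and images as booting a KVM VM, which is what lets both kinds of platform share images:

```python
import openstack

conn = openstack.connect(cloud="hpc-pool")  # hypothetical cloud name

# With Ironic behind Nova, a bare-metal flavor deploys a whole physical
# node, but the request looks identical to booting a virtual machine.
server = conn.compute.create_server(
    name="compute-001",                                      # placeholder
    image_id=conn.compute.find_image("slurm-compute").id,    # shared image
    flavor_id=conn.compute.find_flavor("baremetal-a100").id,
    networks=[{"uuid": conn.network.find_network("hpc-net").id}],
)
server = conn.compute.wait_for_server(server)
```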

Speakers

John Garbutt
