Phoenix and Hive Slurm Migration Complete

Friday – March 10, 2023

 

Slurm Migration Complete

Starting in August 2022, PACE migrated two of our research clusters to the Slurm scheduler, an open-source resource manager with integrated scheduling logic. Hive was the first PACE cluster to move to Slurm, from August to September 2022. Phoenix followed, from October 2022 to February 2023.

PACE worked closely with the Hive PIs on the Hive migration plan and with the PACE Advisory Committee (PAC) on the Phoenix migration plan to minimize interruption to research on both clusters. Researchers should be aware of the new software requirements and scheduler workflows under Slurm.

 

Hardware on Slurm

In total, 1865 nodes (483 on Hive and 1382 on Phoenix) have been migrated or added to the new Slurm scheduler. Most of these nodes (1804 across Phoenix and Hive) use a base configuration of dual Intel Xeon Gold 6226 CPUs @ 2.7 GHz (24 cores/node), DDR4-2933 MHz DRAM, and HDR100 InfiniBand interconnect. Nodes with newer hardware configurations have also been added.

New nodes added to the Phoenix cluster with Slurm scheduler from November 2022 to February 2023 include the following: 

  • 40 32-core Intel CPU High-Memory Nodes (added to the existing 68, for 108 cpu-large nodes total)

  • 8 128-core AMD CPU Nodes (8 cpu-amd nodes total) 

  • 12 64-core AMD CPU Nodes with 2 x Nvidia A100 GPUs (12 gpu-a100 nodes total) 

More details on the hardware on each cluster can be found here:

Phoenix Resources

 

Revised Software Stack on Slurm

In moving to the Slurm scheduler, the entire software stack needed to be recompiled to ensure compatibility with Slurm, MPI, and other libraries. Updated versions of the software stack for Hive and Phoenix can be found here. Requests to install new software on Hive and Phoenix can be submitted here. As with our software stack, researchers who have installed or are installing their own software locally will need to compile or recompile their applications to ensure compatibility with Slurm, MPI, and other libraries.

We highly recommend using OnDemand, through the online portals for Hive and Phoenix, to access applications that require a graphical user interface, including Jupyter notebooks and Interactive Desktop (for VNC sessions). The command-line tools pace-jupyter-notebook and pace-vnc-job have been retired.

 

User Guide on Slurm

Finally, user workflows will need to be adapted to use Slurm commands and batch scripts. We recommend reading about the cluster-specific changes due to the migration, including our PBS-to-Slurm script conversion guide and Slurm user guides, in our documentation.
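To give a rough sense of what adapting a batch script involves, here is a minimal Python sketch that rewrites a few common PBS directives as their generic Slurm equivalents. It is not PACE's conversion tool or official mapping. The helper names (convert_line, convert_script) and the sample queue name are hypothetical, and mapping ppn to --ntasks-per-node is an assumption that may not suit every job. Anything the sketch does not recognize, such as queue selection, is flagged for manual review instead of being guessed.

```python
import re

# Generic PBS flag -> Slurm option equivalents (illustrative only; consult the
# cluster's own conversion guide for the authoritative mapping).
SIMPLE_FLAGS = {
    "-N": "--job-name",
    "-A": "--account",
    "-o": "--output",
    "-e": "--error",
}


def convert_line(line: str) -> list[str]:
    """Translate one script line; '#PBS' lines become '#SBATCH' lines."""
    stripped = line.strip()
    if not stripped.startswith("#PBS"):
        # Ordinary shell lines pass through; swap the submit-directory variable.
        return [line.replace("$PBS_O_WORKDIR", "$SLURM_SUBMIT_DIR")]
    parts = stripped.split(None, 2)  # ['#PBS', flag, value]
    if len(parts) == 3:
        _, flag, value = parts
        if flag in SIMPLE_FLAGS:
            return [f"#SBATCH {SIMPLE_FLAGS[flag]}={value}"]
        if flag == "-l":
            m = re.fullmatch(r"walltime=(\S+)", value)
            if m:
                return [f"#SBATCH --time={m.group(1)}"]
            m = re.fullmatch(r"nodes=(\d+):ppn=(\d+)", value)
            if m:
                # Assumption: one task per requested processor.
                return [
                    f"#SBATCH --nodes={m.group(1)}",
                    f"#SBATCH --ntasks-per-node={m.group(2)}",
                ]
    # Unknown or cluster-specific directive: leave a marker for a human.
    return [f"# TODO review manually: {stripped}"]


def convert_script(text: str) -> str:
    """Convert a whole PBS script, line by line."""
    out: list[str] = []
    for line in text.splitlines():
        out.extend(convert_line(line))
    return "\n".join(out)


# Hypothetical PBS script used only to exercise the sketch.
pbs_script = """#!/bin/bash
#PBS -N demo
#PBS -l nodes=2:ppn=24
#PBS -l walltime=01:00:00
#PBS -q hypothetical-queue
cd $PBS_O_WORKDIR
"""
print(convert_script(pbs_script))
```

The day-to-day commands change as well: under Slurm, jobs are submitted with sbatch instead of qsub, listed with squeue instead of qstat, and cancelled with scancel instead of qdel.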

More details on user workflow changes with Slurm on Hive: 

More details on user workflow changes with Slurm on Phoenix: