Maintenance on the Phoenix, Hive, ICE, and Firebird clusters is complete. Some maintenance work is ongoing for the OSG Buzzard cluster, but jobs are running. The physical datacenter work to allow installation of a second cooling pump in the research hall was completed successfully, and we expect to bring the new pump online during our next Maintenance Period, October 6-8, 2025.
The Phoenix, Hive, ICE, and Firebird clusters are back in production and ready for research and instruction; all jobs that were held by the scheduler have been released, Globus and Open OnDemand services have resumed, and access to login nodes is restored.
Potential Issues
- The TensorFlow 2.16 module is incompatible with the updated CUDA drivers on GPU nodes. An updated TensorFlow 2.17 module is targeted for release next week. We have not observed issues with other CUDA-dependent software, such as PyTorch and CUDA C/C++ applications.
- Phoenix users of the Ansys Fluent GUI should use the dedicated Ansys Workbench application in Open OnDemand, which was introduced to increase the stability and usability of Ansys products on Phoenix. Ansys Fluent 2025R1 in Interactive Desktop may produce MPI errors, while 2024R2 works as expected. Hive users must continue using the Interactive Desktop to run Ansys Fluent.
- The OSG Buzzard cluster is expected to resume full functionality midway through next week. In the meantime, jobs scheduled through the OSPool and project-specific pools are being accepted and run.
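If you are unsure whether your environment is affected by the TensorFlow issue above, a quick check along these lines may help. This is a minimal sketch, not an official PACE diagnostic: the function names `is_affected_version` and `check_tensorflow_gpu` are hypothetical, and the check only assumes that TensorFlow, if present, is importable from the Python environment you run it in on a GPU node.

```python
import importlib

# The release series named in the announcement as affected (assumption: any 2.16.x).
AFFECTED_SERIES = (2, 16)


def is_affected_version(version: str) -> bool:
    """Return True if a version string such as '2.16.1' falls in the 2.16 series."""
    parts = version.split(".")
    try:
        return (int(parts[0]), int(parts[1])) == AFFECTED_SERIES
    except (IndexError, ValueError):
        # Unparseable version strings are treated as not affected.
        return False


def check_tensorflow_gpu() -> str:
    """Report the TensorFlow version in use and whether it can see any GPU."""
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        return "TensorFlow is not importable in this environment."

    gpus = tf.config.list_physical_devices("GPU")
    note = " (2.16 series: affected by the driver update)" if is_affected_version(tf.__version__) else ""
    return f"TensorFlow {tf.__version__}{note}; visible GPUs: {len(gpus)}"


if __name__ == "__main__":
    print(check_tensorflow_gpu())
```

A report of zero visible GPUs on a GPU node, together with a 2.16 version, would be consistent with the incompatibility described above.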
Thank you and happy computing!
The PACE Team