Dear PACE researchers,
Our scheduled maintenance has completed ahead of schedule! All PACE clusters, including Phoenix, Hive, Firebird, PACE-ICE, COC-ICE, and Buzzard, are ready for research. As usual, we have released all user jobs that were held by the scheduler. We appreciate everyone’s patience as we worked through these maintenance activities.
Our next maintenance period is tentatively scheduled to begin at 6:00AM on Wednesday, February 9, 2022, and conclude by 11:59PM on Friday, February 11, 2022. We have also tentatively scheduled the remaining maintenance periods for 2022 for May 11-13, August 10-12, and November 2-4.
The following tasks were part of this maintenance period:
ITEMS REQUIRING USER ACTION:
- [Complete] TensorFlow upgrade due to security vulnerability. PACE will retire older versions of TensorFlow, and researchers should shift to using the new module. We also request that you replace any self-installed TensorFlow packages. Additional details are available on our blog.
ITEMS NOT REQUIRING USER ACTION:
- [Complete][Datacenter] Databank cleaned the water cooling tower, which required all PACE compute nodes to be powered off.
- [Complete][System] Operating system patch installs
- [Complete][Storage/Phoenix] Lustre controller firmware and other upgrades
- [Complete][Storage/Phoenix] Lustre scratch upgrade and expansion
- [Postponed][Storage] Hive GPFS storage upgrade
- [Complete][System] System configuration management updates
- [Complete][System] Updates to NVIDIA drivers and libraries
- [Complete][System] Upgrade some PACE infrastructure nodes to RHEL 7.9
- [Complete][System] Reorder group file
- [Complete][Headnode/ICE] Configure cgroup controls on COC-ICE and PACE-ICE headnodes
- [Complete][Scheduler/Hive] Separate Torque & Moab servers to improve scheduler reliability
- [Complete][Network] Update Ethernet switch firmware
- [Complete][Network] Update IP addresses of switches in BCDC
If you have any questions or concerns, please contact us at pace-support@oit.gatech.edu. You may read this message and prior updates related to this maintenance period on our blog.
Best,
-The PACE Team