Our May 2019 maintenance (https://blog.pace.gatech.edu/?p=6473) is complete one day ahead of schedule! We have brought compute nodes online and released previously submitted jobs. Login nodes are accessible and your data are available. We are postponing the replacement of CMOS batteries on the servers due to a scheduling conflict with the vendor. As usual, there are a small number of straggling nodes that we will address over the coming days.
Compute
- (Complete) Upgrade testflight cluster to RHEL 7.6
- (Complete) Upgrade gemini-gpu and gemini-cpu clusters to RHEL7, which will require user action (only for users of the gemini-cpu/gpu clusters)
- (Complete) Switch nodes between chemx and gemini-cpu queues
- (Postponed) Replace CMOS batteries on multiple servers
Network
- (Complete) Replace a faulty InfiniBand switch, which affects a single rack with no impact to the complete fabric
- (Complete) Migrate Rich-to-campus connections to 10 Gbps
Storage
- (Complete) Reboot ICE storage servers to correct issues with the backup application
- (Complete) Perform detailed performance analysis of the GPFS environment in order to fine-tune parameters and improve performance
Other
- (Postponed) Updates to the submit filters in the schedulers
- (Complete) Update salt master and minions
If you have any questions or concerns, please contact pace-support@oit.gatech.edu