PACE Ready for Research

Our May 2019 maintenance (https://blog.pace.gatech.edu/?p=6473) is complete one day ahead of schedule! We have brought compute nodes online and released previously submitted jobs. Login nodes are accessible and your data are available. We are postponing the replacement of CMOS batteries on the servers due to a scheduling conflict with the vendor. As usual, there are a small number of straggling nodes that we will address over the coming days.

Compute

  • (Complete) Upgrade testflight cluster to RHEL 7.6
  • (Complete) Upgrade gemini-gpu and gemini-cpu clusters to RHEL 7, which will require user action (only for users of the gemini-cpu/gpu clusters)
  • (Complete) Switch nodes between chemx and gemini-cpu queues
  • (Postponed) Replace CMOS batteries on multiple servers

Network

  • (Complete) Replace a faulty InfiniBand switch, which affects a single rack with no impact on the rest of the fabric
  • (Complete) Migrate Rich-to-campus connections to 10 Gbps

Storage

  • (Complete) Reboot ICE storage servers to correct issues with the backup application
  • (Complete) Perform detailed performance analysis of the GPFS environment in order to fine-tune parameters and improve performance

Other

  • (Postponed) Updates to the submit filters in the schedulers
  • (Complete) Update salt master and minions

 

If you have any questions or concerns, please contact pace-support@oit.gatech.edu.