Hive scheduler recurring outages

[Update 11/5/21 3:15 PM]

During the November maintenance period, PACE separated Torque and Moab, the two components of the Hive scheduler. This two-server setup, mirroring the Phoenix scheduler arrangement, should improve stability of the Hive scheduler under heavy utilization. We will continue to monitor the Hive scheduler. If you have any questions or concerns, please direct them to pace-support@oit.gatech.edu.

[Update 10/15/21 5:15 PM]

The Hive scheduler is functioning at this time. The PACE team disabled several system utilities that may have contributed to earlier issues with the scheduler. We will continue to monitor the scheduler status and to work with our support vendor to improve stability of Hive’s scheduler. Please check this blog post for updates.

[Update 10/15/21 4:15 PM]

The Hive scheduler is again functional. The PACE team and our vendor are continuing to investigate in order to restore stability to the scheduler.

[Original Post 10/15/21 12:35 PM]

Summary: Hive scheduler recurring outages

What’s happening and what are we doing: The Hive scheduler has been experiencing intermittent outages over the past few weeks, requiring frequent restarts. At this time, the PACE team is running a diagnostic utility and will restart the scheduler shortly. The PACE team is actively investigating the outages in coordination with our scheduler vendor to restore stability to Hive’s scheduler.

How does this impact me: Hive researchers may be unable to submit jobs or check their status, and jobs may be unable to start. You may find that the “qsub” and “qstat” commands, the “showq” command, or both are unresponsive. Already-running jobs will continue.

What we will continue to do: PACE will continue working to restore functionality to the Hive scheduler and coordinating with our support vendor. We will provide updates on our blog, so please check here for current status.

Please accept our sincere apology for any inconvenience that this temporary limitation may cause you. If you have any questions or concerns, please direct them to pace-support@oit.gatech.edu.