OIT’s Scheduled Network Maintenance

[Update – January 5, 2021 11:30am]

Dear PACE users,

The routers that were upgraded late last night had a problem with OSPF that caused routes to go missing and prevented connections to the system. Users who tried to connect to PACE resources late last night may have received errors such as “no route to host” when attempting to ssh to the headnodes. Network Engineering downgraded the firmware to the original version, and connectivity was restored within the scheduled maintenance window. PACE completed testing by 2:19am this morning and confirmed that PACE services are operational.

The Network Engineering team has engaged the vendor to identify the root cause of the issue, since the firmware had been tested on the exact same hardware without any issues before last night's deployment. Once the root cause is identified and resolved, another upgrade will be scheduled and communicated accordingly.

Thank you for your attention to this matter, and if you have any questions, please direct them to pace-support@oit.gatech.edu.

Best,
The PACE Team


[Original Post – January 4, 2021 3:49pm]

Dear PACE users,

OIT’s Network Engineering Team will be conducting maintenance activities from 7:00pm this evening, 01/04/2021, through 2:00am on 01/05/2021. Data center routers and firewalls will receive firmware upgrades. All devices have redundancy, and they will be upgraded one at a time, so no service disruptions are expected. However, connections in and out of PACE (e.g., interactive sessions, file transfers) may be interrupted during this period.

Who is impacted: We do not expect service disruptions at PACE during the maintenance window; however, PACE users may be unable to connect to PACE resources and/or may briefly lose their connection. We encourage users to avoid running interactive jobs (e.g., VNC/X11) that rely on an active SSH connection to a PACE cluster during this time frame, to avoid sudden interruptions caused by a loss of connection to PACE resources. Batch jobs that are running or queued in the PACE schedulers will operate normally; however, any jobs that require resources outside of PACE or on the Internet may be interrupted during this maintenance activity. This maintenance activity will not affect any of the PACE storage systems.

What PACE will do: PACE will remain on standby during this maintenance activity to monitor the systems, conduct testing, and report any interruptions in service.

Thank you for your attention to this matter, and if you have any questions, please direct them to pace-support@oit.gatech.edu.

Best,
The PACE Team