ANSYS version 15 and Matlab R2014a installed

ANSYS version 15 and Matlab version R2014a have been installed on PACE clusters.
To see examples of how to properly load and use the new versions, execute the following commands and follow the instructions provided.

$ module help ansys/15.0

$ module help matlab/r2014a

If you have any problems running the examples shown by “module help”, please contact pace-support@oit.gatech.edu.
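The output of “module help matlab/r2014a” is the authoritative reference. As a rough illustration only, a minimal PBS batch script for MATLAB R2014a might look like the sketch below. The job name, resource requests, and the script name my_analysis.m are placeholders, not PACE defaults. No queue is specified because that depends on your allocation.

```shell
#!/bin/bash
#PBS -N matlab-batch
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
# Hypothetical batch job for MATLAB R2014a; the job name, resources and
# script name (my_analysis.m) are placeholders to adjust.

# Build the -r argument: run the script, print the error report on failure,
# and always exit so the job does not sit at the MATLAB prompt.
matlab_r_expr() {
  printf "try, run('%s'), catch err, disp(getReport(err)), exit(1), end, exit(0)" "$1"
}

# Only launch MATLAB inside a PBS job (Torque/PBS sets PBS_JOBID and PBS_O_WORKDIR).
if [ -n "${PBS_JOBID:-}" ]; then
  cd "$PBS_O_WORKDIR" || exit 1
  module load matlab/r2014a
  matlab -nodisplay -nosplash -r "$(matlab_r_expr my_analysis.m)"
fi
```

When submitted with “qsub”, the job runs the script non-interactively. The try/catch wrapper is a design choice: an error ends the job with a nonzero exit status instead of leaving MATLAB waiting for input.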

Mvapich2 2.0rc1 available in PACE repository

We have installed the most recent Mvapich2 stack (2.0rc1), which is available via the “mvapich2/2.0rc1” module. Please see this changelog if you would like to know more about the improvements this version provides.
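For orientation only, a job script that builds and launches an MPI program with the new stack could look like the following sketch. The source file hello_mpi.c, the resource request, and the use of MVAPICH2's mpirun_rsh launcher are assumptions. Depending on how the module was built, you may also need to load a matching compiler module first.

```shell
#!/bin/bash
#PBS -N mvapich2-test
#PBS -l nodes=2:ppn=4
#PBS -l walltime=00:15:00
# Hypothetical MPI job using the mvapich2/2.0rc1 module; hello_mpi.c is a placeholder.

# Count MPI ranks from a PBS node file (one line per allocated core).
count_ranks() {
  wc -l < "$1" | tr -d ' '
}

# Only build and run inside a PBS job (PBS_NODEFILE is set by Torque/PBS).
if [ -n "${PBS_JOBID:-}" ]; then
  cd "$PBS_O_WORKDIR" || exit 1
  module load mvapich2/2.0rc1
  mpicc -O2 hello_mpi.c -o hello_mpi
  mpirun_rsh -np "$(count_ranks "$PBS_NODEFILE")" -hostfile "$PBS_NODEFILE" ./hello_mpi
fi
```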

Also, please note that we have not started rebuilding any applications with this stack yet. If you think it will provide significant benefits for any existing application, please send us an email to pace-support@oit.gatech.edu and we will be happy to recompile that application for you.

Another quick note: Mvapich2 versions 1.6 to 1.8 are known to have performance problems, which are fixed in newer releases (hint: search for “Georgia Institute of Technology” in the changelog). We are keeping these versions in the repository for backward compatibility, but please avoid using them whenever possible.
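If you are unsure which Mvapich2 version your job environment picks up, a small check like the sketch below can warn you before a run. It reads LOADEDMODULES, the colon-separated list maintained by Environment Modules. It assumes the old stacks are named mvapich2/1.6, mvapich2/1.7 and mvapich2/1.8 (possibly with a suffix). The helper name is hypothetical, and the patterns should be adjusted to the names “module avail” actually shows.

```shell
#!/bin/bash
# Hypothetical helper: return nonzero (with a warning) if an old MVAPICH2
# module, assumed to be named mvapich2/1.6*, 1.7* or 1.8*, is loaded.
check_mvapich2() {
  local IFS=':'   # split LOADEDMODULES on colons
  local mod
  for mod in ${LOADEDMODULES:-}; do
    case "$mod" in
      mvapich2/1.6*|mvapich2/1.7*|mvapich2/1.8*)
        echo "warning: $mod has known performance problems; consider mvapich2/2.0rc1" >&2
        return 1
        ;;
    esac
  done
  return 0
}
```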

Happy computing!

Login Node Storage Server Problems

Last night (2013/06/30), one of the storage servers that is responsible for many of the cluster login nodes encountered some major problems.
These issues prevent all users from logging in to or using the affected login nodes.
The following login nodes are affected:
  • cee
  • chemprot
  • cns
  • cygnus-6
  • force-6
  • force
  • math
  • mokeys
  • optimus
  • testflight-6

We are aware of the problem and are working as quickly as possible to fix it.
Please let us know of any problems you are having that may be related to this.
We will keep you posted about our progress.

Intel Cluster Studio XE 2013 Installed

Installing the Intel Cluster Studio XE 2013 software suite adds several new and useful tools for PACE users.

  • VTune: Intel® VTune™ Amplifier XE 2013 is a serial and parallel performance profiler for C, C++, C#, Fortran, Assembly and Java.
  • Inspector: Intel® Inspector XE is an easy-to-use memory debugger and thread debugger for serial and parallel applications.
  • Advisor: Intel® Advisor XE is a threading prototyping tool for C, C++, C# and Fortran.

This installation includes updated versions of many currently installed packages. The updates include:

  • MKL – updated to 11.0.1
  • TBB – updated to 4.1
  • IPP – updated to 7.1.1
  • Compilers (C, C++, Fortran) – updated to 13.2.146

To use the new or updated software, please load whichever modules are appropriate:

  • intel/13.2.146 (loads the C, C++, and Fortran compilers)
  • vtune/2013xe (loads VTune)
  • advisor/2013xe (loads Advisor)
  • inspector/2013xe (loads Inspector)
  • tbb/4.1 (loads the Threading Building Blocks)
  • ipp/7.1.1 (loads the Integrated Performance Primitives)
  • mkl/11.0.1 (loads the Math Kernel Library)
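As a hedged example of combining these modules, the sketch below loads the compiler and MKL modules listed above and builds a single C file with the Intel compiler's -mkl option. The helper name build_with_mkl and the source file are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: load the Intel 2013 XE compiler and MKL modules, then
# compile one C source file against MKL.
INTEL_2013XE_MODULES="intel/13.2.146 mkl/11.0.1"

build_with_mkl() {
  local src="$1"
  if ! type module >/dev/null 2>&1; then
    echo "module command not found; run this on a PACE node" >&2
    return 1
  fi
  # Left unquoted on purpose so each module name becomes its own argument.
  module load $INTEL_2013XE_MODULES || return 1
  # -mkl links the Math Kernel Library; the output binary drops the .c suffix.
  icc -O2 -mkl "$src" -o "${src%.c}"
}
```

For example, “build_with_mkl myprog.c” would produce an executable named myprog.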

For information on using VTune, Inspector, Advisor, or any of the Intel tools, see the Intel Cluster Studio XE site.

New 128-processor Allinea DDT license on PACE clusters

Allinea DDT is a powerful parallel debugger with an easy-to-use GUI. You can run it by loading its module (module load ddt/3.2) and entering “ddt”. Some introductory information can be found at https://pace.gatech.edu/workshop/DebuggingProfiling.pdf.

We have extended our single-user, 32-processor license to a multi-user, 128-processor license. Besides the increased number of processors, this license allows multiple users to use the software at the same time, as long as the total number of processors does not exceed 128. For example, two users can each run a 64-processor session at the same time.
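The sharing rule above is simply a sum over concurrent sessions. The hypothetical helper below only illustrates that arithmetic. The license server, not this function, enforces the actual limit.

```shell
#!/bin/bash
# Hypothetical helper illustrating the shared license rule: concurrent DDT
# sessions fit as long as their processor counts add up to at most 128.
DDT_LICENSE_LIMIT=128

ddt_sessions_fit() {
  local total=0 n
  for n in "$@"; do
    total=$((total + n))   # add each session's processor count
  done
  [ "$total" -le "$DDT_LICENSE_LIMIT" ]
}
```

With it, “ddt_sessions_fit 64 64” succeeds while “ddt_sessions_fit 64 65” fails.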

Happy debugging!

PACE Systems Back Online

The fileserver has recovered, and all headnodes are now accessible. Jobs running off scratch should continue from where they left off. You have access to all files, including those on scratch. The server is still reconstructing data, which may slow down the system (especially on volumes v0 and v3) for a few more hours. This slowdown will go away once the reconstruction is complete.

We expect to receive a replacement for the failed part tomorrow (6/6). The fileserver can function without this part, and installing it will not cause any interruptions.

Once again, thank you for bearing with us while we were working on this problem. If you have jobs that you think crashed due to this problem, please send us an email at pace-support@oit.gatech.edu.

Login Problems, current situation

The Panasas fileserver (scratch storage) crashed today while recovering from a hardware problem. As a result, the headnodes that mount Panasas are hanging and are currently not accessible via SSH.

We do have a way to disable Panasas and give you immediate access to the headnodes without the Panasas storage. However, doing so would crash all of the jobs using the scratch space, which we want to avoid, especially since some jobs have been running for days.

We are now running a filesystem check on the system, which will take 3 to 4 hours and is required to prevent data corruption. After this process, Panasas should recover and the jobs will continue running. At that point, the headnodes will become accessible again.

If you urgently need to access your data in your home or project directories, please contact us at pace-support@oit.gatech.edu. We might be able to help you access your files via a headnode that does not mount Panasas.

As of 12:25pm EST, the filesystem check had been running for 40 minutes and was 26% complete.

Thank you once again for your understanding and patience, and we apologize for the inconvenience.

Login Problems

With the exception of RHEL-5 Atlas users, it is currently not possible for regular users to log into PACE, due to a problem with the PANFS storage system. We are working to get the problem resolved as quickly as possible.