TSRB Connectivity Restored

Network access to the RHEL-5 Joe cluster compute nodes has been restored.

The problem was caused by a UPS power disruption to a network switch in the building. In addition to recovering the switch and UPS, the backbone team added power redundancy by installing a second PDU for the switch and connecting it to a different UPS.

New Software: VASP 5.3.2

VASP 5.3.2 – Normal, Gamma, and Non-Collinear versions

Version 5.3.2 of VASP has been installed.
The newly installed versions have been checked against our existing tests; the results agree with the expected values to within a small numerical tolerance.
Please check this new version against your known correct results!

Using it

#First, load the required compiler 
$ module load intel/12.1.4
#Load all the necessary support modules
$ module load mvapich2/1.6 mkl/10.3 fftw/3.3
#Load the vasp module
$ module load vasp/5.3.2
#Run vasp
$ mpirun vasp
#Run the gamma-only version of vasp
$ mpirun vasp_gamma
#Run the noncollinear version of vasp
$ mpirun vasp_noncollinear
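
For batch work, the same module loads and run command go into your PBS job script. Here is a minimal sketch; the queue name, node count, and walltime are placeholders you should adapt to your own allocation.

#PBS -q <your queue>
#PBS -l nodes=2:ppn=8
#PBS -l walltime=12:00:00

cd $PBS_O_WORKDIR
module load intel/12.1.4 mvapich2/1.6 mkl/10.3 fftw/3.3 vasp/5.3.2
mpirun vasp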

Compilation Notes

  • Only the Intel compiler generated MPI-enabled vasp binaries that correctly executed the test suite.
  • The “vasp” binary was compiled with these preprocessor flags: -DMPI -DHOST=\"LinuxIFC\" -DIFC -DCACHE_SIZE=12000 -DMINLOOP=1 -DPGF90 -Davoidalloc -DNGZhalf -DMPI_BLOCK=8000
  • The “vasp_gamma” binary was compiled with these preprocessor flags: -DMPI -DHOST=\"LinuxIFC\" -DIFC -DCACHE_SIZE=12000 -DMINLOOP=1 -DPGF90 -Davoidalloc -DNGZhalf -DwNGZhalf -DMPI_BLOCK=8000
  • The “vasp_noncollinear” binary was compiled with these preprocessor flags: -DMPI -DHOST=\"LinuxIFC\" -DIFC -DCACHE_SIZE=12000 -DMINLOOP=1 -DPGF90 -Davoidalloc -DMPI_BLOCK=8000

TSRB Connectivity Problem

All of the RHEL-5 Joe nodes are currently unavailable due to a connectivity problem at TSRB whose cause has not yet been determined. This problem does not affect any joe-6 nodes or nodes from any other group.

Since connectivity between Joe and the rest of PACE is required for home, project, and scratch storage access, all of the jobs currently running on Joe will eventually get stuck in an I/O-wait state, but they should resume once connectivity has been restored.

Cluster Downtime December 19th for Scratch Space Issues

As many of you have noticed, we have experienced disruptions and undesirable performance with our high-speed scratch space. We are continuing to work diligently with Panasas to discover the root cause and repair for these faults.

As we are working toward a final resolution of the product issues, we will need to schedule an additional cluster-wide downtime on the Panasas to implement a potential resolution. We are scheduling a short downtime (2 hours) for Wednesday, December 19th at 2pm ET. During this window, we expect to install a tested release of software.

We understand this is an inconvenience to all our users, but we feel it is important enough to the PACE community to warrant this disruption. If this particular date and duration falls at an especially difficult time for you, please contact us and we will do our best to negotiate a better date or time.

We hope this will provide a permanent solution to these near-daily disruptions.

– Paul Manno

New and Updated Software: BLAST, COMSOL, Mathematica, VASP

All of the software detailed below is available through the “modules” system installed on all PACE-managed Redhat Enterprise 6 computers.
For basic usage instructions on PACE systems see the Using Software Modules page.

NCBI BLAST 2.2.25 – Added multithreading in new GCC 4.6.2 version

The 2.2.25 version of BLAST that was compiled with GCC 4.4.5 has multithreading (i.e. multi-CPU execution) disabled.
A new version of BLAST with multithreading enabled has been compiled with the GCC 4.6.2 compiler.

Using it

#First, load the required compiler 
$ module load gcc/4.6.2
#Now load BLAST
$ module load ncbi_blast/2.2.25
#Setup the environment so that blast can find the database
$ export BLASTDB=/path/to/db
#Run a nucleotide-nucleotide search
$ blastn -query /path/to/query/file -db <db_name> -num_threads <number of CPUS allocated to job>
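
If you run BLAST inside a batch job, make sure the -num_threads value matches the number of processors requested from the scheduler. A minimal PBS sketch follows; the queue name, walltime, paths, and database name are placeholders.

#PBS -q <your queue>
#PBS -l nodes=1:ppn=4
#PBS -l walltime=4:00:00

cd $PBS_O_WORKDIR
module load gcc/4.6.2 ncbi_blast/2.2.25
export BLASTDB=/path/to/db
#Match -num_threads to the ppn requested above
blastn -query /path/to/query/file -db <db_name> -num_threads 4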

COMSOL 4.3a – Student and Research versions

COMSOL Multiphysics version 4.3a contains many new functions and additions to the COMSOL product suite.
See the COMSOL Release Notes for information on new functionality in existing products and an overview of new products.

Using it

#Load the research version of comsol 
$ module load comsol/4.3a-research
$ comsol ...
#Use the matlab livelink
$ module load matlab/r2011b
$ comsol -mlroot ${MATLAB}
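
COMSOL can also be run without the GUI, which is usually what you want in a batch job. A minimal sketch of COMSOL's batch mode is below; model.mph and out.mph are placeholder file names, and you should check the COMSOL documentation for the full set of batch options.

#Load the research version of comsol
$ module load comsol/4.3a-research
#Run a saved model in batch mode (model.mph and out.mph are placeholders)
$ comsol batch -inputfile model.mph -outputfile out.mph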

Mathematica 9.0

Mathematica 9 is a major update to the Mathematica software.

Using it

$ module load mathematica/9.0 
$ mathematica
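
The mathematica command starts the notebook front end, which requires an X display. For batch work the command-line kernel is usually more convenient; a minimal sketch, where myscript.m is a placeholder for your own script:

$ module load mathematica/9.0
#Run a Mathematica script with the command-line kernel
$ math -script myscript.m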

VASP 5.2.12

The pre-calculated kernel for the vdW-DF functional has been installed into the same directory as the vasp binary.
This precalculated kernel is contained in the file “vdw_kernel.bindat”

Using it

#First, load the vasp module (and all the prerequisites) 
$ module load intel/12.1.4 mvapich2/1.6 mkl/10.2 fftw/3.3 vasp/5.2.12
#Copy the kernel to where vasp expects (normally the working directory)
$ cp ${VDW_KERNEL} .
# Run vasp
$ mpirun vasp

Profiling tools available: PAPI and TAU

The Performance API (PAPI) and TAU are two of the most common open source profiling tools, and they are now available for PACE users, including support for hardware counters and threading.

PAPI description, from their website:

The PAPI project specifies a standard application programming interface (API) for accessing hardware performance counters available on most modern microprocessors. PAPI provides two interfaces to the underlying counter hardware; a simple, high level interface for the acquisition of simple measurements and a fully programmable, low level interface directed towards users with more sophisticated needs.
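
PAPI also ships a papi_avail utility that lists the hardware counters a given node supports, which is a quick way to see what you can measure. The sketch below assumes that loading the tau module (which also loads PAPI, as noted in the TAU guide below) puts the PAPI utilities on your PATH.

$ module load tau/2.22-p1
#List the PAPI preset events supported by this node's hardware
$ papi_avail | less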

TAU description, from their website:

TAU Performance System is a portable profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, Java, Python. This tool is capable of gathering performance information through instrumentation of functions, methods, basic blocks, and statements.

 

The TAU tool uses PAPI for event collection and provides two tools for visualization: the text-based tool is called pprof and the graphical tool is called paraprof.

A *very* short guide to using TAU on PACE clusters

* First, you need to recompile your code with TAU wrappers.

  • Load the modules your code needs (compiler, MPI, etc.)
module load gcc/4.4.5 mvapich2/1.6
  • Load the latest tau module (currently tau/2.22-p1, older versions are known to have bugs)
module load tau/2.22-p1

(This will load PDT and PAPI modules too, if you don’t have them loaded already)

  • The TAU module will set the correct TAU Makefile in your environment. Check if you have it right:
$ echo $TAU_MAKEFILE
/usr/local/packages/tau/2.22-p1/mvapich2-1.6/gcc-4.4.5/x86_64/lib/Makefile.tau-papi-mpi-pthread-pdt-openmp

  • Compile your code using one of the compiler wrapper scripts.

E.g., for a f90 code:

tau_f90.sh -L${PAPIDIR}/lib -lpfm loop_test.f90 -o loop_test

Note that the “-L${PAPIDIR}/lib -lpfm” part is necessary on PACE clusters to avoid the system default libpfm, which is not compatible with TAU. If you don’t specify this, you will get this warning:

Error: Reverting to a Regular Make
To suppress this message and revert automatically, please add -optRevert to your TAU_OPTIONS environment variable
Press Enter to continue

* Run the code as usual (not on the head node!); a batch-script sketch follows these steps.

 mpirun -np 4 ./loop_test

You will see profiler files named “profile.A.B.C” in the same folder, which indicates that TAU ran and collected profiling data.

* Finally, run pprof or paraprof from the same directory to see the results!

    • pprof -ea   (sort by exclusive time and show all details)
    • paraprof
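
Here is the batch-script sketch mentioned above, for running the instrumented binary through the scheduler; the queue name, resource request, and walltime are placeholders.

#PBS -q <your queue>
#PBS -l nodes=1:ppn=4
#PBS -l walltime=1:00:00

cd $PBS_O_WORKDIR
module load gcc/4.4.5 mvapich2/1.6 tau/2.22-p1
mpirun -np 4 ./loop_test
#The profile.A.B.C files will appear in the working directory when the job finishes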

Remember, these are very brief instructions. Please refer to PAPI and TAU documentation for more details:

PAPI Reference

TAU User Guide

Enjoy!

New and Updated Software: Java, MUMPS, SCOTCH, ParMETIS, OpenFOAM, trf, CUDA, lagan, MPJ Express, R, Wireshark, Sharktools

We have lots of updated software this time.
I’ve been putting off an update for other reasons, so there is a lot to cover.
Remember that all of this software is available through the “modules” system installed on all PACE-managed Redhat Enterprise 6 computers.
For basic usage instructions on PACE systems see the Using Software Modules page.

Java 7

Here is a brief summary of the enhancements included with the Java 7 release:

  • Improved performance, stability and security.
  • Enhancements in the Java Plug-in for Rich Internet Applications development and deployment.
  • Java programming language enhancements that make it easier for developers to write and optimize Java code.
  • Enhancements in the Java Virtual Machine to support non-Java languages.

There are a large number of enhancements in JDK 7.
See the JDK 7 website for more information.

Using it

$ module avail java 
java/1.7.0

$ module load java/1.7.0
#Checking that you are using the right version
$ which java
/usr/local/packages/java/1.7.0/bin/java
$ which javac
/usr/local/packages/java/1.7.0/bin/javac

Note: The java/1.7.0 module adds “.” to the CLASSPATH environment variable.
If you don’t know what that means, see the Wikipedia page on CLASSPATH.
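
A quick way to confirm the toolchain is working is to compile and run a trivial program; HelloWorld.java below is a placeholder for your own source file.

$ module load java/1.7.0
#Compile the source file, then run the resulting class
$ javac HelloWorld.java
$ java HelloWorld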

Scotch and PT-Scotch 5.1.12

Scotch is a software package and set of libraries for sequential and parallel graph partitioning, static mapping and clustering, sequential mesh and hypergraph partitioning, and sequential and parallel sparse matrix block ordering.

Using it

#First load a compiler - almost any compiler will work: 
$ module load gcc/4.6.2
#Load an MPI distribution - any of them should work:
$ module load openmpi/1.4.3
#Compile an application using the ptscotch library:
$ mpicc mpi_application.c ${LDFLAGS} -lptscotch

ParMETIS 3.2.0 and 4.0.2

ParMETIS is an MPI-based parallel library that implements a variety of algorithms for partitioning unstructured graphs, meshes, and for computing fill-reducing orderings of sparse matrices. ParMETIS extends the functionality provided by METIS and includes routines that are especially suited for parallel AMR computations and large scale numerical simulations. The algorithms implemented in ParMETIS are based on the parallel multilevel k-way graph-partitioning, adaptive repartitioning, and parallel multi-constrained partitioning schemes developed in our lab.

ParMETIS provides the following five major functions:

  • Graph Partitioning
  • Mesh Partitioning
  • Graph Repartitioning
  • Partitioning Refinement
  • Matrix Reordering

Using it

#First load a compiler - almost any compiler will work: 
$ module load intel/12.1.4
#Load an MPI distribution - any of them should work:
$ module load mvapich2/1.6
#Compile an application using the parmetis library:
$ mpicc mpi_application.c ${LDFLAGS} -lparmetis -lmetis

MUMPS 4.10.0

MUMPS is a (MU)ltifrontal (M)assively (P)arallel sparse direct (S)olver.
Main Features:

  • Solution of large linear systems with symmetric positive definite matrices; general symmetric matrices; general unsymmetric matrices;
  • Version for complex arithmetic;
  • Parallel factorization and solve phases (uniprocessor version also available);
  • Iterative refinement and backward error analysis;
  • Various matrix input formats: assembled format; distributed assembled format; elemental format;
  • Partial factorization and Schur complement matrix (centralized or 2D block-cyclic);
  • Interfaces to MUMPS: Fortran, C, Matlab and Scilab;
  • Several orderings interfaced: AMD, AMF, PORD, METIS, PARMETIS, SCOTCH, PT-SCOTCH.

Using it

#First load a compiler - almost any compiler will work: 
$ module load gcc/4.6.2
#Load an MPI distribution - any of them should work:
$ module load openmpi/1.4.3
# Load the rest of the prerequisites (other solvers and libraries)
$ module load mkl/10.3 scotch/5.1.12 parmetis/3.2.0
#Compile your application and link against the correct mumps library:
$ mpicc mpi_application.c ${LDFLAGS} -lcmumps

OpenFOAM 2.1.x

OpenFOAM is a free, open source CFD software package developed by OpenCFD Ltd at ESI Group and distributed by the OpenFOAM Foundation. It has a large user base across most areas of engineering and science, from both commercial and academic organisations. OpenFOAM has an extensive range of features to solve anything from complex fluid flows involving chemical reactions, turbulence and heat transfer, to solid dynamics and electromagnetics.

Using it

#Unload any compiler and MPI modules you may have loaded: 
$ module list
pgi/12.3 openmpi/1.5.4 acml/5.2.0 #pgi/12.3 and openmpi/1.5.4 are just examples.
$ module rm openmpi/1.5.4 pgi/12.3
# Load the openfoam module
$ module load openfoam/2.1.x
ERROR: The directory ~/scratch/OpenFOAM/2.1.x must exist
OpenFOAM module not loading
execute "mkdir -p ~/scratch/OpenFOAM/2.1.x" to create this directory
#Oops - the openfoam module requires that we have a particular directory for openfoam to work with.
$ mkdir -p ~/scratch/OpenFOAM/2.1.x
#Now load the openfoam module again
$ module load openfoam/2.1.x
#Test that openfoam is OK
$ foamInstallationTest
#If this command succeeded, everything is OK.
#Testing openfoam
$ cd ~/scratch/OpenFOAM/2.1.x
$ cp -r ${FOAM_TUTORIALS}/tutorials/basic .
$ cd basic/laplacianFoam/flange/
$ ./Allclean
$ ./Allrun
ansysToFoam: converting mesh flange.ans
Running laplacianFoam on ~/scratch/OpenFOAM/2.1.x/basic/laplacianFoam/flange
Running foamToFieldview9 on ~/scratch/OpenFOAM/2.1.x/basic/laplacianFoam/flange
Running foamToEnsight on ~/scratch/OpenFOAM/2.1.x/basic/laplacianFoam/flange
Running foamToVTK on ~/scratch/OpenFOAM/2.1.x/basic/laplacianFoam/flange
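
To run a solver case in parallel, the usual OpenFOAM workflow is to decompose the case, run the solver under MPI, and then reconstruct the results. The sketch below assumes your case directory already contains a system/decomposeParDict describing the decomposition; laplacianFoam and the process count are just examples.

$ cd ~/scratch/OpenFOAM/2.1.x/<your case>
#Split the mesh and fields across processors as described in system/decomposeParDict
$ decomposePar
#Run the solver in parallel (laplacianFoam is just an example solver)
$ mpirun -np 4 laplacianFoam -parallel
#Merge the per-processor results back together
$ reconstructPar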

trf (Tandem Repeats Finder)

A tandem repeat in DNA is two or more adjacent, approximate copies of a pattern of nucleotides. Tandem Repeats Finder is a program to locate and display tandem repeats in DNA sequences.

Using it

$ module load trf/4.07b 
$ trf
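
Running trf with no arguments prints its usage summary. A typical invocation takes a FASTA file followed by seven scoring parameters; the values below are the defaults suggested in the TRF documentation, and sequence.fa is a placeholder for your own input.

$ module load trf/4.07b
#Arguments: match, mismatch, indel, match probability, indel probability, minimum score, maximum period
#-d writes a data file, -h suppresses the HTML output
$ trf sequence.fa 2 7 7 80 10 50 500 -d -h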

CUDA 5.0.35

CUDA is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).

Using it

$ module load cuda/5.0.35 
#Use nvcc to compile a CUDA application
$ nvcc application.cpp
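
CUDA source files normally use the .cu extension, and the resulting executable needs to run on a GPU-equipped node. A slightly fuller sketch, with application.cu as a placeholder source file:

$ module load cuda/5.0.35
#Compile a CUDA source file into an executable
$ nvcc -O2 -o application application.cu
#Check which GPU is available on the node
$ nvidia-smi
$ ./application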

LAGAN 2.0

The LAGAN toolkit is a set of tools for local, global, and multiple alignment of DNA sequences.

Using it

#Load a compiler module 
$ module load gcc/4.7.2
#Load the lagan module
$ module load lagan/2.0
$ lagan.pl

MPJ Express

MPJ Express is an open source Java message passing library that allows application developers to write and execute parallel applications for multicore processors and compute clusters/clouds.

Using it

#MPJ needs to store log files and cannot do so in the system-install location. 
#We need to create a place for it to put log data.
$ mkdir -p ~/mpj/logs
$ module load mpj/0.38
#Inside a job script:
$ mpjboot machinefile
$ mpjrun.sh ... application.jar
$ mpjhalt machinefile
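
Inside a PBS job, the machinefile can be built from the node list the scheduler assigns to the job. A minimal sketch follows; the resource request and jar name are placeholders.

#PBS -l nodes=2:ppn=4
#PBS -l walltime=1:00:00

cd $PBS_O_WORKDIR
module load java/1.7.0 mpj/0.38
#Use the node list PBS assigned to this job as the MPJ machinefile
cp ${PBS_NODEFILE} machinefile
mpjboot machinefile
mpjrun.sh ... application.jar
mpjhalt machinefile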

R 2.15.2

R is a free software environment for statistical computing and graphics.

Using it

$ module load R/2.15.2 
$ R
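
For non-interactive use, such as inside a batch job, a script can be run with Rscript instead of the interactive R prompt; analysis.R is a placeholder for your own script.

$ module load R/2.15.2
#Run an R script non-interactively and capture its output
$ Rscript analysis.R > analysis.log 2>&1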

Wireshark 1.4.15, 1.6.12, 1.8.4

Wireshark is the world’s foremost network protocol analyzer. It lets you capture and interactively browse the traffic running on a computer network. It is the de facto (and often de jure) standard across many industries and educational institutions.

Using it

$ module load wireshark/1.8.4 
$ wireshark
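
The wireshark GUI requires an X display. The Wireshark suite also includes the command-line tool tshark, which can read an existing capture file; whether tshark is on the PATH provided by the module is an assumption, and capture.pcap is a placeholder. Capturing live traffic normally requires privileges regular users do not have on the clusters.

$ module load wireshark/1.8.4
#Print the first 20 packets of an existing capture file
$ tshark -r capture.pcap -c 20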

Sharktools

Sharktools is a Matlab and Python front end to Wireshark.

Using it

#Load the necessary prerequisites 
$ module load wireshark/1.4.15 matlab/r2011b python/2.7.2
#Load sharktools
$ module load sharktools/0.15
#Start the python interpreter
$ python
>>> import pyshark
...

Massive network outage requires head node restarts!

Earlier today, a campus-wide network outage disrupted communications between the Head Node VMs and their storage. Some of these may appear to be working; however, nothing done on them is being saved properly. We will be restarting these machines shortly, after which everything will return to normal.

This should not cause already scheduled jobs to fail, but any scripts running on the head nodes will surely fail.

We will send an “all-clear” when we have completed the list.

VASP Calculation Errors

UPDATE: The VASP binaries that generate incorrect results have been DELETED.

One of the versions of VASP installed on all RHEL6 clusters can generate incorrect answers.
The DFT energies calculated are correct, but the forces may not be correct.

The affected vasp binaries are located here:
/usr/local/packages/vasp/5.2.12/mvapich2-1.6/intel-12.0.0.084/bin/vasp
/usr/local/packages/vasp/5.2.12/mvapich2-1.7/intel-12.0.0.084/bin/vasp
/usr/local/packages/vasp/5.2.12/openmpi-1.4.3/intel-12.0.0.084/bin/vasp
/usr/local/packages/vasp/5.2.12/openmpi-1.5.4/intel-12.0.0.084/bin/vasp

All affected binaries were compiled with the intel/12.0.0.084 compiler.

Solution:
Use a different vasp binary – versions compiled with the intel/10.1.018 and intel/11.1.059 compilers have been checked for correctness.
Neither of those compilers generates incorrect answers on the test cases that uncovered the error.

Here is an excerpt from a job script that uses a correct vasp binary:

###########################################################

#PBS -q force-6
#PBS -l walltime=8:00:00

cd $PBS_O_WORKDIR

module load intel/11.1.059 mvapich2/1.6 vasp/5.2.12
which vasp
#This “which vasp” command should print this:
#/usr/local/packages/vasp/5.2.12/mvapich2-1.6/intel-11.1.059/bin/vasp
#If it prints anything other than this, the modules loaded are not as expected, and you are not using the correct vasp.

mpirun -rmk pbs vasp
##########################################################

We now have a test case with known correct results that will be checked every time a new vasp binary is installed.
This step will prevent this particular error from occurring again.
Unless there are strenuous objections, this version of vasp will be deleted from the module that loads it (today) and the binaries will be removed from /usr/local/packages/ (in one week).

Thank you, Ambarish, for reporting this issue.

Let us know if you have any questions, concerns, or comments.