It appears that we have an issue with the server housing the /nv/pc5 filesystem, which holds data for a subset of Cygnus cluster users. We’re trying to isolate the source of the problem, but we have not yet found a pattern that explains why the filesystem is available on some nodes and not on others.
Author: ssarajlic3
Joe Cluster Status
Between about 8:00 and 8:30pm on September 28, 2012, a power event took down the TSRB data center, knocking a significant fraction of the Joe cluster offline.
With assistance from Operations, we determined that several of the management switches for these nodes did not recover gracefully from the event. Because these switches control our ability to manage the nodes, we had to wait until they were available again before bringing the nodes back. As of about 4pm on September 29, 2012, we are bringing these nodes online.
Jobs that were running on these nodes (iw-a2-* and iw-a3-*) at the time of the outage may have terminated abnormally. Jobs scheduled but not running should be fine.
UPDATE @ 4:40pm, 2012-09-29: All nodes are online.
New and Updated Software: GCC, Maxima, OpenCV, Boost, ncbi_blast
Software Installation and Updates
We have had several requests for new or updated software since the last post on August 14.
Here are the details about the updates.
All of this software is installed on the RHEL6 clusters (including force-6, uranus-6, ece, math, apurimac, and joe-6).
GCC 4.7.2
The GNU Compiler Collection (GCC) provides compilers for many languages, including C, C++, Fortran, Java, and Go.
This latest version of GCC supports advanced optimizations for the latest compute nodes in PACE.
Here is how to use it:
$ module load gcc/4.7.2
$ gcc <source.c>
$ gfortran <source.f>
$ g++ <source.cpp>
Other versions of GCC already installed on the RHEL6 clusters are gcc/4.4.5, gcc/4.6.2, and gcc/4.7.0.
Maxima 5.28.0
Maxima is a system for the manipulation of symbolic and numerical expressions, including differentiation, integration, Taylor series, Laplace transforms, ordinary differential equations, systems of linear equations, polynomials, and sets, lists, vectors, matrices, and tensors. Maxima yields high-precision numeric results by using exact fractions, arbitrary-precision integers, and variable-precision floating-point numbers. Maxima can plot functions and data in two and three dimensions.
Here is how to use it:
$ module load clisp/2.49.0 maxima/5.28.0
$ maxima
# If you have X forwarding enabled (for example, ssh -X), xmaxima displays a GUI with a tutorial
$ xmaxima
OpenCV 2.4.2
OpenCV (Open Source Computer Vision) is a library of programming functions for real-time computer vision.
OpenCV is released under a BSD license, so it is free for both academic and commercial use. It has C++, C, and Python interfaces, with Java support coming soon, and runs on Windows, Linux, Android, and Mac. The library has more than 2500 optimized algorithms.
This installation of OpenCV was built with support for Python and NumPy, but without support for Intel TBB, Intel IPP, or CUDA.
Here is how to use it:
$ module load gcc/4.4.5 opencv/2.4.2
$ g++ <source.cpp> $(pkg-config --cflags --libs opencv)
Boost 1.51.0
Boost provides free peer-reviewed portable C++ source libraries.
Boost libraries are intended to be widely useful, and usable across a broad spectrum of applications.
Here is how to use it:
$ module load boost/1.51.0
$ g++ <source.cpp>
NCBI BLAST 2.2.27
Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.
Here is how to use it:
$ module load gcc/4.4.5 ncbi_blast/2.2.27
$ blastn
$ blastp
$ blastx
...
Registration open for OpenACC GPU Programming Workshop
Extreme Science and Engineering Discovery Environment
http://xsede.org/
Registration is now open for the OpenACC GPU Programming Workshop, to be held October 16 and 17, 2012. The workshop includes hands-on access to Keeneland, the newest XSEDE resource, which is managed by the Georgia Institute of Technology (Georgia Tech) and the National Institute for Computational Sciences, an XSEDE partner institution.
Because of demand, the workshop will be held at ten sites around the country. To participate, follow the link below and register by clicking your preferred site. Only the first 100 registrants will be accepted.
The workshop is offered by the Pittsburgh Supercomputing Center, the National Institute for Computational Sciences, and Georgia Tech.
Questions? Contact Tom Maiden at tmaiden@psc.edu.
Register and read more about the workshop at:
http://www.psc.edu/index.php/training/openacc-gpu-programming
[XSEDE is supported by the National Science Foundation; https://www.xsede.org, info@xsede.org.]
Free MATLAB Technical Seminars on Tuesday
As a friendly reminder, you are invited to join MathWorks for complimentary MATLAB seminars on Tuesday, September 18, 2012 in Room 144 in Clough Undergraduate Commons.
Register now at http://www.mathworks.com/seminars/GATech2012
Agenda
5:30 – 6:30 p.m.
Session 1: What’s New in MATLAB?
Presented By: Loren Shure, Principal MATLAB Developer (KEYNOTE SPEAKER)
In this session, we will demonstrate workflow examples that highlight new MATLAB features. The latest MATLAB release, R2012b, introduces a redesigned Desktop that makes it easier for both new and experienced users to navigate the continuously expanding capabilities of MATLAB.
Loren has worked at MathWorks for over 25 years. She has co-authored several MathWorks products in addition to adding core functionality to MATLAB. Loren currently works on the design of the MATLAB language. She graduated from MIT with a B.Sc. in physics and has a Ph.D. in marine geophysics from the University of California, San Diego, Scripps Institution of Oceanography. Loren writes about MATLAB on her blog, The Art of MATLAB.
6:30 – 7:00 p.m.
Georgia Tech Alumni Panel
Hear from a selection of Georgia Tech alumni who now work at MathWorks as they discuss their career paths. (Pizza will be served.)
7:00 – 8:30 p.m.
Session 2: Parallel and GPU Computing with MATLAB
Presented By: Jiro Doke, Ph.D., Senior Application Engineer and Georgia Tech alumnus
In this session you will learn how to solve computationally intensive and data-intensive problems using multicore processors, GPUs, and computer clusters. We will introduce you to high-level programming constructs that allow you to parallelize MATLAB applications and run them on multiple processors. We will show you how to overcome the memory limits of your desktop computer by distributing your data on a large-scale computing resource, such as a cluster. We will also demonstrate how to take advantage of GPUs to speed up computations without low-level programming. Highlights include:
· Toolboxes with built-in support for parallel computing
· Creating parallel applications to speed up independent tasks
· Scaling up to computer clusters, grid environments or clouds
· Employing GPUs to speed up your computations
Jiro joined MathWorks in May 2006 as an application engineer. He received his B.S. from Georgia Institute of Technology and Ph.D. from the University of Michigan, both in Mechanical Engineering. His Ph.D. research was in biomechanics of human movement, specifically in human gait. His experience in MATLAB comes from extensive use in graduate school, using the tool for data acquisition, analysis, and visualization. At MathWorks, Jiro focuses on core MATLAB; math, statistics and optimization tools; and parallel computing tools.
Joe file server back online
After working with the network team, we appear to have stabilized the networking for the file server. We apologize for the inconvenience.
Joe file server still having difficulties
The network interfaces on the file server that serves the Joe cluster are having trouble determining which interfaces are up and which are down. This started around 4:30am, and we are working with the network team to determine whether the problem lies in the machine, the cables, or the switches.
Joe Fileserver fixed
The fileserver that houses Joe users’ data (hp3/pj1) started acting squirrelly this morning and could not connect to the PACE LDAP server. Because the fileserver could not authenticate users or jobs, Joe users had problems logging in, and some jobs hung.
Restarting all the services on the fileserver rectified the problem.
FoRCE project server outage (pf2)
At about 4:30pm, one of the network interfaces for the server hosting the /nv/pf2 filesystem was knocked offline, making the resources it hosts unavailable. Normally, losing one interface would not cause a complete failure, but here it exposed a configuration error in the failover components.
By 5:10pm, the misconfiguration had been corrected and the failed interface brought back online, which should have restored all resources provided by this server.
This affected some FoRCE users’ access to project storage. Please check whether any of your jobs failed because of this outage. No data should have been lost, since any transactions in progress would have been held until connectivity was restored.
[Resolved] Unexpected downtime on compute nodes
[update] We think we’re back up at this point. If you see odd behavior, please send a support request directly to the PACE team via email to pace-support@oit.gatech.edu.
The issue seems to have been an inadvertent switching off of a circuit breaker by an electrician, and is not expected to recur.
====================
We’ve had a power problem in the data center this afternoon that cut power to three of our racks. This has affected some or all of the following clusters:
Apurimac
Prometheus
Cygnus
Granulous
ECE
Monkeys
Isabella
CEE
Aryabhata
Optimus
Atlas
BioCluster
We’re looking into the cause of the problem, and have already started bringing up compute nodes.