Posts

PACE Sponsors High Performance Computing Townhall!


What could you do with over 25,000 computer cores? Join faculty and students at the April 30 High Performance Computing Town Hall to find out. The event will be held in the MaRC auditorium and is sponsored by PACE, Georgia Tech’s Advanced Computing Environment program.

When: April 30, 3-5pm
Where: MaRC Auditorium

Overview

PACE provides researchers with a robust computing platform that enables faculty and students to carry out research initiatives without the burden of maintaining infrastructure, software, and dedicated technicians. The program’s services are managed by OIT’s Academic & Research Technologies department and include physical hosting, system management infrastructure, high-speed scratch storage, home directory space, commodity networking, and common HPC software such as RedHat Enterprise Linux, VASP, LAMMPS, BLAST, Matlab, Mathematica, and Ansys Fluent. Various compilers, math libraries, and other middleware are available for those who author their own codes. All of these resources are designed and offered with the specific intention of combining intellect with efficiency, in order to advance the research presence here at Tech to the peak of its abilities.

There are many ways to participate in PACE. With a common infrastructure, we support clusters dedicated to individual PIs or research groups, clusters that are shared amongst participants, and our FoRCE Research Computing Environment (aka “The FoRCE”). The FoRCE is available to all campus users via a merit-based proposal mechanism.

The April 30 HPC Town Hall is open to members of the Tech research community and will feature presentations on the successes and challenges that PACE is currently experiencing, followed by a panel discussion and Q&A.

For more information on the PACE program, visit the official website at www.pace.gatech.edu, and also the program’s blog at blog.pace.gatech.edu.

Agenda (To Be Finalized Soon)

  • Message from Georgia Tech’s CTO Ron Hutchins
  • Message from PACE’s director Neil Bright
  • Lightning Talks By Faculty
  • Discussion around technologies and capabilities currently under investigation by PACE
  • Panel Discussion regarding future directions for PACE
  • Question and Answer Session

Account related problems on 03/14/2013

We experienced some account management difficulties today (03/14/2013), mostly caused by exceeding the capacity of our database. We found the cause and fixed all of the issues. 

This problem might have affected you in two ways: temporary login problems on the headnodes, and failure of some recently allocated jobs on compute nodes. As far as we know, no running jobs were affected.

We apologize for any inconvenience this might have caused. If you have experienced any problems, please send us a note (pace-support@oit.gatech.edu).

PACE Debugging and Profiling Workshop on 03/21/2013

Dear PACE community,

We are happy to announce the first Debugging and Profiling Workshop, which will take place on 03/21/2013, 1pm-5pm, in the Old Rich Building Conference Room (ITDC 242).

If your code is crashing, hanging, producing inaccurate results, or running unbearably slowly, you *do* want to be there. We will go over text and GUI based tools that are available on the PACE clusters, including gdb, valgrind, DDT, gprof, PAPI and TAU. There will be hands-on examples, so bring your laptop if you can, although it is not mandatory.

If you bring a laptop to follow the hands-on examples, please make sure that you have:

  • An active PACE account with access to one of the RHEL6 queues
  • Access to “GTwifi”
  • A terminal client to log in (PuTTY for Windows, Terminal for Mac)
  • A text editor that you are comfortable with (Vim, Emacs, nano, …)

Don’t worry if your laptop is not configured to access the PACE clusters. I will be in the conference room half an hour early to help you prepare for the session. Just show up a bit early with your laptop, and we will take care of the rest together 🙂

Please RSVP (to mehmet.belgin@oit.gatech.edu) by 03/19/2013 and include your GT username. Your RSVP will guarantee a seat and printed copies of the course material. You will also be able to fetch an electronic copy (including all the slides and code) at any time by running a simple command on the cluster (we will do that during the class).

Here’s the full schedule:

  • 12:30pm -> 1:00pm : (Optional) Help session to make sure your laptop is ready for the workshop
  •  1:00pm -> 2:45pm : Debugging session (gdb, valgrind, DDT)
  •  2:45pm -> 3:15pm : Break
  •  3:15pm -> 5:00pm : Profiling session (gprof, PAPI, TAU)

The location is the Old Rich Building, ATDC conference room, #242. Google knows us as “258 4th Street”. We are right across from the Clough Commons Building.

We look forward to seeing you there!

Breaking news from NSF

It looks like Dr. Subra Suresh will be stepping down from his position as Director of NSF, effective late March, to become the next President of Carnegie Mellon.

A copy of his letter to the NSF community is available for download: Staff Letter 2-4-13.

Interesting times are ahead for both NSF and DOE.

New and Updated Software: Portland Group Compiler and ANSYS

Two new sets of software have been installed on PACE-managed systems – PGI 12.10 and ANSYS 14.5 service pack 1.

PGI 12.10

The Portland Group, Inc. (a.k.a. PGI) makes compilers and tools for parallel computing. It offers optimizing parallel Fortran 2003, C99 and C++ compilers and tools for workstations, servers and clusters running Linux, MacOS or Windows operating systems.

This version of the compiler supports the OpenACC GPU programming directives.
More information can be found at The Portland Group website.
Information about using this compiler with the OpenACC directives can be found at PGI Insider and OpenACC.

Usage Example

$ module load pgi/12.10
$ pgfortran example.f90
$ ./a.out
Hello World

ANSYS 14.5 Service Pack 1

ANSYS develops, markets and supports engineering simulation software used to foresee how product designs will behave and how manufacturing processes will operate in real-world environments.

Usage Example

$ module load ansys/14.5
$ ansys145

Panasas problems, impacting all PACE clusters

The Panasas storage server started responding slowly approximately an hour ago. We are using this server to host all of the software stack, and also for the “scratch” directory in your home folders. 

No jobs have been killed, but you will notice significant performance degradation. Starting new jobs and commands will also be slow, although they should run.

We are actively working with the vendor to resolve these issues and will keep you updated via this blog and the “pace-availability” email list.

Thank you for your patience.

PACE Team

Collapsing nvidiagpu and nvidia-gpu queues

PACE has several nodes with NVIDIA GPUs installed.
There are currently two queues (nvidiagpu and nvidia-gpu) that have GPU nodes assigned to them.
It is confusing to have two queues with the same purpose and slightly different names, so PACE will be collapsing both queues into the “nvidia-gpu” queue.
That means that the nvidiagpu queue will disappear, and the nvidia-gpu queue will have all of the resources contained by both queues.

Please send any questions or concerns to pace-support@oit.gatech.edu
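
If your job scripts still name the old queue, a one-time edit should be enough. The sketch below assumes that jobs are submitted with PBS-style "#PBS -q <queue>" directive lines, which this post does not state; the function name rename_gpu_queue and the file names are hypothetical, so adapt them to how you actually submit jobs.

```shell
#!/bin/bash
# Hypothetical helper: rewrite the retired queue name in job scripts.
# ASSUMPTION: scripts select a queue with a PBS-style "#PBS -q <queue>" line.
rename_gpu_queue() {
    local script
    for script in "$@"; do
        # Match only the exact old name at the end of the -q directive,
        # so scripts that already say "nvidia-gpu" are left unchanged.
        # "-i.bak" edits in place and keeps the original as <script>.bak.
        sed -i.bak -E \
            's/^(#PBS[[:space:]]+-q[[:space:]]+)nvidiagpu[[:space:]]*$/\1nvidia-gpu/' \
            "$script"
    done
}
```

For example, "rename_gpu_queue myjob.pbs" would update myjob.pbs and leave a backup in myjob.pbs.bak. Jobs that pick the queue on the command line instead of in the script need the new name passed there.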

January 2013 quarterly maintenance is complete

Greetings!

We have completed our quarterly maintenance activities.  Head nodes are online again and available for use, queued up jobs have been released, and the scheduler is awaiting new submissions.

Our RedHat 6 clusters have received system software updates.  Please keep an eye on your jobs to verify everything is operating correctly.

Our Panasas scratch storage has received another round of updates.  Preliminary testing indicates that we should have a resolution to our crashes, but the quota system is known to be broken.  As advised by Panasas, we have disabled quotas on scratch.  Please do your best to stay below the 20TB threshold.  We will be monitoring usage and know where you live.  🙂
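
With quotas disabled, you may want to check your own scratch usage from time to time. Here is a minimal sketch, assuming your scratch space is reachable as a directory (for example the "scratch" directory in your home folder) and treating 1 TB as 1024^4 bytes; the function name check_scratch_usage is hypothetical and not a PACE-provided tool.

```shell
#!/bin/bash
# Hypothetical helper: report whether a directory's usage is below a limit.
# Arguments: directory, limit in TB (defaults to 20, the threshold named above).
# ASSUMPTION: 1 TB is counted as 1024^4 bytes.
check_scratch_usage() {
    local dir=$1 limit_tb=${2:-20}
    local used_kb limit_kb
    # "du -sk" prints the total usage in 1 KB (1024-byte) units; keep the number.
    used_kb=$(du -sk "$dir" | awk '{print $1}')
    limit_kb=$((limit_tb * 1024 * 1024 * 1024))
    if [ "$used_kb" -lt "$limit_kb" ]; then
        echo "OK: ${used_kb} KB used, limit ${limit_kb} KB"
    else
        echo "OVER: ${used_kb} KB used, limit ${limit_kb} KB"
        return 1
    fi
}
```

A typical call would be "check_scratch_usage ~/scratch". Note that du walks every file, so it can take a while on large directories; please do not run it in a tight loop.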

We have a new license server providing checkouts of the Portland Group and Intel compilers, Matlab DCS, the Allinea DDT debugger and Lumerical.  Please let us know if you have problems accessing this software.  The old server is still running and we will be monitoring it for a short while for extraneous activity.

More nodes from Joe and the FoRCE have been converted from RHEL5 to RHEL6.  If you are still using the RHEL5 side of the world, please prioritize a transition to RHEL6.  We stand ready to assist you with this transition.

Finally, our new configuration system has been deployed in prototype mode.  We will use this to gather operational information and other data that will facilitate a full transition to this system on a future maintenance day.

As usual, please let us know (via email to pace-support@oit.gatech.edu) if you encounter any issues.

Happy Computing!

–Neil Bright

Symposium: Integrating Computational Science into your Undergraduate Curriculum

Clemson University (Clemson, SC) is hosting a symposium on February 11, 12, and 13.
The topic is “Integrating Computational Science into your Undergraduate Curriculum”.
The workshop, symposium and training are open at no charge to all interested faculty and students who register to attend.
Financial assistance for primarily undergraduate faculty is available to cover travel costs.

See the Symposium website for the agenda and registration information.

Datacenter modifications

Tomorrow morning (January 9) at 8:30am, facilities management will be performing some work on the power distribution systems in the Rich datacenter.  None of this work is being performed on anything that powers PACE systems; there should be zero impact on any job or computer that PACE manages.  However, because we share space in the datacenter, PACE systems may be affected in the event of a major problem.

Once again, there should be zero impact on PACE systems; no jobs or computers should be affected.

Please let us know (via email to pace-support@oit.gatech.edu) if you have any questions or concerns.