Posts

PACE clusters ready for research

Our quarterly maintenance is now complete, and the clusters are running previously submitted jobs and awaiting new submissions.

We have successfully completed a number of things:

  • Athena has been fully migrated to RedHat 6.3
  • The BioCluster /nv/pb4 filesystem has been migrated to the DDN space
  • All our Solaris storage servers have been patched
  • Firewall upgrades are complete
  • Electrical distribution repairs are complete
  • DDN updates are complete
  • VMware updates are complete
  • The mathlocal collection of software has been migrated to /nv/pma1

However, we were unable to complete the upgrade of the TestFlight cluster to RedHat 6.5.  At the moment TestFlight is down, and we will complete the upgrade over the next couple of days.

As always, please contact us (pace-support@oit.gatech.edu) with any problems or concerns you may have. Your feedback is very important to us, especially regarding file transfers into and out of the clusters (i.e., between your workstations and the PACE clusters).

PACE quarterly maintenance – April 15-16 2014

PACE Quarterly maintenance has begun

See this space for updates.

PACE Quarterly maintenance notification

It’s time again for our quarterly maintenance.  We will have the clusters down April 15 & 16.

As usual, we’ve instructed the schedulers to avoid running jobs that would cross into a planned maintenance window.  This will prevent running jobs from being killed, but it also means that jobs you submit now may not run until after maintenance completes.  I would suggest checking the wall times for the jobs you will be submitting and, if possible, modifying them so the jobs will complete before the maintenance begins. Submitting jobs with longer wall times is still OK, but they will be held by the scheduler and released after maintenance completes.
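The wall time check suggested above can be sketched in a few lines of Python. This is only an illustration, not a PACE tool: the function names are made up for this example, the HH:MM:SS wall time format is an assumption, and the maintenance start time of midnight on April 15 is a guess, since only the dates are announced. It also assumes the job starts running the moment it is submitted, which a busy queue will not guarantee.

```python
from datetime import datetime, timedelta

# Assumption: maintenance begins at 00:00 on April 15, 2014. Only the dates
# are announced, not a start time, so adjust this value as needed.
MAINTENANCE_START = datetime(2014, 4, 15, 0, 0)


def parse_walltime(walltime: str) -> timedelta:
    """Convert an HH:MM:SS wall time string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def finishes_before_maintenance(start_time: datetime, walltime: str) -> bool:
    """Return True if a job that starts at start_time and uses its full
    requested wall time would end no later than the maintenance start."""
    return start_time + parse_walltime(walltime) <= MAINTENANCE_START


def max_walltime(start_time: datetime) -> timedelta:
    """Longest wall time that still fits before maintenance (zero if none)."""
    return max(MAINTENANCE_START - start_time, timedelta(0))


if __name__ == "__main__":
    start = datetime(2014, 4, 13, 12, 0)  # hypothetical job start time
    for requested in ("24:00:00", "48:00:00"):
        fits = finishes_before_maintenance(start, requested)
        print(requested, "fits before maintenance" if fits else "would be held")
    print("Longest wall time that fits:", max_walltime(start))
```

A job whose requested wall time exceeds the value returned by max_walltime would, per the notice above, be held by the scheduler and released after maintenance.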

Most of our activities this time around are not directly visible, with a couple of notable exceptions.

We will be upgrading the operating system on our TestFlight cluster from RedHat 6.3 to RedHat 6.5.  Please do test your codes on this cluster over the coming weeks and months, as we plan to roll it out (along with any needed fixes) to all other RedHat 6 clusters in July.  This update is expected to bring some performance improvements, as well as some critical security fixes.  Additionally, it adds support for the Intel Ivy Bridge platform, which many of you are ordering.  Any new Ivy Bridge platforms will start with RedHat 6.5.

Other user-visible changes include:

  • concluding the migration of the Athena cluster to RedHat 6.3.  We plan to take Athena to 6.5 in July.
  • concluding the migration of the BioCluster /nv/pb4 filesystem to the DDN/GPFS space.
  • migrating mathlocal from /nv/hp24 to /nv/pma1 (Math cluster project space)
  • applying recommended and security patches to our Solaris storage systems.  This is a widespread update that will affect filesystems that start with /nv.  A rapid reversion process is available should unanticipated events occur.
  • firewall upgrades to increase bandwidth between PACE and campus

Not so apparent changes include:

  • repairing some electrical distribution to compute node racks
  • minor software/firmware update to DDN to enable support of DDN/WOS evaluation
  • updates to VMware “hardware” levels, enabled by previous migration to VMware 5.1

As always, please follow our blog for communications, especially for announcements during our maintenance activities – and let us know of any concerns via pace-support@oit.gatech.edu.

[RESOLVED] PACE clusters experiencing problems

We’ve identified the source of problems which impacted all of the clusters this (4/7) afternoon.  While making preparations to deploy some firewall upgrades for PACE, one of the campus network team members inadvertently applied a misconfiguration to one of our core network links.  This resulted in widespread packet loss across the PACE internal network.

The head nodes seem to have recovered properly, but please let us know if you see continued issues there.  While it is possible that jobs have been lost, we believe that most things will have recovered without loss.

We’ll continue to monitor the situation and address any remaining problems as soon as we are able.

PACE Team

 

images requested for annual CASC brochure

The time has come again to gather images for the annual CASC brochure. CASC is the Coalition for Academic Scientific Computation, and GT is a member institution. We use the brochure in our advocacy efforts at the funding agencies and in D.C. Previous brochures are online at http://casc.org/research-publications.

If you have something you would be interested in sharing, please let me know. Below is some text from CASC regarding what they are looking for.

This year marks the 25th anniversary of CASC and we want to recognize that milestone in the new brochure. If you have historical pictures, scientific visualizations and/or stories that can help us illustrate how CASC and HPC have evolved over the years, please start gathering those now. We will set up a website soon where you can upload your images and text. We hope to have everything we need by June 1, 2014.

As always, we are looking for high-quality images and stories that illustrate the impact of HPC and related technologies. The more we have the better, but we are especially interested in images and stories about research and accomplishments in Energy, Health and Medicine, Industrial Innovation, Environment and Natural Resources, Matter and the Universe, Education and Outreach, and Big Data. More information about how to upload your images and text will be sent shortly. The deadline will be earlier this year: June 15, 2014.

Xeon Phi SW Developer Training – Atlanta – April 3rd

This one-day training, held in Atlanta on Thursday, April 3rd, will provide software developers the foundation needed for modernizing their code to take advantage of parallel architectures found in both the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor.

The session will cover:
– An overview of parallel programming frameworks and optimization guidelines for multi-core CPUs (Intel® Xeon®) and many-core coprocessors (Intel® Xeon Phi™)
– Discussions about three layers of parallelism: SIMD, Threads, Cluster environment
– Tips for quick porting/development of HPC software applications
– Real-life examples of code and optimization techniques
– Hardware solution and corresponding software implementations, APIs, and framework

Click here to register for the Atlanta event

CMG 2014 Coming to Atlanta, Calling for papers, presentations, and workshops!

CALL FOR PAPERS AND PRESENTATIONS

(see the end of the post regarding the call for Workshops)

The Computer Measurement Group (CMG) calls for papers and presentations for the 40th International Conference to be held November 3-6, 2014.  The 2014 CMG conference will cover all areas of systems management, including but not limited to: capacity planning, IT service management, application performance management, performance engineering and testing, as well as the latest developments in the overall field of computer performance evaluation.

CMG has been a source of unbiased and objective expert information and practical, real-life experiences across all computing platforms in the computer industry for over 35 years.  Share your knowledge and experiences: write a paper and submit it for presentation at Performance and Capacity 2014 by CMG.

Submissions at all levels are welcome and encouraged.  We especially encourage papers on User Experiences.  All paper and presentation submissions will be evaluated through a blind peer-referee process, and will be categorized as Introductory, Tutorial, Advanced, or User Experience.  Mentors are available for writing assistance, and should be requested early in the writing process. Editorial assistance is provided for all accepted papers.

Primary subject areas for papers, for any and all platforms, are as follows:

Subject areas and suggested topics (but not limited to):

  • IT Service Management: ITIL processes and IT Service Management, asset management, BPM, BAM, corporate governance (HIPAA, HEPA, SOX), CPM, dashboards, KPIs, customer SLAs, infrastructure performance management, SaaS, Cloud Computing, etc.
  • Capacity Planning: Capacity management issues, trending and prediction, statistics, forecasting, simulation, analytic & hybrid modeling, server consolidation, wireless capacity, etc.
  • Application Performance Management: Data collection and reduction, monitoring, performance databases, ad hoc reporting techniques, web performance, workload optimized systems, performance visualization. Tuning of performance parameters and programs for storage, operating system, networks, software products, applications, cloud, etc.
  • Performance Engineering & Testing: Software Performance Engineering (SPE), simulation, benchmarking, creating appropriate test loads, conducting stress tests, and evaluating the results.
  • Industry & Professional Trends: Cybersecurity, DevOps, Big Data, Mobile Computing, Emerging Technologies, Professional Development, etc.

Important Paper Submission Dates

  • Request for Mentor Due: May 23, 2014
  • Paper Abstract Due: June 16, 2014
  • Paper (Draft) Due: June 16, 2014
  • Referee Review Period: June 17 – July 14, 2014
  • Author Acceptance Notification: Mid – Late July, 2014
  • Editorial Review Period: August/September, 2014
  • Final Camera Ready Copy Due: September 15, 2014
  • Presentation Slides Due: October 6, 2014

Please note:  The Paper Abstract Due date is the date to submit an abstract or a brief summary of your paper.  There is no acceptance process for abstracts and no acknowledgement will be sent; your abstract simply indicates your intent to submit a paper.  The Paper Due date is the deadline for submission to the referee process.  (Note that papers submitted as slides must include detailed speaker’s notes.)  Once a paper is refereed, accepted, and edited, the final version is due on the Final Camera Ready Copy Due date.  This is also the date when the paper must have all updates, corrections or modifications approved and ready for publication in the CMG Proceedings.

CMG will also consider papers on topics and technologies that become available later in the year.  Such papers will be considered for acceptance on a case-by-case basis by the Program Committee.

CMG will continue to use the paper submission system called EDAS (www.edas.info).  Instructions for using EDAS can be found at the following URL under “Author and Speaker Information”: http://www.cmg.org/conference/cmg2014

 

Please direct all inquiries to:

Kathy Steffens, CMG 2014 Program Chair – cmgpc@cmg.org

 

CALL FOR WORKSHOPS

IT professionals responsible for today’s computing environments know the only constant is change.  Today’s competitive advantage will quickly become tomorrow’s legacy liability.  To apply the latest techniques and improve your skills in performance optimization, software performance engineering, load testing, benchmarking, resource management, capacity analysis, simulation and modeling, and cost management, access to knowledge from industry leaders is no longer optional; it is a requirement!

CMG has been a primary source of expert computer performance knowledge for over 35 years, capturing the collective wisdom of the very best in the industry.  CMG calls for Workshops to be presented on the first day of the 40th Annual International Conference, being held November 3-6, 2014.  We ask for Workshops that focus not only on performance and capacity issues, but also on any technologies, disciplines, techniques, and approaches related to Systems Management or Information Systems.

Submissions in the areas of big data, cloud computing, performance visualization, mobile computing, and web performance are especially encouraged.  We are looking for “How To” workshops that explain how to apply popular tools and products to important performance & capacity tasks.  “How To” workshops are meant to educate performance professionals in the application of commercially available technology.

Submissions at all levels are requested: introductory, technical, managerial and leading edge.  We require PowerPoint slides with speaker’s notes prior to the conference, in time for publication in the workshop book.  Workshops are traditionally 3 hours in length, plus a 15-minute break.

 

Sample Subject Areas: z/OS, LINUX/Unix/Windows, Cloud Computing, Big Data, Virtualization, Storage, Managing Big Data, Network Performance, Web Performance, Database Performance, Wireless Capacity, Future Technologies, Executive Management, Load & Stress Testing, Software Performance Engineering, Simulation and Analytic Modeling, Applied Statistics & Forecasting, ITIL, ICCP, Performance Visualization, SaaS.

 

Critical Dates for Workshop Submission

  • Title & Abstract Due: June 8, 2014
  • Author Acceptance Notification: Late July, 2014
  • PowerPoint Copy Due: September 1, 2014
  • Camera Ready Final Copy Due: October 5, 2014

Please note:  The Title & Abstract Due date is the date to submit a brief idea or outline of your proposed workshop.  The PowerPoint Copy Due date implies that the workshop is nearly complete, and that the presentation is ready for review by a CMG-assigned editor.  The Camera Ready Final Copy Due date occurs after the presentation has been accepted, reviewed by an editor, re-submitted with any updates, corrections or modifications, and is ready for publication in the CMG Workshop Drive.

Please direct all inquiries to:

Kathy Steffens, CMG 2014 Program Chair – cmgpc@cmg.org

 

XSEDE-14 in Atlanta!

As you know, XSEDE is an NSF-funded digital services project that helps researchers and scientists do their work more effectively and efficiently. XSEDE provides free access to some of the nation’s fastest supercomputers and most knowledgeable help staff.

Last summer’s XSEDE’13 drew more than 700 attendees from all 50 states and 14 countries to San Diego. We expect more in Atlanta, especially with so many great area universities, colleges and institutions. With our theme “Engaging Communities,” we encourage participation from both traditional users of digital resources (researchers, students, post-docs in traditional sciences) and those who haven’t historically used these resources (humanities, economics, art). Students are strongly encouraged to participate and can find various ways to engage with the conference.

The Call for Participation is out and the first deadline is fast approaching: papers and abstracts are due March 15. Visit here for more deadlines: https://conferences.xsede.org/xsede14/call-for-participation#Key%20dates

PC1 (Cygnus) filesystem woes

We’ve continued to have issues with the server, and we’ve now identified a networking issue tied to this server as well as a corrupted OS image.

The networking issue has been rectified, and I am installing a new software image onto this machine as I type this.

Despite the nature of the failure, we have not lost any of your already saved data — the drive units which house the OS are separate from the ones storing your data.

We should have this machine back in about a half hour.

Call For Papers – XSEDE14

Greetings all,

XSEDE14 is coming up soon, and has issued their call for participation.  Please note that this conference is being held in Atlanta!

Selected papers from all tracks will be invited to extend their manuscripts to be considered for publication in a special issue of the journal Concurrency and Computation: Practice and Experience.  Papers accepted for the “Education, Outreach, and Training” track will be invited to extend their manuscripts for publication in the Journal of Computational Science Education.

Abstracts are due March 15.  Please see https://www.xsede.org/xsede14 for further information.

Jobs Accidentally Killed

Dear users,

At least 1,600 queued and running jobs were accidentally killed last week by a member of the PACE team who was trying to clear out their own jobs. PACE team accounts have elevated rights to certain commands, and the person who deleted the jobs did not realize that the command they were using would apply to more than just their own jobs.

If you have access to the iw-shared-6 queue, and were running jobs and/or had jobs queued earlier this week, this accident has likely impacted you.

Our deepest apologies for the unexpected and early job terminations. We are re-evaluating our need to grant elevated permissions to our regular accounts, in order to prevent this from happening again.

Thank you,
PACE team