Xeon Phi SW Developer Training – Atlanta – April 3rd

This one-day training, held in Atlanta on Thursday, April 3rd, will give software developers the foundation needed to modernize their code and take advantage of the parallel architectures found in both the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor.

The session will cover:
– An overview of parallel programming frameworks and optimization guidelines for multi-core CPUs (Intel® Xeon®) and many-core coprocessors (Intel® Xeon Phi™)
– Discussions about three layers of parallelism: SIMD, Threads, Cluster environment
– Tips for quick porting/development of HPC software applications
– Real-life examples of code and optimization techniques
– Hardware solutions and corresponding software implementations, APIs, and frameworks

Click here to register for the Atlanta event

CMG 2014 Coming to Atlanta, Calling for papers, presentations, and workshops!

CALL FOR PAPERS AND PRESENTATIONS

(see the end of the post regarding the call for Workshops)

The Computer Measurement Group (CMG) calls for papers and presentations for the 40th International Conference to be held November 3-6, 2014.  The 2014 CMG conference will cover all areas of systems management, including but not limited to: capacity planning, IT service management, application performance management, performance engineering and testing, as well as the latest developments in the overall field of computer performance evaluation.

CMG has been the source of unbiased and objective expert information and practical, real-life experiences across all computing platforms in the computer industry for over 35 years.  Share your knowledge and experiences: write a paper and submit it for presentation at Performance and Capacity 2014 by CMG.

Submissions at all levels are welcome and encouraged.  We especially encourage papers on User Experiences.  All paper and presentation submissions will be evaluated through a blind peer-referee process, and will be categorized as Introductory, Tutorial, Advanced, or User Experience.  Mentors are available for writing assistance, and should be requested early in the writing process. Editorial assistance is provided for all accepted papers.

Primary subject areas for papers, for any and all platforms, are as follows:

Subject Areas: Suggested topics (but not limited to)

– IT Service Management: ITIL processes and IT Service Management, asset management, BPM, BAM, corporate governance (HIPAA, HEPA, SOX), CPM, dashboards, KPIs, customer SLAs, infrastructure performance management, SaaS, Cloud Computing, etc.
– Capacity Planning: Capacity management issues, trending and prediction, statistics, forecasting, simulation, analytic & hybrid modeling, server consolidation, wireless capacity, etc.
– Application Performance Management: Data collection and reduction, monitoring, performance databases, ad hoc reporting techniques, web performance, workload optimized systems, performance visualization. Tuning of performance parameters and programs for storage, operating system, networks, software products, applications, cloud, etc.
– Performance Engineering & Testing: Software Performance Engineering (SPE), simulation, benchmarking, creating appropriate test loads, conducting stress tests, and evaluating the results.
– Industry & Professional Trends: Cybersecurity, DevOps, Big Data, Mobile Computing, Emerging Technologies, Professional Development, etc.

 

Important Paper Submission Dates

– Request for Mentor Due: May 23, 2014
– Paper Abstract Due: June 16, 2014
– Paper (Draft) Due: June 16, 2014
– Referee Review Period: June 17 – July 14, 2014
– Author Acceptance Notification: Mid – Late July, 2014
– Editorial Review Period: August/September, 2014
– Final Camera Ready Copy Due: September 15, 2014
– Presentation Slides Due: October 6, 2014

Please note:  The Paper Abstract Due date is the date to submit an abstract or a brief summary of your paper.  There is no acceptance process for abstracts and no acknowledgement will be sent; your abstract simply indicates your intent to submit a paper.  The Paper (Draft) Due date is the deadline for submission to the referee process.  (Note that papers submitted as slides must include detailed speaker’s notes.)  Once a paper is refereed, accepted, and edited, the final version is due on the Final Camera Ready Copy Due date.  This is also the date by which the paper must have all updates, corrections or modifications approved and be ready for publication in the CMG Proceedings.

CMG will also consider papers on topics and technologies that become available later in the year.  Such papers will be considered for acceptance on a case-by-case basis by the Program Committee.

CMG will continue to use the paper submission system called EDAS (www.edas.info).  Instructions for using EDAS can be found at the following URL under “Author and Speaker Information”: http://www.cmg.org/conference/cmg2014

 

Please direct all inquiries to:

Kathy Steffens, CMG 2014 Program Chair – cmgpc@cmg.org

 

CALL FOR WORKSHOPS

IT professionals responsible for today’s computing environments know the only constant is change.  Today’s competitive advantage will quickly become tomorrow’s legacy liability.  To apply the latest techniques and improve your skills in performance optimization, software performance engineering, load testing, benchmarking, resource management, capacity analysis, simulation and modeling, and cost management, access to knowledge from industry leaders is no longer an option but a requirement!

CMG has been the primary source of expert computer performance knowledge, capturing the collective wisdom of the very best in the industry, for over 35 years.  CMG calls for Workshops to be presented on the first day of the 40th Annual International Conference, being held November 3-6, 2014.  We ask for Workshops that focus not only on performance and capacity issues, but also on any technologies, disciplines, techniques, and approaches related to Systems Management or Information Systems.

Submissions in the areas of big data, cloud computing, performance visualization, mobile computing, and web performance are especially encouraged.  We are looking for “How To” workshops that explain how to apply popular tools and products to important performance & capacity tasks.  “How To” workshops are meant to educate performance professionals in the application of commercially available technology.

Submissions at all levels are requested: introductory, technical, managerial and leading edge.  We require PowerPoint slides with speaker’s notes prior to the conference, in time for publication in the workshop book.  Workshops are traditionally 3 hours in length, plus a 15-minute break.

 

Sample Subject Areas: z/OS, LINUX/Unix/Windows, Cloud Computing, Big Data, Virtualization, Storage, Managing Big Data, Network Performance, Web Performance, Database Performance, Wireless Capacity, Future Technologies, Executive Management, Load & Stress Testing, Software Performance Engineering, Simulation and Analytic Modeling, Applied Statistics & Forecasting, ITIL, ICCP, Performance Visualization, SaaS.

 

Critical Dates for Workshop Submission

– Title & Abstract Due: June 8, 2014
– Author Acceptance Notification: Late July, 2014
– PowerPoint Copy Due: September 1, 2014
– Camera Ready Final Copy Due: October 5, 2014

 

Please note:  The Title & Abstract Due date is the date to submit a brief idea or outline of your proposed workshop.  The PowerPoint Copy Due date implies that the workshop is nearly complete, and that the presentation is ready for review by a CMG-assigned editor.  The Camera Ready Final Copy Due date occurs after the presentation has been accepted, reviewed by an editor, re-submitted with any updates, corrections or modifications, and is ready for publication in the CMG Workshop Drive.

Please direct all inquiries to:

Kathy Steffens, CMG 2014 Program Chair – cmgpc@cmg.org

 

XSEDE-14 in Atlanta!

As you know, XSEDE is an NSF-funded digital services project that helps researchers and scientists do their work more effectively and efficiently. XSEDE provides free access to some of the nation’s fastest supercomputers and most knowledgeable help staff.

Last summer’s XSEDE’13 conference in San Diego drew more than 700 attendees from all 50 states and 14 countries. We expect more in Atlanta, especially with so many great area universities, colleges and institutions. With our theme “Engaging Communities,” we encourage participation from both traditional users of digital resources (researchers, students, post-docs in traditional sciences) and those who haven’t historically used these resources (humanities, economics, art). Students are strongly encouraged to participate and can find various ways to engage with the conference.

The Call for Participation is out and the first deadline is fast approaching: papers and abstracts are due March 15. Visit here for more deadlines: https://conferences.xsede.org/xsede14/call-for-participation#Key%20dates

PC1 (Cygnus) filesystem woes

We’ve continued to have issues with the server, and we’ve now identified a networking issue tied to this server as well as a corrupted OS image.

The networking issue has been rectified, and I am installing a new software image onto this machine as I type this.

Despite the nature of the failure, we have not lost any of your already saved data — the drive units which house the OS are separate from the ones storing your data.

We should have this machine back in about a half hour.

Emergency reboot of compute nodes due to power/cooler outage

The Rich data center cooling system experienced a power outage today (2/6/2014) at around 9:20am when both the main and backup power systems failed, requiring an emergency shutdown of all PACE compute nodes. We have since received confirmation from the operations team that the room cooling is now stable, but it is running on the backup chillers while work proceeds to correct the problem. We are currently bringing the compute nodes back online as quickly as possible.

If you had queued jobs before the incident, they should start running as soon as a sufficient number of compute nodes are brought online. However, all of the jobs that were running at the time of the failure were killed and need to be resubmitted. You can monitor the node status using the ‘pace-stat’ and ‘pace-check-queue’ commands.

We are sorry for the inconvenience this failure has caused. Please contact us if you have any concerns or questions.
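If you have many killed jobs to resubmit, a small loop can save some typing. The following is only a rough sketch: it assumes your job scripts end in .pbs and sit together in one directory (neither is a PACE convention), and the function name resubmit_jobs is made up for this example. qsub is the standard Torque submission command.

```shell
#!/bin/bash
# Sketch only: resubmit every job script found in one directory.
# Assumptions (not from the post): job scripts end in .pbs and all live
# in the same directory. The function name is hypothetical.

resubmit_jobs() {
    local dir=$1
    local submit=${2:-qsub}   # pass "echo" as the 2nd argument for a dry run
    local script
    for script in "$dir"/*.pbs; do
        [ -e "$script" ] || continue   # skip the literal pattern when nothing matches
        "$submit" "$script"
    done
}
```

Running resubmit_jobs with "echo" as the second argument only prints the scripts that would be submitted; leave the second argument off to hand them to qsub.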

 

Scratch Quota Policies are Changing

We would like to give you a heads-up about some upcoming adjustments to the scratch space quotas.

The current policy is a 10TB soft quota and a 20TB hard quota.  Given the space problems we’ve been having with the scratch, we will be adjusting this to a 5TB soft quota and a 7TB hard quota.  This change should only affect a small handful of users.  Given the close proximity to our maintenance next week, we will be making this change at the end of January.  This is an easy first step that we can take to start addressing the recent lack of space on scratch storage.  We are looking at a broad spectrum of other policy and technical changes, including changing retention times, improving our detection of “old” files, as well as increasing capacity.  If you have any suggestions for other adjustments to scratch policy, please feel free to let us know (pace-support@oit.gatech.edu).

Please remember that the scratch space is intended for transient data – not as a long-term place to keep things.
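To see where you stand before the change, something like the following may help. It is a sketch, not a PACE tool: the 5TB and 7TB limits come from this announcement, while the function names, the whole-TB granularity, and the 60-day default are assumptions chosen for illustration (the post does not define what counts as an "old" file).

```shell
#!/bin/bash
# Sketch only. The 5TB/7TB figures come from the announcement;
# the function names and the 60-day default are hypothetical.

SOFT_QUOTA_TB=5
HARD_QUOTA_TB=7

# Classify a usage figure, given in whole TB, against the new quotas.
scratch_quota_status() {
    local used_tb=$1
    if [ "$used_tb" -ge "$HARD_QUOTA_TB" ]; then
        echo "hard quota reached"
    elif [ "$used_tb" -gt "$SOFT_QUOTA_TB" ]; then
        echo "over soft quota"
    else
        echo "ok"
    fi
}

# List regular files under a directory that were last modified
# more than N days ago (default 60), as candidates for cleanup.
list_old_files() {
    local dir=$1
    local days=${2:-60}
    find "$dir" -type f -mtime +"$days" -print
}
```

For example, scratch_quota_status 6 reports "over soft quota", and list_old_files called on your scratch directory prints files you may want to move or delete.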

January Maintenance is over

January maintenance is complete, and the clusters have started accepting and running jobs. We accomplished all of the primary objectives, and even found time to address a few bonus items.

Most importantly, we completed updating the resource and scheduling managers (torque and moab) throughout the entire PACE realm. This upgrade should bring visible improvements in speed and reliability. Please note that the job submission process will show some differences after this update, so we strongly encourage you to read the transition guide here: http://www.pace.gatech.edu/job-submissionmanagement-transition-guide-jan-2014

Also, please make sure that you check the FAQ for common problems and their solutions by running the following command on your headnode:  jan2014-faq  (use the spacebar to skip pages).

We had a hardware failure in the DDN storage system, which interrupted the planned Biocluster data transfer. We expect to receive the replacement parts and fix the system in a few days. This failure has not caused any data loss, and the system will be up and running (perhaps with some performance degradation). We learned that the repairs will require a short downtime, and we will soon get in touch with the users of the Gryphon, Biocluster and Skadi clusters (the current users of this system) to schedule this work.

Other accomplishments include:

– Optimus is now a shared cluster. All Optimus users now have access to optimusforce-6 and iw-shared-6.

– All of the Atlas nodes are upgraded to RHEL6.

– Most of the Athena nodes are upgraded to RHEL6.

– The old scheduler server (repace) is replaced with the upgraded server (shared-sched). You may notice a difference in the generated job numbers and files.

– Some networking cable cleanup and improvements

– Gryphon has new scheduler and login servers, and the nodes used for these purposes have been put back in the computation pool.

– Deployed project file space quotas, as previously agreed with PIs, for users who did not have quotas prior to maintenance, and raised the quotas of users who were already over theirs to leave some headroom. To check your quotas, use “quota -s”.

Power loss in Rich Datacenter

UPDATE: All clusters are up and ready for service.

At this time, all PACE-managed clusters are believed to be working.
You should be able to log in to your clusters and submit and run jobs.

Any jobs that were running before the power outage have failed, so please resubmit them.

Please let us know immediately if anything is still broken.

PACE Team

What happened

At around 0810 Thursday morning, Rich lost its N6 feed, half of the feed powering the Rich building and the Rich chiller plant. This also caused multiple failures in the high voltage vault in the Rich back alley, so Rich also lost its other feed, N5. However, the N5 feed was still up in the chiller plant. Though the chillers still had power, as a precaution operators transferred cooling over to the campus loop. Rich office space was without power, but the machine rooms failed over to the generator and UPSes.

PACE systems were powered down gracefully to prevent a hard shutdown that would make recovery more difficult.

Original Post

This morning (December 19), the Rich datacenter suffered a power loss.
We had to perform an emergency shutdown of all nodes.

As we receive new information we will update this blog and the pace-availability email list.

COMSOL 4.4 Installed

COMSOL 4.4 – Student and Research versions

COMSOL Multiphysics version 4.4 contains many new functions and additions to the COMSOL product suite.
See the COMSOL Release Notes for information on new functionality in existing products and an overview of new products in this version.

Using the research version of COMSOL

# Load the research version of COMSOL
$ module load comsol/4.4-research
$ comsol ...
# Use the MATLAB LiveLink
$ module load matlab/r2013b
$ comsol -mlroot ${MATLAB}

Using the classroom/student version of COMSOL

# Load the classroom/student version of COMSOL
$ module load comsol/4.4
$ comsol ...
# Use the MATLAB LiveLink
$ module load matlab/r2013b
$ comsol -mlroot ${MATLAB}