Policies and guidelines governing PACE are determined by the Faculty Governance Committee based upon input from the faculty participants and senior technical support staff. Policy development is still in its early stages, so we expect further development and details over time. The following are currently in place:
Every 12-18 months, a new vendor qualification process will select another vendor with which we will negotiate detailed configuration and pricing. This process usually takes a few months, and may (or may not) select a different vendor.
Major hardware acquisitions occur once or twice a quarter. Smaller acquisitions will occur as needed.
Once a purchase order is issued, there is usually a 4-month period of time before production readiness of new clusters. This time period covers manufacture, delivery, installation, integration, rigorous acceptance testing, production setup and cut-over.
Vendor Qualification schedule
The estimated timeline for determining a purchase is detailed below, and is subject to change without notice. GT reserves the right to not issue a purchase order as a result of this process. All procurements will proceed within the context of existing contracts, purchasing rules and guidelines, and applicable state law.
This process will begin with a set of documents released to vendors. These documents will provide general information regarding the configuration of a proposed acquisition and acceptance criteria for said acquisition. It is unlikely that GT will make a purchase that matches this description exactly. Vendors will be asked to respond with a quote for this sample configuration. GT will then use the provided information to select preferred vendor(s) and enter detailed discussions regarding configuration and pricing. Vendors should be prepared to ship samples of proposed products to GT for evaluation at this point. Shortly after initiating this process, GT will hold some number of Q&A sessions regarding IT equipment, data center concerns and other topics as needed. Vendors are expected to attend these meetings and may submit questions to GT in advance. All questions and their answers will be provided to all vendors during these Q&A sessions.
Like any other computer or server, HPC resources go through several stages from specification to retirement. Faculty are expected to be involved throughout the process. However, to gain advantages of scale and establish good working relationships with vendors, PACE staff will play a large role in the specification, procurement and provisioning of HPC resources. In particular, they can combine several orders to leverage faculty discounts. A single vendor may be contracted to meet the requirements of several cluster orders, since this creates a more sustainable environment from the perspective of maintenance and support.
Software is a critical component of the clusters. The primary software stack will be based on Red Hat Enterprise Linux and other widely available open source tools. Available software and versions can be listed on the systems using the "module avail" command. All software on PACE-administered clusters must be installed and used in compliance with Georgia Tech's Computer and Network Usage and Data Access Policies. Software installation policies can be found in the sections below.
Hardware will generally have a minimum of 3 years of vendor support. Even as maintenance contracts lapse, PACE staff will make every effort to maintain computing hardware provided it works, is useful to researchers and students, and is not preventing new equipment from entering the space allocated to HPC.
Maintenance windows are scheduled four times annually. The degree of impact depends on the type of maintenance being performed. All efforts are made to limit downtime during prescribed maintenance periods. All users will be notified via email well in advance of upcoming maintenance events. The latest approved schedule follows:
May 11, 12, & 13, 2017 (Thu, Fri, Sat)
Aug 10, 11, & 12, 2017 (Thu, Fri, Sat)
Nov 2, 3, & 4, 2017 (Thu, Fri, Sat)
Feb 8, 9, & 10, 2018 (Thu, Fri, Sat)
May 10, 11, & 12, 2018 (Thu, Fri, Sat)
Aug 10, 11, & 12, 2018 (Thu, Fri, Sat)
Nov 1, 2, & 3, 2018 (Thu, Fri, Sat)
The next maintenance day will be determined in coordination with the CODA building move and will not take place before 2/15/2019.
Jobs must be run using the provided workload schedulers. We currently use Moab/Torque. This allows users to gain experience with a standard HPC environment that is also used elsewhere, including at national labs and XSEDE. Exclusive access nodes will have at least one single-group queue. The FoRCE has several queues for various purposes.
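As a rough illustration of working through the scheduler, the Python sketch below renders a minimal Torque batch script. The #PBS directives shown are standard Torque options. The queue name, job name, resource amounts and program name are hypothetical placeholders, not actual PACE values.

```python
def render_torque_script(job_name, queue, nodes, ppn, walltime, command):
    """Return the text of a minimal Torque (PBS) batch script.

    All values are supplied by the caller; nothing here is a
    PACE-specific default.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",               # job name
        f"#PBS -q {queue}",                  # target queue
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # nodes and cores per node
        f"#PBS -l walltime={walltime}",      # wall-clock limit, HH:MM:SS
        "cd $PBS_O_WORKDIR",                 # start in the submission directory
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values, for illustration only.
script = render_torque_script(
    job_name="demo",
    queue="example-queue",
    nodes=1,
    ppn=4,
    walltime="01:00:00",
    command="./my_program",
)
print(script)
```

A script like this would typically be saved to a file and submitted from a headnode with Torque's qsub command (or Moab's msub), rather than executed directly.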
Headnode (login) nodes:
Headnodes (a.k.a. login nodes) are shared by multiple users and must be used only for submitting jobs and other lightweight activities. It is against policy to run any operation on the headnodes that will impact their responsiveness. This includes user processes that consume 90% or more of a CPU or 50% of the available memory, as well as large data operations or transfers (including SFTP connections) that have a visible adverse impact on the responsiveness of the headnodes. Processes violating this policy will be killed by the PACE team after 30 minutes of run time. The PACE team will make a best effort to contact users before killing processes, but reserves the right to kill any process in violation without notification.
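The numeric thresholds above can be expressed as a small check. This is only a sketch of the stated limits, not the tooling PACE actually uses. The ProcessSample type and its fields are hypothetical, and treating "50% of the available memory" as "50% or more" is an assumption. The data-transfer part of the policy is a judgment call and is not modeled here.

```python
from dataclasses import dataclass

# Thresholds taken from the policy text.
CPU_LIMIT_PERCENT = 90.0     # 90% or more of one CPU
MEMORY_LIMIT_PERCENT = 50.0  # 50% of available memory (">=" is an assumption)
GRACE_MINUTES = 30           # run time after which a process may be killed


@dataclass
class ProcessSample:
    """Hypothetical snapshot of one user process on a headnode."""
    cpu_percent: float      # share of a single CPU in use
    memory_percent: float   # share of the node's available memory in use
    runtime_minutes: float  # how long the process has been running


def exceeds_limits(p: ProcessSample) -> bool:
    """True if the process is over the CPU or memory threshold."""
    return (p.cpu_percent >= CPU_LIMIT_PERCENT
            or p.memory_percent >= MEMORY_LIMIT_PERCENT)


def may_be_killed(p: ProcessSample) -> bool:
    """True if the process is over a threshold and past the 30-minute mark."""
    return exceeds_limits(p) and p.runtime_minutes >= GRACE_MINUTES


# A process using 95% of a CPU for 45 minutes falls under the policy.
print(may_be_killed(ProcessSample(95.0, 10.0, 45.0)))
```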
PACE supports servers that allow for fast data operations, which are referred to as "datamover" nodes. As of this date, the only datamover node is "iw-dm-4.pace.gatech.edu". These nodes must be used strictly for data operations (e.g. scp, sftp, rsync), which may include tar/untar and data compression when needed. We make a best effort to contact users before killing processes, but we reserve the right to kill any process in violation (e.g. matlab, python, or other software) without notification.
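For illustration, the sketch below assembles an rsync command that targets the datamover node named above instead of a headnode. It only builds and prints the command and does not run it. The user name and paths are hypothetical.

```python
import shlex

DATAMOVER_HOST = "iw-dm-4.pace.gatech.edu"  # datamover node named in the text


def rsync_command(user, source, destination):
    """Build (but do not run) an rsync command that copies `source`
    to `destination` on the datamover node."""
    remote = f"{user}@{DATAMOVER_HOST}:{destination}"
    # -a preserves permissions and timestamps; -v lists transferred files.
    return ["rsync", "-av", source, remote]


# Hypothetical user name and paths, for illustration only.
cmd = rsync_command("gtuser", "results/", "/path/on/cluster/")
print(shlex.join(cmd))
```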
The scratch storage is cleaned approximately weekly. Any file older than 60 days will be removed. An email notification of removal will be sent one week prior to the file’s removal. Remember, the scratch storage is not intended for long-term storage of data sets. As such, we do not maintain backups of this storage. We have instituted a 5 TB soft quota and a 7 TB hard quota on the scratch space. Users will be allowed to write more than 5 TB to the scratch space, but will receive warning emails when doing so. The system enforces a 7 TB limit on scratch space utilization, and data writes exceeding this limit are blocked. Users are also limited to a total of 1 million files or directories, regardless of their size.
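Users who want to anticipate the cleanup can check their own scratch directory along these lines. This is a minimal sketch, not the actual purge tooling. It assumes file age is measured by modification time, which the policy does not specify, and the exact behavior at the quota boundaries is also an assumption.

```python
import os
import time

PURGE_AGE_DAYS = 60      # files older than this are removed
SOFT_QUOTA_TB = 5        # warning emails above this
HARD_QUOTA_TB = 7        # writes are blocked beyond this
MAX_ENTRIES = 1_000_000  # files or directories, regardless of size


def files_older_than(root, days=PURGE_AGE_DAYS, now=None):
    """Yield paths under `root` whose modification time is older than `days`.

    Using modification time as the age criterion is an assumption.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime < cutoff:
                    yield path
            except OSError:
                continue  # file vanished or is unreadable; skip it


def quota_status(used_tb, entries):
    """Classify usage against the scratch limits stated in the policy."""
    if used_tb >= HARD_QUOTA_TB or entries >= MAX_ENTRIES:
        return "blocked"   # further writes would exceed a hard limit
    if used_tb > SOFT_QUOTA_TB:
        return "warning"   # over the soft quota; warning emails expected
    return "ok"


print(quota_status(used_tb=6.2, entries=250_000))
```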
Project and Home Directory Storage:
Both home directories and project directories are subject to quotas. Quotas limit the amount of data that can be stored. Home directories are subject to a 5 GB quota - meaning that at most 5 GB of data can be stored in the home directory.
Project storage (the "data" directory) is subject to quotas based on requests from an Authorized Requester, namely a cluster "owner" or someone designated by a cluster owner (such as a PI). Authorized requests must be sent to email@example.com. Similar to scratch, project directories kept on the GPFS filesystem are subject to a limit of 2 million files or directories per person, regardless of their size.
Home and project space are backed up daily. Backed up files will be retained for 30 days. Please contact firstname.lastname@example.org to restore deleted files.
FoRCE Resource Allocation:
Users who are granted access to FoRCE by the Faculty Governance Committee receive a no-cost basic allocation of up to 100,000 CPU hours to be utilized over the course of one year. Brief proposals for larger amounts of time are accepted and will be approved by the Faculty Governance Committee based on scientific merit and resource availability. All proposals must be submitted by a member of the GT faculty who may designate a number of users to share their allocation. Please see Participation for further information. Faculty who receive this basic allocation may be subject to other resource restrictions as recommended by the Faculty Governance Committee.
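To get a feel for the size of the basic allocation, the following sketch estimates how many jobs of a given shape fit into 100,000 CPU hours. It assumes CPU hours are counted as cores multiplied by wall-clock hours, which is the common convention but is not spelled out in the policy. The job shape is hypothetical.

```python
BASIC_ALLOCATION_CPU_HOURS = 100_000  # from the policy text


def cpu_hours(cores, walltime_hours):
    """CPU hours for one job, assuming cores x wall-clock hours."""
    return cores * walltime_hours


def jobs_within_allocation(cores, walltime_hours,
                           allocation=BASIC_ALLOCATION_CPU_HOURS):
    """Number of whole jobs of this shape that fit in the allocation."""
    return int(allocation // cpu_hours(cores, walltime_hours))


# Hypothetical job shape: 64 cores for 24 hours.
per_job = cpu_hours(64, 24)
print(per_job)                         # 1536
print(jobs_within_allocation(64, 24))  # 65
```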
Software Installation Policies:
The appropriateness of software is determined by a security analysis, licensing restrictions, compatibility with the existing system, and necessity for research. Software will only be installed if it is determined that it is necessary for research and meets the security and licensing requirements. Evaluating and deciding on each new software request may take days to weeks, and the installation itself may take several weeks or more depending on its complexity. All software installation requests are evaluated according to the following criteria:
Should benefit multiple research groups
Supported by the RHEL version being used by PACE (Linux support is not sufficient)
Does not require emulation (dosbox, wine, etc)
Doesn't require deployment of database servers (clients querying external DBs are OK)
Doesn't require any system-level changes (e.g. changes to /etc, /var, etc)
Must be actively supported and publicly available (no experimental research codes, modified versions or abandoned projects)
No sequential GUI applications that would be a better fit for local workstations (e.g. openGL software that will not work on all nodes)
Should support a standard installation framework such as makefiles, autoconf/automake, cmake, setup.py, conda etc. PACE may refuse compilations using non-standard installation procedures requiring manual changes to several configuration files.
Software that requires PACE to accept license terms on behalf of users cannot be included in the global repository, even if the software is free for academic use. Exceptions can be made with written approval from the developer/company.
Upgrades to existing software will be subject to a review process to determine if the new version is critical for continuation of research.
Compiler optimizations that are specific to certain architectures cannot be applied to the global repository if they cause crashes on the remaining architectures that PACE supports.
Installing software that is restricted to certain users or groups of users
Requests to access restricted software must be made or approved by the cluster owner or by a person designated by the owner as having the authority to make changes (known as an Authorized Requester). An Authorized Requester may be anyone designated by the cluster owner, including department CSRs, students, post-docs, or other faculty.
Requests must be sent to email@example.com.
Depending on the procedure required for proper authorization, the access request may take days or weeks before access is granted.
Restricted-access software includes VASP, Gaussian, StarCCM+, Materials Studio, and Lumerical FDTD.
Installing pre-compiled software
Updated versions of pre-compiled software (such as Matlab, Ansys, Abaqus, etc.) will be installed upon request.
New versions of pre-compiled software must be evaluated in terms of licensing and installation procedures before installation.
Installing software that must be compiled
The source code of software is often provided. When possible, PACE will compile and install software from the source code using GNU and Intel compilers. As a general rule, the default GCC compiler and the latest stable compiler and dependent library versions (MPI, hdf5, fftw3, hwloc, etc.) will be used to install new software. For example, if GCC version 4.9.0 is installed, future software compilations will use gcc/4.9.0 instead of gcc/4.7.2. Software that was installed before gcc/4.9.0 was set up will not be reinstalled except upon request. Any program will be installed with the latest compiler version upon request.
In the event that software will not run correctly with a particular compiler, that compiler will be skipped. For example, version 1.2.3 of software "frozznable" is known not to work with the gcc/4.9.0 and intel/15.0 compilers. Because of those known failures, frozznable version 1.2.3 will only be compiled with gcc/4.7.2 and intel/14.0.2, which are known to work. Older versions of gcc and intel will not be used to compile frozznable-1.2.3.
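The selection rule in the frozznable example can be sketched as "use the newest installed compiler version that is not known to fail". The code below is one reading of that rule, not PACE's actual build tooling. The function names and the lists of installed modules are hypothetical, while the version strings come from the example above.

```python
def version_key(module_name):
    """Turn 'gcc/4.9.0' into (4, 9, 0) for numeric comparison."""
    _, version = module_name.split("/")
    return tuple(int(part) for part in version.split("."))


def choose_compiler(installed, known_broken):
    """Pick the newest installed compiler module not known to fail.

    Returns None if every installed version is known to be broken.
    """
    candidates = [m for m in installed if m not in known_broken]
    return max(candidates, key=version_key, default=None)


# Module versions taken from the example in the text.
gcc_installed = ["gcc/4.7.2", "gcc/4.9.0"]
intel_installed = ["intel/14.0.2", "intel/15.0"]
broken_for_frozznable = {"gcc/4.9.0", "intel/15.0"}

print(choose_compiler(gcc_installed, broken_for_frozznable))    # gcc/4.7.2
print(choose_compiler(intel_installed, broken_for_frozznable))  # intel/14.0.2
```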
PACE will use an alternate compiler (e.g. PGI) to install software if required (for portability and performance reasons). To request that software be installed with another compiler, file a support request ticket by sending an email to firstname.lastname@example.org. PACE will evaluate the request on a case-by-case basis and determine if the alternate installation should be supported.
If an alternate installation request is rejected, the rejection may be appealed to the faculty governance committee. Please send appeal requests to email@example.com.
Updating libraries or MPI versions
Similar to the "Installing software that must be compiled" section above, if application ABC depends on a library, MPI, or other application, ABC will be installed with the latest compilers and any library, MPI, or related application will be recompiled with matching compiler versions.
Installation of data used by applications
Any corpus or database used by an application should be installed in the user's personal directories. If a collection of data needs to be installed, PACE will examine the requirements on a case-by-case basis and determine if an exception should be made.
New User Policies:
PACE clusters are available to all GT faculty, staff, students, and external collaborators who have been granted access by an Authorized Requester, namely a cluster "owner" or someone designated by a cluster owner. Authorized requests must be sent to firstname.lastname@example.org. Account requests for the FoRCE, Hadoop and GPU clusters (as well as all future clusters that will be governed by the Faculty Governance Committee) are submitted using the link http://pace.gatech.edu/node/add/request. A faculty member whose request has been approved is granted a block of time that can be shared by the users identified in the submission.
PACE clusters are a campus resource subject to the Institute's Data Access Policy http://www.policylibrary.gatech.edu/data-access , CNUSP http://www.policylibrary.gatech.edu/computer-and-network-usage-and-security , and Passwords Policy http://www.policylibrary.gatech.edu/passwords .
No Category III or IV data can be processed/stored on a PACE-managed resource.
Continued Access to PACE After GT Affiliation Status Changes
Users can continue to access PACE after their status has changed (e.g. graduation, new research group, end of collaboration, etc.) if approved by their PI and if they have an active GT account that allows them to connect to VPN. Users who no longer have access to VPN can be provided with continued access to PACE after the following steps:
1. Find a sponsor with active GT credentials, who can open a guest account for them at passport.gatech.edu
2. Send email@example.com a request, including the old and new (guest) usernames. If they need existing data migrated to the new account, this should be requested specifically in the ticket.
PACE will then create a new account using the guest credentials and migrate the files if requested. Ideally, the user will not see any differences in the queues or files, other than the changed username.
The GT sponsor hosting the guest account(s) assumes all responsibility for the users and their conformance to the GT and PACE usage policies. These accounts can be extended as needed by the GT sponsor at passport.gatech.edu.