Enjoy competitive salaries and exceptional benefits by joining the rapidly growing PACE team at Georgia Tech.
Located in the heart of vibrant Atlanta, Georgia, the Partnership for an Advanced Computing Environment (PACE) team defines and manages centralized research computing services at the Georgia Institute of Technology. We facilitate research efforts and foster strategic partnerships to provide Georgia Tech researchers with an unrivaled advantage, empowering them to lead their disciplines for the advancement of science and society across the globe. In addition to high-performance computing, we incorporate emerging cloud, network, data analytics, and storage technologies. Our fast-growing team directly supports faculty, students, and other researchers, resulting in a dynamic environment filled with intellectual stimulation and continuous innovation.
The PACE team is seeking a Scheduler/Workflow Orchestration Architect to design, build, integrate, and maintain scientific workflows in collaboration with Georgia Tech researchers and their teams, facilitating multi-faceted computational requirements from a wide variety of scientific domains. These workflows may include scheduling of jobs to run on conventional HPC resources and specialized architectures (e.g., GPUs and big data clusters); integration of scientific hubs, gateways, and data pipelines; ingestion, dissemination, and sharing of remotely and locally stored datasets, including DMZ solutions; public, private, and hybrid clouds; and national-scale resources. This position will play a key role in the integration of local Georgia Tech resources into national computational grids, such as Open Science Grid (OSG), with a particular focus on maximizing their utilization.
Experience installing, configuring, optimizing, and troubleshooting one or more of the mainstream job schedulers (e.g., Torque/Moab, Slurm) is required. Deployment, testing, configuration, and maintenance of one or more schedulers will constitute a portion of this position's responsibilities. This will include troubleshooting job crashes, analyzing logs, and working directly with users to resolve scheduling problems in a complex environment. The ever-changing computing requirements of Georgia Tech researchers also require evaluation of experimental solutions and additional modules in sandbox deployments, which may include interaction with vendors.
Another important aspect of this position will be the integration of science hubs and gateways into scientific pipelines with compute and data components, working with researchers on end-to-end solutions and method reproducibility. Examples include automated transfer, analysis, and post-processing of large sets of data generated by simulations or instruments, some of which may be located remotely.
This position requires a range of technical skills, including scheduler technologies, UNIX/Linux operating systems, scripting languages, configuration management tools, and scientific computation, as well as familiarity with science hubs and gateways and an understanding of data transfer technologies including Globus and GridFTP.
This position will strongly influence strategic decisions by collecting and analyzing resource utilization data to gain insight into usage characteristics, computational trends, and metrics. This work may include generating automated usage reports and identifying recurring problems using basic data analytics tools available to the team.
Required knowledge, skills, and experience:
- US Citizen or permanent resident
- 3 years of relevant work experience
- Experience installing, configuring, optimizing, and troubleshooting one or more of the mainstream job schedulers (e.g., Torque/Moab, Slurm)
- Excellent written and verbal communication skills
- Good command of UNIX/Linux and mainstream scripting languages
- Ability to work in a team environment
- Ability to interact and collaborate with faculty, researchers, graduate students, IT professionals, and vendors
- Ability to oversee complex and recurring cycles of design, evaluation, and troubleshooting of technical solutions
- Technical writing skills for documentation of complex solutions
- Excellent troubleshooting and methodical experimentation skills
- Strict compliance with cybersecurity policies and regulations to support workflows with export-controlled software and datasets
Additional knowledge, skills, and experience that are desirable, but NOT required:
- Experience with Globus and GridFTP solutions
- Experience with complex customization and development of scheduler components/modules
- Experience with scientific hubs, gateways, and/or ScienceDMZ
- Experience with data analysis tools such as Splunk, R or Python Pandas
- Experience with scientific and/or engineering code development
- Experience with Open Science Grid (OSG) and national-scale resources
- Complete the application form, available here.
- Send the application form, cover letter, and your CV via one of the methods below to Paul Manno, OIT/ART/PACE
- Please reference position PVA38850
- email: firstname.lastname@example.org
- fax: (404) 385-9548
- postal mail (signature confirmation recommended): 258 4th Street NW, Rich Building, #324 Atlanta, GA 30332-0700
Please use the email address above to notify us of documents transmitted via fax or postal mail.