Scheduler/Workflow Orchestration Architect (Research Technologist)

To Apply:

(1) Fill out the application form

(2) Send the application form, cover letter and your CV to:

Paul Manno
PACE Team, ART/OIT, Georgia Tech
via fax: 
(404) 385 9548 (Attn: Paul Manno)

via mail (signature confirmation recommended): 
258 4th Street NW, Rich Building, #329 
Atlanta, GA  30332-0700

via email (not recommended for sensitive information): 
pmanno@gatech.edu

Please use this email address to notify us of any documents faxed or shipped by mail.

 

Description: 

PACE (Partnership for an Advanced Computing Environment) enables discoveries by providing the necessary bridge between state-of-the-art cyberinfrastructure and world-class scholars, revealing a powerful combination of the two. We facilitate research efforts and foster strategic partnerships to provide Georgia Tech researchers with an unrivaled advantage, empowering them to lead their disciplines for the advancement of science and society across the globe.

The PACE team is seeking a “Scheduler/Workflow Orchestration Architect” to design, build, integrate, and maintain scientific workflows in collaboration with Georgia Tech researchers and their teams, supporting multi-faceted computational requirements from a wide variety of scientific domains. These workflows may include scheduling of jobs to run on conventional HPC resources and specialized architectures (e.g. GPUs and big data clusters); integration of scientific hubs, gateways, and data pipelines; ingestion, dissemination, and sharing of remotely and locally stored datasets, including DMZ solutions; public, private, and hybrid clouds; and national-scale resources. This position will play a key role in the integration of local Georgia Tech resources into national computational grids, such as Open Science Grid (OSG), with a particular focus on maximizing their utilization.

Experience installing, configuring, optimizing, and troubleshooting one or more of the mainstream job schedulers (e.g. Torque/Moab, Slurm) is required. Deployment, testing, configuration, and maintenance of one or more schedulers will constitute a portion of this position’s responsibilities. This will include troubleshooting job crashes, analyzing logs, and working directly with users to resolve scheduling problems in a complex environment. The ever-changing computing requirements of Georgia Tech researchers also call for evaluation of experimental solutions and additional modules in sandbox deployments, which may include interaction with vendors.

Another important aspect of this position will be the integration of science hubs and gateways into scientific pipelines with compute and data components, in collaboration with researchers, to deliver end-to-end solutions and reproducible methods. Examples include automated transfer, analysis, and post-processing of large sets of data generated by simulations or instruments, some of which may be located remotely.

This position requires a range of technical skills, including scheduler technologies, UNIX/Linux operating systems, scripting languages, configuration management tools, scientific computation, familiarity with science hubs and gateways, and an understanding of data transfer technologies including Globus and GridFTP.

This position will strongly influence strategic decisions through the collection and analysis of resource utilization data to gain insights into usage characteristics, computational trends, and metrics. This work may include generating automated usage reports and identifying recurring problems using basic data analytics tools available to the team.

 

Required qualifications:
  • US Citizen or permanent resident
  • 3 years of relevant work experience
  • Experience installing, configuring, optimizing, and troubleshooting one or more of the mainstream job schedulers (e.g. Torque/Moab, Slurm)
  • Excellent written and verbal communication skills
  • Good command of UNIX/Linux and mainstream scripting languages
  • Ability to work in a team environment
  • Ability to interact and collaborate with faculty, researchers, graduate students, IT professionals, and vendors
  • Ability to oversee complex and recurring cycles of design, evaluation, and troubleshooting of technical solutions
  • Technical writing skills for documentation of complex solutions
  • Excellent troubleshooting and methodical experimentation skills
  • Strict compliance with cybersecurity policies and regulations to support workflows with export-controlled software and datasets
 
Additional knowledge, skills, and experience that are desirable, but NOT required:
  • Experience with Globus and GridFTP solutions
  • Experience with complex customization and development of scheduler components/modules
  • Experience with scientific hubs, gateways, and/or Science DMZ
  • Experience with data analysis tools such as Splunk, R, or Python pandas
  • Experience with scientific and/or engineering code development
  • Experience with Open Science Grid (OSG) and national-scale resources