
## Quickstart:

**A. Running jobs on multiple compute nodes**

- Download the attached multi-paralleljob.txt file and place it in your account on the cluster. If you are using the tcsh shell, please use the multi-paralleljob-tcsh.txt file instead.
- Create a file called "jobs.txt" in the same directory as multi-paralleljob.txt.
- Inside the jobs.txt file, enumerate all the commands that should be run, one per line. For example, from the shell:

    echo "cd <python file location>; python test.py" >> jobs.txt
    echo "cd <python file location>; python test.py" >> jobs.txt
    echo "cd <python file location>; python test.py" >> jobs.txt
    echo "cd <python file location>; python test.py" >> jobs.txt
    echo "cd <python file location>; python test.py" >> jobs.txt

- Alternatively, we could do this with a for loop:

    for NUM in `seq 1 128`
    do
        echo "cd <python file location>; python test.py" >> jobs.txt
    done

- In multi-paralleljob.txt, change nodes and ppn to the desired numbers, and be sure that nodes*ppn equals the number of lines in jobs.txt.
- Make sure all environment variables, such as PYTHONPATH, are set in ~/.bashrc, and all needed modules are loaded in ~/.pacemodules.
- Submit the job:

    qsub multi-paralleljob.txt
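
The steps above can be sketched as a pair of small shell helpers: one builds the job list and one checks that its length matches nodes*ppn before submission. The function names `make_jobs` and `check_job_count`, and the example values in the usage comments, are illustrative assumptions and are not part of the PACE scripts.

```shell
# Write COUNT copies of COMMAND to FILE, one per line (FILE is overwritten).
make_jobs() (
    file=$1; count=$2; command=$3
    : > "$file"
    i=1
    while [ "$i" -le "$count" ]; do
        printf '%s\n' "$command" >> "$file"
        i=$((i + 1))
    done
)

# Succeed only if FILE has exactly NODES * PPN lines.
check_job_count() (
    file=$1; nodes=$2; ppn=$3
    lines=$(( $(wc -l < "$file") ))
    [ "$lines" -eq $((nodes * ppn)) ]
)

# Hypothetical usage, assuming nodes=8:ppn=16 is requested in multi-paralleljob.txt:
#   make_jobs jobs.txt 128 'cd <python file location>; python test.py'
#   check_job_count jobs.txt 8 16 && qsub multi-paralleljob.txt
```

Running the check before `qsub` catches a mismatch between the job list and the requested cores without waiting for the job to start.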

**B. Running jobs on a single compute node**

- Download the attached paralleljob.txt file and place it in your account on the cluster. If you are using the tcsh shell, please use the paralleljob-tcsh.txt file instead.
- Create a file called "jobs.txt" in the same directory as the paralleljob.txt file
- Inside the jobs.txt file, enumerate all of the commands that should be run. For example:

    matlab -nodisplay -singleCompThread -r "program(test1,1)"
    matlab -nodisplay -singleCompThread -r "program(test1,2)"
    matlab -nodisplay -singleCompThread -r "program(test1,3)"
    ...

- We could also create the jobs.txt file using a script that contains the following:

    for NUM in `seq 1 1000`
    do
        echo 'matlab -nodisplay -singleCompThread -r "program(test1,'$NUM')"' >> jobs.txt
    done

- Change the desired number of cores assigned to the job in the paralleljob.txt file. For example:

#PBS -l nodes=1:ppn=8

**NOTE: The script will not (yet) work with more than 1 node. This line must always begin with "-l nodes=1". "ppn" can be any number of cores that are available on a system (from 1 to 48).**

- Submit the job:

qsub paralleljob.txt
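
Because the single-node script depends on the `nodes=1` request, it can help to check the resource line before submitting. The following is a minimal sketch; the helper names `uses_single_node` and `requested_ppn` are hypothetical, and the patterns assume the line is written in the `#PBS -l nodes=1:ppn=8` form shown above.

```shell
# Succeed if FILE contains a "#PBS -l nodes=1" request (exactly one node).
uses_single_node() (
    grep -Eq '^#PBS[[:space:]]+-l[[:space:]]+nodes=1([[:space:]:]|$)' "$1"
)

# Print the ppn value requested in FILE (prints nothing if none is found).
requested_ppn() (
    sed -n 's/^#PBS[[:space:]]*-l[[:space:]]*nodes=[0-9]*:ppn=\([0-9]*\).*/\1/p' "$1" | head -n 1
)

# Hypothetical usage:
#   uses_single_node paralleljob.txt && echo "cores requested: $(requested_ppn paralleljob.txt)"
```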

If this script is useful (or if you would like it to be useful but can't make it work), please send an email to pace-support@oit.gatech.edu.

## Motivation:

Many domains and problems require a large number of tests, each using a different set of parameters.

Below is an example of a matlab program that takes a single parameter. We want to execute "program" with each different parameter.

    matlab -nodisplay -singleCompThread -r "program(test1,1)"
    matlab -nodisplay -singleCompThread -r "program(test1,2)"
    matlab -nodisplay -singleCompThread -r "program(test1,3)"
    matlab -nodisplay -singleCompThread -r "program(test1,4)"
    matlab -nodisplay -singleCompThread -r "program(test1,5)"
    matlab -nodisplay -singleCompThread -r "program(test1,6)"
    ...
    matlab -nodisplay -singleCompThread -r "program(test1,9999)"
    matlab -nodisplay -singleCompThread -r "program(test1,10000)"

If the "program()" function takes ten minutes to run, executing 10000 runs sequentially will take about 70 days. PACE-managed clusters have thousands of CPU cores available for execution. If we can use 10 processors concurrently, the time needed to complete the parameter sweep drops to about one week; if we use 100 processors, the runtime will be less than one day.
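
The estimate above can be reproduced with shell arithmetic. This is a rough sketch that assumes perfect scaling (no scheduling or startup overhead); the function name `sweep_minutes` is illustrative only.

```shell
# Wall-clock minutes needed for RUNS runs of MINUTES each on PROCS processors,
# assuming perfect scaling and rounding up to whole rounds of runs.
sweep_minutes() (
    runs=$1; minutes=$2; procs=$3
    rounds=$(( (runs + procs - 1) / procs ))
    echo $(( rounds * minutes ))
)

sweep_minutes 10000 10 1     # prints 100000 (about 69 days)
sweep_minutes 10000 10 10    # prints 10000  (about 7 days)
sweep_minutes 10000 10 100   # prints 1000   (under 17 hours)
```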

## How to run 10000 (or more) jobs effectively on the cluster

The best way to run these 10000 jobs is to use the PACE-developed script (paralleljob.txt, at the bottom of the page).

To use this script to run the 10000 programs, we put every command that we want to run in the "jobs.txt" file:

    matlab -nodisplay -singleCompThread -r "program(test1,1)"
    matlab -nodisplay -singleCompThread -r "program(test1,2)"
    matlab -nodisplay -singleCompThread -r "program(test1,3)"
    matlab -nodisplay -singleCompThread -r "program(test1,4)"
    matlab -nodisplay -singleCompThread -r "program(test1,5)"
    matlab -nodisplay -singleCompThread -r "program(test1,6)"
    ...
    matlab -nodisplay -singleCompThread -r "program(test1,9999)"
    matlab -nodisplay -singleCompThread -r "program(test1,10000)"

The paralleljob.txt script will take this "jobs.txt" file and use all of the assigned CPUs to execute these commands in parallel.

Example usage:

    $ ls
    jobs.txt  matlab_test1.m  paralleljob.txt
    $ cat jobs.txt
    matlab -nodisplay -nojvm -singleCompThread -r "matlab_test1(1)"
    matlab -nodisplay -nojvm -singleCompThread -r "matlab_test1(2)"
    matlab -nodisplay -nojvm -singleCompThread -r "matlab_test1(3)"
    matlab -nodisplay -nojvm -singleCompThread -r "matlab_test1(4)"
    ...
    matlab -nodisplay -nojvm -singleCompThread -r "matlab_test1(17)"
    matlab -nodisplay -nojvm -singleCompThread -r "matlab_test1(18)"
    matlab -nodisplay -nojvm -singleCompThread -r "matlab_test1(19)"
    matlab -nodisplay -nojvm -singleCompThread -r "matlab_test1(20)"
    $ cat matlab_test1.m
    function [mysum] = matlab_test1(number)
    mysum=0;
    for i = 1:number
        mysum=mysum+i;
    end
    $ cat paralleljob.txt
    #PBS -N paralleljob
    #PBS -q force-6
    #PBS -l walltime=30:00
    #PBS -l nodes=1:ppn=10
    #PBS -j oe
    #PBS -o paralleljob.$PBS_JOBID
    module load matlab/r2011b
    ...
    $ qsub paralleljob.txt
    600121
    $ ls
    jobs.txt  paralleljob.600121  paralleljob.txt
    $ cat paralleljob.600121
    ---------------------------------------
    Begin PBS Prologue Thu Nov 3 14:41:31 EDT 2015
    Job ID: 600121
    User ID: wemeneker3
    Job name: paralleljob
    Queue: force
    End PBS Prologue Thu Nov 3 14:41:32 EDT 2015
    ---------------------------------------
    < M A T L A B (R) >
    Copyright 1984-2011 The MathWorks, Inc.
    R2011b (7.13.0.564) 64-bit (glnxa64)
    August 13, 2011
    To get started, type one of these: helpwin, helpdesk, or demo.
    For product information, visit www.mathworks.com.
    ans =
    1
    ...
    >>
    < M A T L A B (R) >
    Copyright 1984-2011 The MathWorks, Inc.
    R2011b (7.13.0.564) 64-bit (glnxa64)
    August 13, 2011
    To get started, type one of these: helpwin, helpdesk, or demo.
    For product information, visit www.mathworks.com.
    ans =
    5050
    ---------------------------------------
    Begin PBS Epilogue Thu Nov 3 14:41:33 EDT 2015

Each of the "matlab" commands in the jobs.txt file was executed simultaneously by the paralleljob.txt script, but the output is ordered exactly the same as the commands in jobs.txt. The identical ordering is a feature of paralleljob.txt - any output generated by a command will be printed in the same order as the commands listed in jobs.txt.
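
One way such ordered output can be produced is to capture each command's output in its own numbered file and print the files in order once everything has finished. The sketch below illustrates that idea only; it is not the PACE paralleljob.txt script, and the function name `run_ordered` and its simple group-by-group throttling are assumptions made for the example.

```shell
# Run every line of JOBFILE as a shell command, NPROCS at a time, then print
# the captured output in the same order as the commands appear in JOBFILE.
run_ordered() (
    jobfile=$1; nprocs=$2
    tmpdir=$(mktemp -d)
    n=0
    while IFS= read -r cmd || [ -n "$cmd" ]; do
        n=$((n + 1))
        # Each command writes to its own numbered file, so outputs never interleave.
        sh -c "$cmd" > "$tmpdir/$n.out" 2>&1 < /dev/null &
        # Simple throttle: after starting NPROCS commands, wait for that group.
        if [ $((n % nprocs)) -eq 0 ]; then wait; fi
    done < "$jobfile"
    wait
    # Replay the captured output in command order.
    i=1
    while [ "$i" -le "$n" ]; do
        cat "$tmpdir/$i.out"
        i=$((i + 1))
    done
    rm -rf "$tmpdir"
)

# Hypothetical usage with 10 cores:
#   run_ordered jobs.txt 10
```

Waiting for a whole group before starting the next one keeps the sketch short, at the cost of leaving cores idle while the slowest command in each group finishes.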

### Splitting one jobs.txt file into multiple jobs

The paralleljob.txt script also allows users to subdivide the work among different jobs with the BATCHSIZE and BATCHNUM variables.

The example matlab parameter sweep has 10000 jobs to execute. To split this into ten different jobs, we will execute these commands:

    qsub -vBATCHSIZE=1000,BATCHNUM=0 paralleljob.txt
    qsub -vBATCHSIZE=1000,BATCHNUM=1 paralleljob.txt
    qsub -vBATCHSIZE=1000,BATCHNUM=2 paralleljob.txt
    qsub -vBATCHSIZE=1000,BATCHNUM=3 paralleljob.txt
    qsub -vBATCHSIZE=1000,BATCHNUM=4 paralleljob.txt
    qsub -vBATCHSIZE=1000,BATCHNUM=5 paralleljob.txt
    qsub -vBATCHSIZE=1000,BATCHNUM=6 paralleljob.txt
    qsub -vBATCHSIZE=1000,BATCHNUM=7 paralleljob.txt
    qsub -vBATCHSIZE=1000,BATCHNUM=8 paralleljob.txt
    qsub -vBATCHSIZE=1000,BATCHNUM=9 paralleljob.txt

The "BATCHSIZE=1000" setting tells the script to execute 1000 lines from the jobs.txt file, and BATCHNUM tells the script which set of 1000 lines to run (the batches above are numbered 0 through 9).

By default, BATCHSIZE is the number of jobs in jobs.txt.
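
To illustrate the batching, the sketch below shows one plausible mapping from BATCHSIZE and BATCHNUM to line ranges, plus a loop that prints the qsub commands instead of typing them out. It assumes that batch 0 covers the first BATCHSIZE lines of jobs.txt; the page does not state how the script computes the range, and the function names are hypothetical.

```shell
# Print the lines of JOBFILE that fall in batch BATCHNUM (counted from 0)
# when the file is cut into consecutive chunks of BATCHSIZE lines.
batch_lines() (
    jobfile=$1; batchsize=$2; batchnum=$3
    first=$(( batchnum * batchsize + 1 ))
    last=$(( (batchnum + 1) * batchsize ))
    sed -n "${first},${last}p" "$jobfile"
)

# Print one qsub command per batch needed to cover TOTAL lines.
print_batch_submissions() (
    total=$1; batchsize=$2
    batches=$(( (total + batchsize - 1) / batchsize ))
    b=0
    while [ "$b" -lt "$batches" ]; do
        echo "qsub -vBATCHSIZE=$batchsize,BATCHNUM=$b paralleljob.txt"
        b=$((b + 1))
    done
)

# Hypothetical usage: print the ten commands for the 10000-line example.
#   print_batch_submissions 10000 1000
```

Printing the commands first makes it easy to review them; piping the output to `sh` would then submit the jobs.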