To run a job with the scheduler, you must submit it to an appropriate queue. This allows the scheduler to find and allocate the most appropriate nodes so your job runs as quickly as possible.
There are two ways to submit jobs to the scheduler: msub and qsub (preferred). Both commands work, and both use the same command-line options and batch submission file format. We recommend qsub, which is known to be faster and more robust.
Job submission is a two-step process. For batch submissions, the first step is to prepare a PBS script, which passes all the required information about a task to the scheduler so that it can allocate appropriate nodes. For interactive submissions, users pass job requirements to the scheduler directly, i.e., without a script; see the next section for more details on interactive job submission. The PBS script can be regarded as a list of user 'demands': which queue the job should be assigned to, how many nodes/cores are required, how much memory is needed, and so on. The second step is to submit this script to the scheduler using qsub:
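Assuming the script below is saved as helloexample.pbs, the submission step is a single command (the job ID format shown is illustrative and site-specific):

```shell
# Submit the batch script to the scheduler
qsub helloexample.pbs
# On success, qsub prints the new job's ID, e.g. 123456.sched (format varies by site)

# Check the status of your queued and running jobs
qstat -u $USER
```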
The file helloexample.pbs (the name and extension are arbitrary, although using .pbs or .sh is the convention) is the PBS script, annotated below:
# This is an example PBS script
#PBS -N hello
#PBS -l nodes=7:ppn=4
#PBS -l mem=2gb
#PBS -l walltime=15:00:00
#PBS -q paceib
#PBS -k oe
#PBS -m abe
#PBS -M email@example.com
cd $PBS_O_WORKDIR
echo "Started on `/bin/hostname`"
echo "Nodes chosen are:"
cat $PBS_NODEFILE
module load gcc/4.9.0
module load mvapich2/2.1
mpirun -v -np 28 -machinefile $PBS_NODEFILE ~/mpi/hello
Lines beginning with '#PBS' are instructions to the scheduler. Unlike ordinary lines beginning with '#' (such as the first line), they are not 'comments' but 'commands', and should NOT be discarded. All other lines are passed to the user's default shell (usually bash) for execution. For complete information on the options to both msub and qsub, please see the online manual page for qsub.
The following common options are explained here:
#PBS -N hello
Gives the job the name "hello". This name is used to prefix the job's output files (both standard output and standard error) in your home directory, and appears in the queue listing once the job is submitted.
#PBS -l nodes=7:ppn=4
Tells the scheduler that you wish your job to run on 7 nodes, each with 4 processors (the same thing as 'cores') per node. Please note that at present this is advice to the scheduler: it may satisfy the request for 28 processors (7 nodes, 4 processors each) using a different combination of available nodes and processors.
#PBS -l mem=2gb
Tells the scheduler that this job may use up to a total of 2GB of memory. To specify memory per core, use '-l pmem' instead.
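To illustrate the difference (values hypothetical): a per-core request is multiplied by the number of processors granted, so pmem=512mb with the 28 processors requested above amounts to 14336 MB in total. The arithmetic, as a quick shell check:

```shell
# Hypothetical per-core request: '#PBS -l pmem=512mb' with nodes=7:ppn=4
pmem_mb=512
procs=$((7 * 4))               # 28 processors in total
total_mb=$((pmem_mb * procs))
echo "$total_mb"               # 14336 MB across the whole job
```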
#PBS -l walltime=15:00:00
Tells the scheduler you expect this job to require no more than 15 hours of wall-clock time to run once started. This provides a good mechanism for detecting jobs that encounter an infinite loop or other unexpected behavior. Once this time limit is reached, the scheduler may kill the job and release its resources. The format is HH:MM:SS or, if expressed as a single integer, seconds.
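The two formats are interchangeable; for example, the 15-hour limit above can also be written as a single integer number of seconds. A quick shell check of the conversion:

```shell
# 15:00:00 expressed as seconds
hours=15
seconds=$((hours * 60 * 60))
echo "$seconds"   # 54000, so '#PBS -l walltime=54000' is equivalent
```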
#PBS -q paceib
Tells the scheduler you wish this job to run in the paceib queue, which uses Infiniband-connected hosts. Available PACE queues are listed here. You can get a list of the queues you have access to by running the "pace-whoami" command anywhere on the cluster.
#PBS -k oe
Tells the scheduler to retain both the standard output and the standard error of the job, placing them in your home directory in files named after the job, with the suffixes .o and .e respectively.
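For example, a job named hello that was assigned the (hypothetical) job ID 123456 would leave two files in your home directory; a sketch of how the names are formed:

```shell
# Output files are named <jobname>.o<jobid> and <jobname>.e<jobid>
jobname=hello
jobid=123456   # hypothetical; the scheduler assigns the real ID at submission
echo "$HOME/${jobname}.o${jobid}"   # standard output
echo "$HOME/${jobname}.e${jobid}"   # standard error
```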
#PBS -m abe
Tells the scheduler to send you email based upon:
a: mail is sent when the job is aborted by the batch system.
b: mail is sent when the job begins execution.
e: mail is sent when the job terminates.
-M firstname.lastname@example.org (optional)
Allows you to specify an alternative email address. The scheduler will use your default email address if -M is not given.
Unless otherwise noted, all queues have the following defaults:
- default memory is 1GB (this is used if you do not provide a "#PBS -l mem=" hint to the scheduler)
- default processors is 1 CPU (this is used if you do not provide a "#PBS -l nodes=" hint to the scheduler)
- default wall clock time is 1 hour (this is used if you do not provide a "#PBS -l walltime=" hint to the scheduler)
- maximum wall clock time is 60 days unless otherwise configured on a per-queue basis
- scheduling priority is configured on a per-queue basis
- a higher value for priority means that jobs will tend to be scheduled sooner than jobs with lower values for priority
- as jobs wait in the queue for available processors, their priority will increase
- large multi-processor jobs may cause "holes" of available processors, which may be filled with smaller jobs. This may cause a lower priority job to be scheduled before a higher priority job simply because it would fit. In general, the scheduler will try to keep CPUs busy rather than preserve a strict "this job runs before that job" ordering.
The remainder of the file is used as shell script commands to run the program. This example will:
cd $PBS_O_WORKDIR changes to the directory from which the job was submitted (i.e., where qsub was run).
echo "Started on `/bin/hostname`" shows where the job begins its execution.
echo "Nodes chosen are:" and cat $PBS_NODEFILE show the list of nodes selected by the scheduler.
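$PBS_NODEFILE contains one line per allocated processor, so with nodes=7:ppn=4 each of the 7 hosts appears 4 times. A simulation with hypothetical hostnames (on the cluster, the scheduler generates this file for you):

```shell
# Simulate a nodefile for nodes=7:ppn=4
nodefile=$(mktemp)
for n in 1 2 3 4 5 6 7; do
  for p in 1 2 3 4; do
    echo "node${n}" >> "$nodefile"   # hypothetical hostnames
  done
done
wc -l < "$nodefile"            # 28 lines = total processors granted
sort -u "$nodefile" | wc -l    # 7 distinct hosts
rm -f "$nodefile"
```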
The following command actually begins the execution of your MPI-based application:
mpirun -v -np 28 -machinefile $PBS_NODEFILE ~/mpi/hello
This last and most important step for MPI-based applications requests that the ~/mpi/hello application be started on 28 processors, using the list of machines chosen by the scheduler. Note that the number of processors passed to mpirun (using -np) should be equal to or smaller than nodes*ppn (7*4 = 28 in this example).
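Rather than hard-coding 28, one common idiom (a sketch, not a PACE requirement) derives -np from the nodefile itself, so the count always matches what the scheduler actually granted:

```shell
# $PBS_NODEFILE lists one line per allocated processor slot,
# so its line count equals the total processors granted
NP=$(wc -l < "$PBS_NODEFILE")
mpirun -v -np "$NP" -machinefile "$PBS_NODEFILE" ~/mpi/hello
```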
The job does not necessarily need to be MPI-based. Many other parallel packages/applications (Matlab, R, Comsol, etc.), and even sequential jobs, can be submitted using a PBS script.
With luck, your job will be submitted, run to completion and return all results desired.