Running Array Jobs
Job arrays allow you to run a group of identical or similar jobs. The Slurm script is exactly the same for every sub-job; the only difference between sub-jobs is the value of the environment variable $SLURM_ARRAY_TASK_ID. This makes arrays a good fit for data-level parallelization: for example, sub-job 1 (SLURM_ARRAY_TASK_ID=1) processes data chunk 1, sub-job 2 processes data chunk 2, and so on.
To do that, add the following directive to your submission script, where n is the number of jobs in the array:
#SBATCH --array=1-n
An example Slurm array job script includes the directive:
#SBATCH --array=1-20
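In a complete submission script this directive sits alongside the usual resource requests. A minimal sketch, where the job name, resource requests, program and data file names are illustrative placeholders:
#!/bin/bash
#SBATCH --job-name=array_example
#SBATCH --array=1-20
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
# Each sub-job processes the data chunk matching its task ID.
INPUT="data_chunk_${SLURM_ARRAY_TASK_ID}.dat"
./my_program "${INPUT}" > "output_${SLURM_ARRAY_TASK_ID}.log"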
Alternatively, you can specify the array at submission time, without modifying your submission script:
sbatch --array=1-20 job.script
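When the array is accepted, sbatch reports the base job ID, for example:
Submitted batch job 1000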
In Slurm, a job array is implemented as a group of single jobs. For example, if you submit an array job with #SBATCH --array=1-4 and the first job is assigned ID 1000, the IDs of the individual jobs are 1000, 1001, 1002 and 1003.
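Individual tasks can also be addressed using the jobid_taskid form. For example, continuing with base job ID 1000:
squeue -j 1000     # show all tasks of the array
scancel 1000_2     # cancel only array task 2
scancel 1000       # cancel the entire array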
There is a limit of 1000 jobs per array, and Slurm will not accept array IDs above this limit. This can be worked around with a prefix in the script: e.g. for IDs 1001-1020, use --array=01-20 and reference the variable with a prefix, as in 10$SLURM_ARRAY_TASK_ID.
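If you prefer not to rely on zero-padding, an arithmetic offset inside the script achieves the same mapping. A minimal sketch, reusing the illustrative my_program and file naming from the example above:
#SBATCH --array=1-20
# Map task IDs 1-20 onto chunk numbers 1001-1020.
CHUNK_ID=$((1000 + SLURM_ARRAY_TASK_ID))
./my_program "data_chunk_${CHUNK_ID}.dat"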
A maximum number of simultaneously running tasks from the job array may be specified using a % separator. For example, --array=0-15%4 will limit the number of simultaneously running tasks from this job array to 4.