Attention

This documentation is under active development and may change as we refine it. Please email help@massive.org.au if you require assistance or have suggestions to improve this documentation.

Starter Guide: GPUs on M3#

This documentation introduces choosing a GPU (graphics processing unit) on M3. To demystify the process, it covers how to choose a GPU and the methods used to access one, with Look-Up Tables giving more detail about which GPUs are available on M3.

GPUs on HPC (remote) vs. Laptop/Workstation (local)#

The GPUs available on a high performance computer (HPC) differ from those in your own laptop or desktop computer in ways that affect how you use them. For example, consumer GPUs such as the NVIDIA GTX series maximise graphics performance; on an HPC we want to maximise data processing and computational performance, so we rely on data centre GPUs instead. These include (from newest to oldest technology) the NVIDIA Ampere, Turing, Volta, Pascal, Maxwell, and Kepler architectures.

M3 has the Pascal P4, Volta V100, Turing T4, Ampere A40, and DGX Volta GPUs. For more detailed information on which GPUs are available, please look at our Look-Up Tables or About M3.
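Once you are on a GPU node (through any of the access methods below), you can confirm which GPU you have been allocated with NVIDIA's standard monitoring tool. This is a quick illustrative check rather than an M3-specific command:

    # Show the allocated GPU(s): model name, memory, driver version and utilisation.
    nvidia-smi

    # Print just the GPU model name(s), one per line.
    nvidia-smi --query-gpu=name --format=csv,noheader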

How do I choose a GPU?#

If you are a new HPC user who requires GPU access, begin with a P4 on a desktop. Desktops are designed to replicate the familiar environment of your local workstation, but with the compute power of M3. When you need more computing power, or want to submit jobs to M3 rather than running them interactively, we advise progressing to our compute GPUs.

We provide look-up tables with brief comments on which GPU may be appropriate for your needs. If you want more tailored advice, the sections below cover some questions you might ask yourself when choosing a GPU.

Questions to ask yourself when choosing a GPU#

How will I access the GPU?#

You can access these GPU resources on M3 with a desktop, interactively with smux, or by submitting a job to the queue. Some GPUs are only available via certain access methods. More details on each of these methods are presented below.

Desktops#

When should I use this? You might want to use a desktop if you intend to visualise your work or use Jupyter notebooks, or if you are new to HPC.

More information: A desktop environment provides a graphical interface (GUI) for accessing M3, similar to your own computer, rather than a command line interface. It is designed to be comfortable and to let you run visual and interactive applications.

How do I access GPUs with a desktop? When starting a desktop session, you will be asked to select which GPU you want to use, and your desktop will start once the resources are available. The P4 GPUs are exclusively available when using a desktop session; T4 and A40 GPUs are also available via the command line. A P4 is usually the optimal choice of desktop due to high availability. More information about desktops is available at Connecting to M3 via the Strudel Desktop.

Interactively (smux)#

When should I use this? You might want to run a job interactively if you need to work on a compute node directly in real time; for example, to use a V100 GPU to test commands before putting them in a job submission script. Note that you will likely have to wait for a GPU to become available when using smux.

How do I access GPUs interactively? You’ll need to use smux on the command line and request a GPU as described in Running Interactive Jobs.
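As an illustration only, requesting a single GPU interactively might look like the sketch below; the partition name and time limit are placeholders, and the authoritative syntax is in Running Interactive Jobs:

    # Start an interactive session with one GPU for two hours.
    # <partition> is a placeholder: use a GPU partition your project can access
    # (see the Look-Up Tables for which partitions hold which GPUs).
    smux new-session --time=0-02:00:00 --gres=gpu:1 --partition=<partition>

    # When the session starts, check which GPU you were given.
    nvidia-smi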

Job Submission (sbatch)#

When should I use this? To get access to the largest amount of compute time and resources, you need to submit non-interactive jobs. You might want to submit a job when the desktop GPUs aren’t powerful enough or you need more GPU RAM, and when interactive jobs are hindering reproducibility or taking too much time.

More information: The benefit of submitting jobs rather than running interactive jobs is that, as soon as the resources become available to you, the job executes the commands you wrote. In an interactive job, commands are only executed when you type them, which wastes your time waiting for the session to become available and wastes resources (compute time) that could be used by others in their research. Use interactive jobs as the precursor to job submissions to get the benefits of both.

How do I access GPUs when I submit jobs? You’ll use the sbatch command to submit a job to the job scheduler (SLURM), and request the GPU resources you need as described in Running GPU Jobs.
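A minimal sketch of a GPU job script is shown below. The partition, module, and script names are placeholders, and the exact directives M3 expects are described in Running GPU Jobs:

    #!/bin/bash
    #SBATCH --job-name=gpu-example
    #SBATCH --time=0-04:00:00        # walltime (days-hours:minutes:seconds)
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=4        # CPU cores paired with the GPU
    #SBATCH --mem=32G                # host (CPU) memory, not GPU memory
    #SBATCH --gres=gpu:1             # request one GPU
    #SBATCH --partition=<partition>  # placeholder: a GPU partition you can access

    # Placeholder workload: load your software environment and run your program.
    module load cuda
    python my_gpu_script.py

Save this as, for example, gpu-job.sh, submit it with sbatch gpu-job.sh, and check its progress with squeue -u $USER.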

Do I have access to the job scheduler queues (partition) for this GPU?#

When requesting a GPU interactively or via a job-submission script, you need to specify which partition (pool of resources) the GPU is on. If you don’t have access to a partition, you won’t be able to use that GPU. For example, you must apply to use the DGXs, and some GPUs are reserved for partnered groups.
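You can see which partitions exist, and which of them contain GPUs, with standard SLURM commands; note that a partition appearing in this output does not guarantee your project can submit to it:

    # Summarise the partitions on the cluster.
    sinfo -s

    # Show each node's partition, generic resources (GRES, which lists GPUs),
    # CPU count and memory.
    sinfo -N -o "%P %N %G %c %m"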

What are my other job constraints (memory, CPU architecture)?#

If you request more GPUs than exist on the system, or more CPUs than are available alongside your GPUs, your job won’t run. When choosing a GPU, make sure you’re also able to access the other resources you need, such as memory and CPUs. It is also important to check that your job can actually use the resources you request: not every program or framework can use multiple CPUs or GPUs. You can find detailed information about our hardware at About M3.
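To see exactly how many CPUs, how much memory, and which GPUs a particular node offers before sizing your request, you can ask SLURM directly; the node name below is a placeholder:

    # Show everything SLURM knows about a node, including its GPU GRES,
    # CPU count (CPUTot) and real memory (RealMemory).
    scontrol show node <node_name>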

How do I balance the wait time for a GPU with its computational power?#

Different GPUs attract different levels of demand, leading to different queue times, so it’s worth considering how to balance GPU performance against time constraints. Compute GPUs like the V100s and A40s can have lengthy wait times because they are high performance and in high demand. If you’re testing something small but still require good GPU performance, it might be more appropriate to use a P4 or T4 desktop with a much shorter wait time. If you absolutely need a 32GB V100 to do your work, then it might be worth waiting for!
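As a rough, illustrative way to gauge current demand with standard SLURM commands (the partition name is a placeholder), you can count the pending jobs on a partition and ask SLURM for an estimated start time for your own jobs:

    # Rough proxy for wait time: number of pending jobs on a partition
    # (the count includes one header line).
    squeue --partition=<partition> --state=PENDING | wc -l

    # SLURM's estimate of when your pending jobs may start.
    squeue -u $USER --start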

All of this information is summarised in GPU Look-up Tables for your reference.