Partitions Available#
Nodes on M3 are grouped into partitions, and a job submitted to a partition runs on the nodes that belong to it. The partitions currently available on M3 are:
- The default partition is comp, which consists of:
  - m3d: 6 nodes, each with 48 CPU cores, 790GB RAM
  - m3i: 43 nodes, each with 36 CPU cores, 171GB RAM
  - m3j: 11 nodes, each with 36 CPU cores, 357GB RAM
- The default partition for desktops is desktop, which consists of:
  - m3c: 13 nodes, each with 24 CPU cores, 4 x NVIDIA Tesla K80, 256GB RAM (to be decommissioned)
  - m3p: 10 nodes, each with 36 CPU cores, 6 x NVIDIA Tesla P4, 330GB RAM
  - m3a: 4 nodes, each with 52 CPU cores, 4 x NVIDIA A40, 1TB RAM
  - m3t: 4 nodes, each with 52 CPU cores, 8 x NVIDIA T4, 900GB RAM
- Some of the recently added NVIDIA T4 and A40 GPUs are available on the new gpu partition, which consists of:
  - m3a: 4 nodes, each with 52 CPU cores, 4 x NVIDIA A40, 1TB RAM
  - m3t: 2 nodes, each with 52 CPU cores, 8 x NVIDIA T4, 900GB RAM
Other GPUs are accessible via their own partitions. For instructions on using GPUs on M3, see the GPUs on M3 page.
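As an illustration of how a partition is selected for a job, the sketch below formats the header of a batch script for a chosen partition. It assumes that jobs on M3 are submitted through Slurm's sbatch, whose --partition, --time and --gres options are standard. The helper name sbatch_header and the example values are hypothetical, and the exact GPU request syntax for M3 should be taken from the GPUs on M3 page.

```python
def sbatch_header(partition, minutes, gpus=0):
    """Return the #SBATCH header lines for a job on the given partition.

    Hypothetical helper: it only formats text and does not submit anything.
    """
    hours, mins = divmod(minutes, 60)
    lines = [
        "#!/bin/bash",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --time={hours:02d}:{mins:02d}:00",
    ]
    if gpus > 0:
        # Generic Slurm GPU request (assumption); M3 may also need a GPU type.
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example: a 90-minute job asking for one GPU on the gpu partition.
    print(sbatch_header("gpu", 90, gpus=1))
```

The printed lines would go at the top of a job script, followed by the commands the job should run.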
Other partitions#
- short:
  - Use this partition for jobs that can be completed within thirty minutes
  - 3 nodes, each with 27 cores, 128GB RAM
- rtqp:
  - Intended for jobs tied to an instrument or a real-time scenario, which therefore cannot be interrupted and need resources available on demand
  - Batch jobs that support time at an instrument are an example of appropriate use
  - Use this partition only with one of its two QoS, rtq or irq
  - A total of 7 nodes, consisting of:
    - 3 GPU compute nodes, each with 3 V100 GPUs and 36 CPU cores
    - 4 CPU-only compute nodes, each with 36 CPU cores
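The usage rules for these two partitions can be expressed as a small pre-submission check. This is a minimal sketch under stated assumptions: the function name check_request and the idea of validating a request before submitting are hypothetical, and only the thirty-minute guideline for short and the rtq and irq QoS names for rtqp come from the list above.

```python
SHORT_LIMIT_MINUTES = 30       # "within thirty minutes", from the list above
RTQP_QOS = {"rtq", "irq"}      # the two QoS named for the rtqp partition


def check_request(partition, minutes, qos=None):
    """Return a list of problems with a job request; an empty list means none."""
    problems = []
    if partition == "short" and minutes > SHORT_LIMIT_MINUTES:
        problems.append("short is meant for jobs of 30 minutes or less")
    if partition == "rtqp" and qos not in RTQP_QOS:
        problems.append("rtqp must be used with QoS rtq or irq")
    return problems


if __name__ == "__main__":
    print(check_request("short", 45))
    print(check_request("rtqp", 10, qos="rtq"))
```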
Checking the status of M3#
On M3, users can check the status of all nodes via the show_cluster command.
The output of this command should be similar to:
$ show_cluster
NODE     TYPE   PARTITION*   CPU      Mem (MB)   GPU/Phi   STATUS
                             (Free)   (Free)     (Free)
---------------------------------------------------------------------------------------
m3c001   K80    desktop      0        0          0         Busy
m3c002   K80    desktop      0        0          0         Busy
m3c003   K80    desktop      0        0          0         Busy
m3c004   K80    desktop      0        0          0         Busy
m3c005   K80    desktop      0        0          0         Busy
m3c006   K80    desktop      0        0          0         Busy
m3c007   K80    desktop      0        0          0         Busy
m3c008   K80    desktop      0        0          0         Busy
m3c009   K80    OFFLINE REASON: Not responding             Offline
m3c010   K80    desktop      0        64         0         Busy
m3c011   K80    desktop      0        0          0         Busy
m3c012   K80    desktop      0        0          0         Busy
m3c013   K80    desktop      0        0          0         Busy
m3c014   K80    desktop      0        0          0         Busy
m3d100   CPU    comp         48       732        0         Idle
m3d101   CPU    comp         48       732        0         Idle
m3d112   CPU    comp         16       482        0         Running
m3d113   CPU    comp         48       732        0         Idle
m3d114   CPU    comp         48       732        0         Idle
m3d115   CPU    comp         48       732        0         Idle
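Output in this shape can be read programmatically. The following sketch parses text like the sample above into one record per node. It is not part of show_cluster itself: the function name parse_show_cluster is hypothetical, and the parser assumes that node names start with "m3" and that columns are separated by whitespace, as in the sample.

```python
def parse_show_cluster(text):
    """Parse show_cluster-style output into a list of per-node dictionaries."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # Node rows start with a name such as m3c001 (assumption); this skips
        # the prompt line, the two header lines and the dashed separator.
        if len(fields) < 3 or not fields[0].startswith("m3"):
            continue
        node = {"node": fields[0], "type": fields[1], "status": fields[-1]}
        if len(fields) == 7 and all(f.isdigit() for f in fields[3:6]):
            node["partition"] = fields[2]
            node["cpu_free"] = int(fields[3])
            node["mem_free"] = int(fields[4])
            node["gpu_free"] = int(fields[5])
        else:
            # Rows such as "OFFLINE REASON: ..." carry no free-resource counts.
            node["partition"] = None
            node["cpu_free"] = node["mem_free"] = node["gpu_free"] = 0
        nodes.append(node)
    return nodes


SAMPLE = """\
m3c009 K80 OFFLINE REASON: Not responding Offline
m3d100 CPU comp 48 732 0 Idle
m3d112 CPU comp 16 482 0 Running
"""

if __name__ == "__main__":
    for n in parse_show_cluster(SAMPLE):
        print(n["node"], n["status"], n["cpu_free"])
```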
The STATUS field explained#
The STATUS field can show:
- Idle: The node is completely free. No jobs are running on it.
- Running: Some jobs are running on the node, but it still has resources available for new jobs.
- Busy: The node is completely busy. It has no free resources, so no new jobs can start on it.
- Offline: The node is offline and unavailable due to a system issue.
- Reserved: The node has been reserved by other users and is available only to them.
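The meaning of each status can be summarised as a lookup that says whether a node could start a new job for a general user. This is a sketch based only on the descriptions above; the names ACCEPTS_NEW_JOBS and usable_nodes are hypothetical, and any status not listed is treated as unusable.

```python
# Whether a node in each state can start a new job, per the descriptions above.
ACCEPTS_NEW_JOBS = {
    "Idle": True,       # completely free
    "Running": True,    # partly used, resources still available
    "Busy": False,      # no free resources
    "Offline": False,   # unavailable due to a system issue
    "Reserved": False,  # available only to the users holding the reservation
}


def usable_nodes(rows):
    """Given (node, status) pairs, return the nodes that can start new jobs."""
    return [node for node, status in rows if ACCEPTS_NEW_JOBS.get(status, False)]


if __name__ == "__main__":
    rows = [("m3c001", "Busy"), ("m3d100", "Idle"), ("m3d112", "Running")]
    print(usable_nodes(rows))
```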