Partitions Available

Each node belongs to one or more partitions, and a job runs only on the nodes of the partition it is submitted to. The partitions currently available on M3 are:

The default partition is comp, which consists of:
  • m3d: 6 nodes, each with 48 CPU cores, 790GB RAM

  • m3i: 43 nodes, each with 36 CPU cores, 171GB RAM

  • m3j: 11 nodes, each with 36 CPU cores, 357GB RAM

The default partition for desktops is desktop, which consists of:
  • m3c: 13 nodes, each with 24 CPU cores, 4 x NVIDIA Tesla K80, 256GB RAM

  • m3f: 32 nodes, each with 3 CPU cores, 1 x NVIDIA Grid K1, 13GB RAM

  • m3p: 10 nodes, each with 36 CPU cores, 6 x NVIDIA Tesla P4, 330GB RAM

  • m3a: 4 nodes, each with 52 CPU cores, 4 x NVIDIA A40, 1TB RAM

  • m3t: 4 nodes, each with 52 CPU cores, 8 x NVIDIA T4, 900GB RAM

Some of the recently added NVIDIA T4 and A40 GPU nodes are also available in the new gpu partition:
  • m3a: 4 nodes, each with 52 CPU cores, 4 x NVIDIA A40, 1TB RAM

  • m3t: 2 nodes, each with 52 CPU cores, 8 x NVIDIA T4, 900GB RAM

Other GPUs are accessible via their own partitions. For instructions on using GPUs on M3, see the GPUs on M3 page.
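As a minimal sketch, assuming jobs are submitted with Slurm's sbatch, a batch script chooses its partition with the --partition directive. The job name, core count, memory and time limit below are illustrative placeholders, not recommended values.

```shell
#!/bin/bash
# Minimal sketch (assumes Slurm): a batch job for the default "comp" partition.
# All resource values are illustrative placeholders; size them to your job.
#SBATCH --job-name=comp-example
#SBATCH --partition=comp
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=01:00:00

# Match the thread count to the cores Slurm allocated; falls back to 1
# when the script is run outside Slurm (for example, as a quick check).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running on $(hostname) with ${OMP_NUM_THREADS} thread(s)"
```

Submit it with `sbatch my_job.sh` (the file name is hypothetical). Because comp is the default partition, the --partition line is optional here. To target another partition, change its value, for example to gpu together with a GPU request such as --gres=gpu:1; check the GPUs on M3 page for the exact options M3 expects.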

Other partitions

  • short:
    • Use this partition for jobs that can complete within 30 minutes

    • 3 nodes, each with 27 CPU cores, 128GB RAM

  • rtqp:
    • Intended for jobs tied to an instrument or another real-time scenario, which cannot be interrupted and must be available on demand

    • Batch jobs that support time at an instrument are an example of appropriate use

    • Use this partition only with one of its two QoS values, rtq or irq

    • 7 nodes in total, consisting of:
      • 3 GPU compute nodes, each with 3 NVIDIA V100 GPUs and 36 CPU cores

      • 4 CPU-only compute nodes, each with 36 CPU cores
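The same approach applies to the partitions above. The sketch below, again assuming Slurm, requests the short partition with a time limit that fits its 30-minute window. The commented-out ##SBATCH lines show how an rtqp job would also name one of its QoS values; Slurm ignores lines that start with ##SBATCH, so they stay inactive until one leading # is removed.

```shell
#!/bin/bash
# Minimal sketch (assumes Slurm): a job that fits in the 30-minute window.
#SBATCH --job-name=short-example
#SBATCH --partition=short
#SBATCH --ntasks=1
#SBATCH --time=00:30:00
# For an rtqp job, replace the --partition line above with the two lines
# below and remove one leading # from each (use irq instead of rtq if needed).
##SBATCH --partition=rtqp
##SBATCH --qos=rtq

# Record which job and partition this ran under; the fallbacks apply when
# the script is executed outside Slurm.
msg="Job ${SLURM_JOB_ID:-(not under Slurm)} on partition ${SLURM_JOB_PARTITION:-(unset)}"
echo "$msg"
```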

Checking the status of M3

On M3, users can check the status of all nodes with the show_cluster command. Its output looks similar to the following:

$ show_cluster
 NODE            TYPE      PARTITION*         CPU     Mem (MB)   GPU/Phi         STATUS
                                           (Free)       (Free)    (Free)
 ---------------------------------------------------------------------------------------
 m3c001             K80           desktop         0         0         0           Busy
 m3c002             K80           desktop         0         0         0           Busy
 m3c003             K80           desktop         0         0         0           Busy
 m3c004             K80           desktop         0         0         0           Busy
 m3c005             K80           desktop         0         0         0           Busy
 m3c006             K80           desktop         0         0         0           Busy
 m3c007             K80           desktop         0         0         0           Busy
 m3c008             K80           desktop         0         0         0           Busy
 m3c009             K80 OFFLINE REASON:                   Not responding        Offline
 m3c010             K80           desktop         0        64         0           Busy
 m3c011             K80           desktop         0         0         0           Busy
 m3c012             K80           desktop         0         0         0           Busy
 m3c013             K80           desktop         0         0         0           Busy
 m3c014             K80           desktop         0         0         0           Busy
 m3d100             CPU              comp        48       732         0           Idle
 m3d101             CPU              comp        48       732         0           Idle
 m3d112             CPU              comp        16       482         0        Running
 m3d113             CPU              comp        48       732         0           Idle
 m3d114             CPU              comp        48       732         0           Idle
 m3d115             CPU              comp        48       732         0           Idle
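Because show_cluster prints one whitespace-separated row per node, its output can be post-processed with standard tools. The hypothetical helper below is a sketch that assumes the column layout shown above (seven fields per node row, with the partition in field 3 and free CPUs in field 4). It totals the free CPU cores per partition and skips header lines and rows such as the offline node, whose fourth field is not a number.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of M3's tooling): sum the free-CPU column of
# show_cluster output per partition. Reads the output on stdin.
# Assumes node rows have exactly 7 fields: partition in $3, free CPUs in $4.
free_cpus_by_partition() {
  awk 'NF == 7 && $4 ~ /^[0-9]+$/ { free[$3] += $4 }
       END { for (p in free) print p, free[p] }' | sort
}

# Example usage on M3:
#   show_cluster | free_cpus_by_partition
```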

The STATUS field explained

The STATUS field can take the following values:

  • Idle - The node is completely free; no jobs are running on it.

  • Running - Some jobs are running on the node, but it still has free resources for new jobs.

  • Busy - The node has no free resources, so no new jobs can start on it.

  • Offline - The node is unavailable because of a system issue. In this case show_cluster prints the reason (for example, Not responding) in place of the partition and resource columns.

  • Reserved - The node has been booked by other users and is available ONLY to them.
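Following the definitions above, only Idle and Running nodes can accept new jobs. The hypothetical helpers below are a sketch that assumes STATUS is always the last field of each show_cluster row. They list the nodes that can still start jobs and count nodes per status.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of M3's tooling). Both read show_cluster
# output on stdin and assume STATUS is the last whitespace-separated field.

# Print "<node> <partition>" for nodes that can still start new jobs,
# i.e. whose STATUS is Idle or Running.
nodes_accepting_jobs() {
  awk '$NF == "Idle" || $NF == "Running" { print $1, $3 }'
}

# Print "<status> <count>" for each STATUS value, sorted by status name.
# Header and separator lines are skipped because their last field is not
# one of the documented STATUS values.
count_by_status() {
  awk '$NF ~ /^(Idle|Running|Busy|Offline|Reserved)$/ { n[$NF]++ }
       END { for (s in n) print s, n[s] }' | sort
}

# Example usage on M3:
#   show_cluster | nodes_accepting_jobs
#   show_cluster | count_by_status
```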