Partitions Available

Nodes belong to different partitions which allow corresponding jobs to run on them. There are two partitions on m3 at the moment:

The default partition is comp which consists of:

  • m3a: 22 nodes, each with 24 cores plus 128GB RAM

  • m3d: 13 nodes, each with 24 cores, 256GB RAM

  • m3h: 14 nodes, each with 28 cores, 2 x nVidia Tesla P100, 256GB RAM

  • m3i: 52 nodes, each with 36 cores, 196GB RAM

  • m3j: 12 nodes, each with 36 cores, 384GB RAM

The default partition for desktops is desktop:

  • m3c: 13 nodes, each with 24 cores, 4 x nVidia Tesla K80, 256GB RAM

  • m3f: 32 nodes, each with 3 cores, 1 x nVidia Grid K1, 16GB RAM

  • m3h: 14 nodes, each with 28 cores, 2 x nVidia Tesla P100, 256GB RAM

  • m3p: 11 nodes, each with 36 cores, 6 x nVidia Tesla P4, 384GB RAM

Other partitions

  • short:
    • use this when the jobs can be completed within ten minutes

    • 3 nodes, each with 27 cores, 128GB RAM

  • rtqp:
    • intended to be used by jobs that have an instrument or a real-time scenario and therefore can’t be interrupted and must be available on demand

    • batch jobs to support time at instrument are an example of appropriate use

    • use this partition only with two QoS, rtq and irq

    • consists of Nvidia Tesla V100 and cpu nodes.

  • total of 11 nodes and total number of 348 cores

Checking the status of M3

On M3, users can check the status of all nodes via the show_cluster command. The output of this command should be similar to:

$ show_cluster
 NODE            TYPE      PARTITION*         CPU     Mem (MB)   GPU/Phi         STATUS
                                           (Free)       (Free)    (Free)
 ---------------------------------------------------------------------------------------
 m3a000           CPU         m3a/all        (24)     (116588)     (N/A)       Reserved
 m3a001           CPU         m3a/all        (24)     (116588)     (N/A)       Reserved
 m3a002           CPU         m3a/all        (24)     (116588)     (N/A)       Reserved
 m3a003           CPU         m3a/all          24       116588       N/A           Idle
 m3a004           CPU         m3a/all          24       116588       N/A           Idle
 m3a005           CPU         m3a/all          24       116588       N/A           Idle
 m3a006           CPU         m3a/all          24       116588       N/A           Idle
 m3a007           CPU         m3a/all          24       116588       N/A           Idle
 m3a008           CPU         m3a/all        (24)     (116588)     (N/A)       Reserved
 m3d007           CPU         m3d/all           8        15614       N/A        Running
 m3d008           CPU             m3d           8        15614       N/A        Running

The STATUS field explained

The STATUS field can show:

  • Idle - Node is completely free. No jobs running on the node.

  • Running - Some jobs are running on the node but it still has available resources for new jobs.

  • Busy - Node is completely busy. There are no free resources on the node. No new jobs can start on this node.

  • Offline - Node is offline and unavailable due to a system issue.

  • Reserved - Node has been booked by other users and is ONLY available for them.