
Welcome to the M3 user guide

Important

M3 Scheduled Maintenance - /projects File System Expansion - Phase One

Please be advised that MASSIVE M3 will undergo three days of scheduled maintenance commencing Tuesday, 10 October 2023.

During this maintenance, our engineers will carry out the first phase of a two-phase plan to expand the capacity of the file system.

Specifically, the following work will be conducted on /projects (aka /fs04):

  • relocation of existing servers to a second rack;

  • installation of additional storage enclosures and hard disk drives; and

  • relevant cabling and networking work.

We will also be conducting other maintenance work on the system.

M3 will be inaccessible during the maintenance window, and all nodes will be drained of running jobs before it begins. Jobs already in the queue will remain pending until service is restored.
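If you are submitting work shortly before the maintenance, it can help to check whether a job could finish before the nodes are drained. The Python sketch below assumes the scheduler will only start a job if its requested time limit ends before the maintenance begins. The notice gives the date but not the time of day, so `MAINTENANCE_START` (midnight on 10 October 2023) is a placeholder, and the helper names are hypothetical. The time-limit parser follows the formats accepted by `sbatch --time`.

```python
from datetime import datetime, timedelta

# Placeholder: the notice gives the date but not the time of day, so midnight
# on Tuesday 10 October 2023 is an assumption, not the announced start time.
MAINTENANCE_START = datetime(2023, 10, 10, 0, 0)


def parse_walltime(spec: str) -> timedelta:
    """Parse a time limit in the formats accepted by sbatch --time:
    MM, MM:SS, HH:MM:SS, D-HH, D-HH:MM or D-HH:MM:SS."""
    days = 0
    if "-" in spec:
        day_part, spec = spec.split("-", 1)
        days = int(day_part)
        # After the day field, values are hours[:minutes[:seconds]].
        fields = [int(f) for f in spec.split(":")]
        hours, minutes, seconds = fields + [0] * (3 - len(fields))
    else:
        fields = [int(f) for f in spec.split(":")]
        if len(fields) == 1:  # minutes
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:  # minutes:seconds
            hours, minutes, seconds = 0, fields[0], fields[1]
        else:  # hours:minutes:seconds
            hours, minutes, seconds = fields
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def finishes_before_maintenance(start: datetime, walltime: str) -> bool:
    """True if a job starting at `start` reaches its time limit before the drain point."""
    return start + parse_walltime(walltime) <= MAINTENANCE_START
```

Under these assumptions, a job requesting `--time=2-00:00:00` that starts at 9:00 on 8 October would not fit, while a one-day job starting at the same time would.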

Important

Hardware Refresh Plan – Update: 5 Jul 2022

Please be advised of the following M3 hardware refresh schedule. The servers listed below are reaching end-of-life and will be retired.

While this will reduce total CPU and GPU capacity, retiring these servers is necessary to make room for new, faster compute nodes. This round of hardware procurement is expected to include new remote desktop servers to replace those being decommissioned. In the interim, more desktops will be made available from our existing pool of GPUs. Note that only the Strudel Beta list of desktops will be updated to reflect the change in desktop options.

Compute Nodes    Capability                 To be retired by
m3f[000-031]     NVIDIA K1 GPU desktops     21 Jun 2022
m3c[000-013]     NVIDIA K80 GPU desktops    18 Jul 2022
m3e000           NVIDIA K80 GPU
m3h005           NVIDIA P100 GPU desktop
m3h[006-008]     NVIDIA P100 GPUs           Late 2022 (exact date TBC)

We will put appropriate mechanisms in place (e.g., a SLURM reservation) to ensure these nodes are free of running jobs before they are retired. Please check that your job scripts do not request any of these nodes with --nodelist.
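As a rough aid for that check, the Python sketch below scans the #SBATCH lines of a job script for --nodelist (or its short form -w) and reports any requested node that appears in the retirement table. It assumes simple Slurm-style hostlists with at most one bracketed group per name, it does not see node lists passed on the sbatch command line, and the helper names (expand_hostlist, retired_nodes_in_script) are hypothetical. Slurm's own `scontrol show hostnames` remains the authoritative way to expand a hostlist.

```python
import re

# Retired nodes as listed in the hardware refresh table (Slurm hostlist notation).
RETIRED_RANGES = ["m3f[000-031]", "m3c[000-013]", "m3e000", "m3h005", "m3h[006-008]"]

# Matches "--nodelist=LIST", "--nodelist LIST", "-w LIST" or "-wLIST" on an #SBATCH line.
NODELIST_RE = re.compile(r"(?:^|\s)(?:--nodelist=|--nodelist\s+|-w\s*)(\S+)")


def split_hostlist(expr: str) -> list[str]:
    """Split a hostlist on commas that are not inside [...] brackets."""
    parts, depth, current = [], 0, ""
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)
    return parts


def expand_hostlist(expr: str) -> set[str]:
    """Expand simple hostlists such as 'm3f[000-031]' or 'm3h[005,007-008]'.

    Assumption: at most one bracketed group per host name, which covers the
    node names in the table. Zero-padding is taken from the range start.
    """
    hosts = set()
    for item in split_hostlist(expr.strip().strip("\"'")):
        m = re.fullmatch(r"([^\[\]]*)\[([^\]]+)\](.*)", item)
        if not m:
            hosts.add(item)  # a plain node name such as 'm3e000'
            continue
        prefix, body, suffix = m.groups()
        for rng in body.split(","):
            if "-" in rng:
                start, end = rng.split("-")
                width = len(start)
                for n in range(int(start), int(end) + 1):
                    hosts.add(f"{prefix}{n:0{width}d}{suffix}")
            else:
                hosts.add(f"{prefix}{rng}{suffix}")
    return hosts


RETIRED = set().union(*(expand_hostlist(r) for r in RETIRED_RANGES))


def retired_nodes_in_script(text: str) -> set[str]:
    """Return retired nodes requested via --nodelist/-w on #SBATCH lines."""
    found = set()
    for line in text.splitlines():
        if not line.lstrip().startswith("#SBATCH"):
            continue
        for m in NODELIST_RE.finditer(line):
            found |= expand_hostlist(m.group(1)) & RETIRED
    return found
```

For example, a script containing `#SBATCH --nodelist=m3f[010-011]` would be reported as requesting m3f010 and m3f011; an empty result means no retired node was found on its #SBATCH lines.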

Important

Planned maintenance outages

We have scheduled quarterly outages for the M3 cluster so that we can notify our HPC users of outages in advance. Where possible, we perform rolling upgrades while the cluster remains online; however, some upgrades require the cluster to be taken offline. These include:

  • system software upgrades;

  • network maintenance;

  • bug and security patches; and

  • hardware maintenance.

This site contains the documentation for the MASSIVE HPC systems. M3 is the newest addition to the facility.

Using M3

Communities