
Welcome to the M3 user guide

Important

M3 Scheduled Maintenance: Network Uplift - 7-8 December 2021 - COMPLETED

The M3 scheduled maintenance is now complete.

We have lifted the reservation, and pending jobs have started to run. You can now submit jobs and start Strudel Desktop sessions. The login and data transfer nodes are available again.

The following works were conducted during this maintenance:

  • Deployed RoCEv2 on the switches and servers in Monash-03 network fabric;

  • Validated the configuration on all login, data transfer and compute nodes;

  • Verified and validated the configuration of all switches;

  • Performed extensive functional and benchmarking tests on the entire fabric to generate a baseline for the upcoming changes to the network;

  • Completed maintenance work on the NFS servers that host the M3 software stack and home directories; and

  • Applied security patches to all M3 nodes.

If you have any queries or concerns regarding this maintenance, please contact help@massive.org.au.

Attention

MASSIVE website issues

The main massive.org.au website is experiencing issues with an external hosting provider. We have temporarily redirected traffic to docs.massive.org.au while we work on a long-term fix.

Important

Hardware Refresh Plan 2021 – Update: 17 May 2021

Please be advised of the following M3 hardware refresh schedule for 2021. The servers listed below are reaching end-of-life and will be retired this year.

While this will reduce total CPU capacity during 2021, retiring these servers is necessary to make room for new and faster compute nodes. The hardware from the 2021 procurement round is expected to be provisioned in the middle of this year, and it includes new remote desktop servers to replace those being decommissioned.

Compute Nodes                            Capability                                    To be retired by
m3a[000-021]                             Intel Xeon-E5-2680-v3 CPU                     end of May 2021
m3d[000-012] m3f[000-031]                GRID K1 Desktop                               end of August 2021
m3c[000-013]                             Intel Xeon-E5-2680-v3 CPU, NVIDIA K80 GPUs    middle of November 2021
m3h[001,015] m3h[003-008] m3h[010,011]   Intel Xeon-E5-2680-v4 CPU, NVIDIA P100 GPUs   middle of November 2021

We will enable the appropriate mechanisms (e.g., a SLURM reservation) to ensure that these nodes are free of running jobs before they are retired. Please check that your job scripts do not request these nodes with --nodelist.
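One way to check a job script for retiring nodes is sketched below in Python. This is a minimal illustration, not an official MASSIVE tool: the function names expand and retiring_nodes_in_script are hypothetical, the node ranges are copied from the schedule above, and the example node m3i005 is made up to stand in for a node that is not being retired. The sketch only looks at #SBATCH lines that use --nodelist or its short form -w, and it only understands simple bracketed ranges.

```python
import re

# Node ranges copied from the retirement schedule in this notice.
RETIRING = [
    "m3a[000-021]", "m3d[000-012]", "m3f[000-031]", "m3c[000-013]",
    "m3h[001,015]", "m3h[003-008]", "m3h[010,011]",
]

# Matches "#SBATCH ... --nodelist=VALUE", "--nodelist VALUE" or "-w VALUE".
# The lookbehind stops "-w" from matching inside options such as "--wait".
NODELIST_RE = re.compile(
    r"^#SBATCH\b.*?(?:--nodelist(?:=|\s+)|(?<![\w-])-w\s*)(\S+)"
)


def expand(expr):
    """Expand a simple hostlist such as 'm3a[000-002,005]' into a set of names."""
    m = re.fullmatch(r"([A-Za-z0-9]+)\[([0-9,\-]+)\]", expr)
    if not m:
        return {expr}  # a plain host name with no bracketed range
    prefix, body = m.groups()
    nodes = set()
    for part in body.split(","):
        lo, _, hi = part.partition("-")
        hi = hi or lo  # a single index such as '015'
        width = len(lo)  # keep the zero padding used in the expression
        for n in range(int(lo), int(hi) + 1):
            nodes.add(f"{prefix}{n:0{width}d}")
    return nodes


def retiring_nodes_in_script(text):
    """Return the sorted retiring nodes that a job script requests explicitly."""
    retiring = set().union(*(expand(e) for e in RETIRING))
    hits = set()
    for line in text.splitlines():
        m = NODELIST_RE.match(line.strip())
        if m:
            # Split on commas that are outside square brackets.
            for expr in re.findall(r"[^,\[]+(?:\[[^\]]*\])?", m.group(1)):
                hits |= expand(expr) & retiring
    return sorted(hits)


if __name__ == "__main__":
    example = "#!/bin/bash\n#SBATCH --nodelist=m3a[001-002],m3i005\n"
    print(retiring_nodes_in_script(example))  # prints ['m3a001', 'm3a002']
```

If the function returns a non-empty list, remove those nodes from the --nodelist line or drop the option altogether. On the cluster itself, scontrol show hostnames can expand a hostlist expression and is likely a more reliable reference than the simplified expand helper here.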

Important

Planned maintenance outages

We have scheduled quarterly outages for the M3 cluster so that we can give our HPC users advance notice. Where possible, we perform rolling upgrades while the cluster is online. However, some upgrades require the cluster to be taken offline. These include:

  • system software upgrades;

  • network maintenance;

  • bug and security patches; and

  • hardware maintenance.

This site contains the documentation for the MASSIVE HPC systems. M3 is the newest addition to the facility.

Using M3

Communities