Welcome to the M3 user guide

Important

Open call to access NCI or DUG through the Monash HPC partnership program.

A call to access HPC resources at either NCI or DUG is now open. Please click here for full details.

Important

Presenting M3 on Rocky Linux

We are pleased to announce that M3 on Rocky Linux is now available for access. A number of nodes have been converted and are accessible via the new Rocky Linux login node:

m3-login3.massive.org.au

Please see our informational page for more details: https://docs.massive.org.au/FAQ/M3Rocks.html

Important

Rollout of a new operating system

A major security uplift of Massive M3 is currently underway.

The nine-year-old CentOS operating system on M3 is approaching end-of-life (EOL) and will be out of support by the end of Q2 2024. Over the next few weeks, M3 will be progressively upgraded to run Rocky Linux, a newer and more secure operating system.

Our focus is on building new software packages for Rocky Linux. Actively used applications in /usr/local are being tested on the new OS and will be reinstalled if they are incompatible. These applications, along with future software requests, will be built for Rocky Linux and installed in /apps.

This upgrade will be conducted in several phases so that you are able to continue running your analyses on M3 throughout. We will progressively upgrade existing CentOS M3 compute nodes to Rocky Linux by early June 2024.

M3 on Rocky Linux will retain the use of:

  • SLURM for job scheduling; and

  • Environment modules for activating applications

Please visit this page for updates.

Upcoming information - stay tuned:

  • M3 Rocky Linux login and data transfer nodes;

  • How to submit jobs to Rocky compute nodes; and

  • How to request new software for Rocky.

Important

Cybersecurity Alert - 16 April 2024

A critical security vulnerability has been discovered in earlier versions of:

  • PuTTY

  • FileZilla

  • WinSCP

  • TortoiseGit

We strongly advise you to update to the latest versions.

Source: https://thehackernews.com/2024/04/widely-used-putty-ssh-client-found.html

Important

Upcoming Maintenance - 12-14 March 2024

Please be advised that Massive M3 will be undergoing a three-day scheduled maintenance starting Tuesday the 12th of March 2024.

Access to Massive M3 will not be available throughout this maintenance, as we will be conducting the following essential works:

  • upgrade the servers and clients for the Lustre /projects and /scratch file systems;

  • upgrade the system software on the M3 network switches; and

  • perform other critical updates.

For any concerns and enquiries, please contact our helpdesk at: help@massive.org.au

Important

Updates to Network Switches

Over the next few weeks (from 31 January 2024), we will be updating the network switches that underpin MASSIVE M3. Each switch takes a short time (i.e., several minutes) to update, but during this update, jobs that are sensitive to the network may be impacted. We will be placing batches of compute nodes on SLURM reservations to ensure that they will be drained of jobs before the updates.

The schedule of work and any potential impact to the service will be provided below.

[Update: Feb 09 2024]

We have scheduled the first batch of compute nodes for this update on the 13th of February 2024. Work should take no more than a couple of hours to complete.

[Update: Feb 16 2024]

Another 28 switches need to be updated, and we will be scheduling a downtime for this. This downtime will also be used to perform necessary updates to the Lustre file system, along with other updates.

Important

Data Fluency Training - Note the date changes

We will be presenting two courses that might be of interest to new users. Please register your email at Data Fluency to be sent details on the courses.

Please see https://www.monash.edu/data-fluency/events for the link to the event and other events.

  • Introduction to Unix Shell (7th and 8th March 2024, 1/2 day each). To enrol, please register at Data Fluency.

  • Introduction to HPC (15th March 2024, 1 day)

Important

Hardware Refresh Plan – Update: 5 Jul 2022

Please be advised of the following M3 hardware refresh schedule. These servers are now coming into end-of-life and will be retired.

While this will reduce total CPU and GPU capacity, retiring these servers is necessary to make room for new and faster compute nodes. The current round of hardware procurement is expected to include new remote desktop servers to replace those being decommissioned. In the interim, more desktops will be made available from our existing pool of GPUs. Note that only the Strudel Beta list of desktops will be updated to reflect the change in desktop options.

Compute Nodes    Capability                 To be retired by
m3f[000-031]     NVIDIA K1 GPU desktops     21 Jun 2022
m3c[000-013]     NVIDIA K80 GPU desktops    18 Jul 2022
m3e000           NVIDIA K80 GPU
m3h005           NVIDIA P100 GPU desktop
m3h[006-008]     NVIDIA P100 GPUs           Late 2022 (exact date TBC)

We will be enabling the appropriate mechanisms (e.g., SLURM reservations) to ensure that these nodes are free of running jobs prior to their retirement. Please check your job scripts to ensure they do not request these nodes using --nodelist.
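One way to carry out this check is sketched below in Python. It is an illustrative helper, not a MASSIVE-provided tool: the function names (`expand_hostlist`, `requested_nodes`, `retiring_nodes_in`) are hypothetical, and it assumes node requests appear only on `#SBATCH` lines, using `--nodelist` or its short form `-w`, with simple numeric bracket ranges such as `m3f[000-031]`. Requests passed on the `sbatch` or `srun` command line are not covered.

```python
import re

# Retiring node ranges, copied from the hardware refresh table above.
RETIRING = ["m3f[000-031]", "m3c[000-013]", "m3e000", "m3h005", "m3h[006-008]"]

# Matches "--nodelist=VALUE", "--nodelist VALUE", "-w VALUE" or "-wVALUE".
NODELIST_RE = re.compile(r"(?:^|\s)(?:--nodelist(?:=|\s+)|-w\s*)(\S+)")


def expand_hostlist(expr):
    """Expand a simple host expression such as 'm3f[000-002,010]' into a set."""
    match = re.fullmatch(r"([^\[\]]+)\[([^\[\]]+)\]", expr)
    if match is None:
        return {expr}  # plain host name, nothing to expand
    prefix, body = match.groups()
    hosts = set()
    for part in body.split(","):
        low, _, high = part.partition("-")
        high = high or low  # a single index such as "010"
        width = len(low)    # preserve zero padding
        for number in range(int(low), int(high) + 1):
            hosts.add(f"{prefix}{number:0{width}d}")
    return hosts


def requested_nodes(script_text):
    """Collect every node named on an #SBATCH --nodelist / -w directive."""
    nodes = set()
    for line in script_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("#SBATCH"):
            continue
        for value in NODELIST_RE.findall(stripped[len("#SBATCH"):]):
            # Split on commas that sit outside square brackets.
            for expr in re.findall(r"[^,\[]+(?:\[[^\]]*\])?", value):
                nodes |= expand_hostlist(expr)
    return nodes


RETIRING_NODES = set().union(*(expand_hostlist(e) for e in RETIRING))


def retiring_nodes_in(script_text):
    """Return the sorted retiring nodes that a job script explicitly requests."""
    return sorted(requested_nodes(script_text) & RETIRING_NODES)


# Hypothetical job script used only to demonstrate the check.
sample = """#!/bin/bash
#SBATCH --job-name=demo
#SBATCH --nodelist=m3f[030-031],m3h005
"""
print(retiring_nodes_in(sample))
```

To check a real job script, pass its contents (for example, read with `pathlib.Path(...).read_text()`) to `retiring_nodes_in`; an empty list means no retiring node is requested explicitly.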

Important

Planned maintenance outages

We have scheduled quarterly outages for the M3 cluster. This is to ensure we communicate scheduled outages to our HPC users in advance. Where possible, we perform rolling upgrades with the cluster online. However, sometimes we have to perform upgrades that require the cluster to be taken offline. These include:

  • system software upgrades;

  • network maintenance;

  • bug and security patches; and

  • hardware maintenance

This site contains the documentation for the MASSIVE HPC systems. M3 is the newest addition to the facility.

Using M3

Communities