
Welcome to the M3 user guide
Important
Data Fluency Training
We will be presenting two courses that might be of interest to new users. Please register at Data Fluency if you wish to attend.
Introduction to Unix Shell (22 and 23 November, 1/2 day each) https://monash.csod.com/ui/lms-learning-details/app/event/dba6ae06-5ab6-45d6-acff-b7d6b93bad81
Introduction to HPC (1 December, 1 day) https://monash.csod.com/ui/lms-learning-details/app/event/525327fc-8d08-4c6a-bebb-5dc056d19119
Important
M3 Scheduled Maintenance - /projects File System Expansion - Phase One
Please be advised that MASSIVE M3 will be undergoing a three-day scheduled maintenance period commencing Tuesday, 10 October 2023.
During this scheduled maintenance, our engineers will carry out the first phase of a two-phase plan to expand the capacity of the file system.
Specifically, the following work will be conducted on /projects (aka /fs04):
relocation of existing servers to a second rack;
installation of additional storage enclosures and hard disk drives; and
relevant cabling and networking work.
We will also be conducting other maintenance work on the system.
During the maintenance window, M3 will not be accessible and all nodes will be drained of running jobs prior to the start of the maintenance. Jobs already in the queue will remain pending until the restoration of service.
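If you are submitting work shortly before the window, it can help to check whether a job's requested wall time ends before the maintenance starts. The sketch below assumes the window is enforced with a Slurm maintenance reservation. In that case Slurm generally will not start a job whose time limit would overlap the reservation, so such a job stays pending. The 09:00 start time is a hypothetical placeholder, since only the date is announced. The helper names are illustrative and are not part of any M3 tool.

```python
from datetime import datetime, timedelta

# Hypothetical start of the maintenance window: the date comes from the
# announcement, but the 09:00 time of day is an assumption for illustration.
MAINTENANCE_START = datetime(2023, 10, 10, 9, 0)


def parse_time_limit(spec):
    """Parse a Slurm --time value into a timedelta.

    Accepts "minutes", "minutes:seconds", "hours:minutes:seconds",
    "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
    """
    days = 0
    if "-" in spec:
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
        # After "days-", the fields are hours[:minutes[:seconds]].
        hours, minutes, seconds = ([int(p) for p in rest.split(":")] + [0, 0])[:3]
    else:
        parts = [int(p) for p in spec.split(":")]
        if len(parts) == 1:
            hours, minutes, seconds = 0, parts[0], 0
        elif len(parts) == 2:
            hours, minutes, seconds = 0, parts[0], parts[1]
        else:
            hours, minutes, seconds = parts
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def finishes_before_maintenance(start, time_limit, maintenance_start=MAINTENANCE_START):
    """Return True if a job starting at `start` and running for its full
    `time_limit` would end no later than `maintenance_start`."""
    return start + time_limit <= maintenance_start


# Example: a job that could start at noon on 8 October 2023.
start = datetime(2023, 10, 8, 12, 0)
print(finishes_before_maintenance(start, parse_time_limit("1-00:00:00")))  # True
print(finishes_before_maintenance(start, parse_time_limit("2-00:00:00")))  # False
```

Because the scheduler works from the requested time limit rather than the actual run time, a tighter --time value may allow a job to start before the window instead of waiting until service is restored.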
Important
Hardware Refresh Plan – Update: 5 Jul 2022
Please be advised of the following M3 hardware refresh schedule. The servers listed below are reaching end-of-life and will be retired.
While this will reduce total CPU and GPU capacity, retiring these servers is necessary to make room for new, faster compute nodes. The current round of hardware procurement is expected to include new remote desktop servers to replace those being decommissioned. In the interim, additional desktops will be made available from our existing pool of GPUs. Note that only the Strudel Beta list of desktops will be updated to reflect the change in desktop options.
| Compute Nodes | Capability | To be retired by |
|---|---|---|
| | NVIDIA K1 GPU desktops | 21 Jun 2022 |
| | NVIDIA K80 GPU desktops | 18 Jul 2022 |
| | NVIDIA K80 GPU | |
| | NVIDIA P100 GPU desktop | |
| | NVIDIA P100 GPUs | Late 2022 (exact date TBC) |
We will enable the appropriate mechanisms (e.g., a Slurm reservation) to ensure that these nodes are free of running jobs prior to their retirement. Please check your job scripts to ensure they do not specify these nodes using --nodelist.
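To check a script before submitting it, the sketch below scans its #SBATCH directives for --nodelist (or its short form -w) and returns any node lists the script pins. It follows sbatch's rule of reading directives only until the first executable line. The function name and the node names in the example are hypothetical; the actual names of the nodes being retired are not listed here.

```python
import shlex


def find_nodelist_directives(script_text):
    """Return the node lists requested via --nodelist / -w in #SBATCH lines.

    sbatch stops reading #SBATCH directives at the first non-comment,
    non-blank line, so this scan stops there too.
    """
    found = []
    for line in script_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines do not end the directive block
        if not stripped.startswith("#"):
            break  # first executable line: later #SBATCH lines are ignored
        if not stripped.startswith("#SBATCH"):
            continue  # ordinary comment, including the #! line
        tokens = shlex.split(stripped[len("#SBATCH"):])
        i = 0
        while i < len(tokens):
            tok = tokens[i]
            if tok.startswith("--nodelist="):
                found.append(tok.split("=", 1)[1])
            elif tok in ("--nodelist", "-w"):
                if i + 1 < len(tokens):
                    found.append(tokens[i + 1])
                    i += 1  # skip the value we just consumed
            elif tok.startswith("-w") and not tok.startswith("--"):
                found.append(tok[2:])  # attached short form, e.g. -wnode01
            i += 1
    return found


example = """#!/bin/bash
#SBATCH --job-name=demo
#SBATCH --nodelist=m3a001
#SBATCH -w m3b002
module load python
#SBATCH --nodelist=ignored
"""
print(find_nodelist_directives(example))  # ['m3a001', 'm3b002']
```

An empty result means the script leaves node placement to the scheduler, which is the behaviour you want for nodes that are about to be retired.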
Important
Planned maintenance outages
We schedule quarterly outages for the M3 cluster so that planned downtime can be communicated to our HPC users in advance. Where possible, we perform rolling upgrades with the cluster online; however, some upgrades require the cluster to be taken offline. These include:
system software upgrades;
network maintenance;
bug and security patches; and
hardware maintenance.
This site contains the documentation for the MASSIVE HPC systems. M3 is the newest addition to the facility.
Help and Support
Using M3
- About M3
- Requesting an account
- Requesting help on M3
- Connecting to M3
- Connecting to M3 via ssh
- Connecting to M3 via the MASSIVE desktop
- Connecting to M3 via Strudel2
- Troubleshooting Common Issues Using Strudel
- I can log in to Strudel and launch a desktop but cannot connect. I click the connect button and get an error saying “failed to connect to server”
- Strudel Web fails to connect to the desktop or Strudel2 crashes when trying to connect to desktop
- I can connect to a Desktop, but the display looks messed up, or I can see the display, but cannot interact with the Desktop!
- I’m using Strudel Desktop and get an error message “It looks like I was unable to contact the server for a list of sites to connect to…”
- File Systems on M3
- Copying files to and from M3
- Software on M3
- Running jobs on M3
- Partitions Available
- Other partitions
- Checking the status of M3
- Slurm Accounts
- Getting started with job submission scripts
- Running Simple Batch Jobs
- MPI on M3
- On CentOS 7
- On Rocky 9
- UCX_NET_DEVICES
- Running Multi-threading Jobs
- Running Interactive Jobs
- Running GPU Jobs
- Running Array Jobs
- QoS (Quality of Service)
- Features & Constraints
- Checking job status
- Project Allocation
- Lustre File System Quickstart Guide
- GPUs on M3
Communities
- MX2 Eiger
- Machine Learning
- Neuroimaging
- Cryo EM
- Bioinformatics
- The Genomics partition
- DGX
- Data Collections
- Machine learning
- ImageNet 2012 (ILSVRC2012)
- ImageNet 2015 Object Detection Data (ILSVRC2015 DET)
- International Skin Imaging Collaboration 2019 (ISIC 2019)
- NIH Chest X-ray Dataset (NIH CXR-14)
- Stanford Natural Language Inference (SNLI) Corpus
- COCO (Common Objects in Context) 2017
- AlphaFold
- AlphaFold v2 - AlphaFold-Multimer release
- Neuroimaging
- Human Connectome Project Dataset (HCP): HCP-1200
- Lifespan Human Connectome Project Development
- Lifespan Human Connectome Project Aging
- Baby Connectome Project
- Human Connectome Project for Early Psychosis
- Developing Human Connectome Project (dHCP)
- Brain Genomics Superstruct Project (GSP)
- Nathan Kline Institute Rockland Sample (NKI-RS): Neuroimaging Release
- Genomes
- Requesting a data collection
- Machine learning
- XNAT