Attention

This documentation is under active development, meaning that it can change over time as we refine it. Please email help@massive.org.au if you require assistance.

Copying files to and from M3#

To copy files to and from M3, see the appropriate section below.

GUI Tool - Windows, Mac OS, and Linux Users#

The M3 staff like to use FileZilla, which is available for Windows, Mac OS and Linux. Many other SFTP clients are available, depending on the platform you use.

The following instructions are based around FileZilla.

FileZilla#

Obtain FileZilla

  1. Navigate to https://filezilla-project.org/download.php?show_all=1 and click on Download for your operating system

    FileZilla main
  2. Once downloaded, install the application on your machine.

Note

On Windows the installer may also attempt to install WinZip and Opera. Make sure you deselect the checkbox if you don’t want these applications.

  1. Start up FileZilla.

Note

On Windows the program will look different from the screenshots below, but the buttons and fields are in similar locations.

  1. In FileZilla click on the Site-Manager button.

    ../_images/FileZilla_site-manager.png
  2. Click on New Site and create a new site for M3.

    ../_images/FileZilla_new-site.png
  3. Complete the following fields and click on Connect.

    • Protocol: SFTP - SSH File Transfer Protocol

    • Host: m3-dtn.massive.org.au

    • Logon Type: Ask for Password

    • User: Enter your M3 username

  4. Enter your password.

    ../_images/FileZilla_password.png
  5. FileZilla will connect to your home account on M3. Assuming everything went well you can now drag and drop files into M3.
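
For readers who prefer a terminal, the same connection details (SFTP, host m3-dtn.massive.org.au, your M3 username) can also be used with the OpenSSH sftp client. The following is a minimal sketch, assuming sftp is installed on your machine. The value username is a placeholder, and the script only prints the command so that nothing connects unintentionally.

```shell
# Sketch only: M3_USER is a placeholder; replace it with your M3 username.
M3_USER="username"
M3_HOST="m3-dtn.massive.org.au"   # host taken from the FileZilla settings above

# Build the target in the usual user@host form.
SFTP_TARGET="${M3_USER}@${M3_HOST}"

# Print the command instead of running it.
echo "sftp ${SFTP_TARGET}"

# Once connected, common interactive sftp commands are:
#   put localfile     upload a file to M3
#   get remotefile    download a file from M3
#   ls, cd, lcd       browse remote and local directories
```

Copy the printed command into your terminal to open the session; you will be prompted for your password, as in FileZilla.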

WinSCP - Windows#

  1. Obtain WinSCP: https://winscp.net

  2. Once downloaded, install the application on your machine.

  3. Set up your connection to MASSIVE as shown here:

    • Protocol: SFTP

    • Host: m3-dtn.massive.org.au

    • Port number: 22

    • Username: Enter your M3 username

    • Password: Leave blank, you will be prompted for it.

    ../_images/WinSCP-site.png
  4. After connecting to MASSIVE, your screen will be similar to that below. In this example the folders MASSIVE_INPUT and MASSIVE_OUTPUT have been created locally on the Windows machine and remotely on MASSIVE.

    ../_images/WinSCP-connected.png

Now you can manually transfer files to and from MASSIVE using the interface. To automatically monitor folders for new files read below.

‘Watch’ a local folder and automatically SFTP files to MASSIVE

WinSCP can monitor the contents of a local folder and automatically SFTP new files across to MASSIVE. Note: this is ONE way, Windows to MASSIVE. For the reverse direction, see below.

  1. Click on ‘Commands’, ‘Keep Remote Directory up to Date…’

    ../_images/WinSCP-KeepRemoteUpToDate-menu.png
  2. Fill out the correct path for the source folder to watch and the destination folder

    ../_images/WinSCP-KeepRemoteUpToDate.png
  3. Click ‘Start’. An initial synchronization of the folders will be completed and then, when new files are added to the Windows folder, they will be SFTP’d to the MASSIVE folder.

    ../_images/WinSCP-Windows-to-MASSIVE.png

‘Watch’ a MASSIVE folder and automatically SFTP files to a local folder

To automatically bring files to Windows from MASSIVE, WinSCP executes a PowerShell script.

  1. Click on ‘Commands’, ‘Static Custom Commands’, ‘Keep Local Directory up to Date…’

    ../_images/WinSCP-KeepLocalUpToDate-menu.png
  2. Set up the folders as required. In this example, MASSIVE_OUTPUT on MASSIVE and Windows is used. The synchronization interval can be set. This is how often WinSCP checks MASSIVE for new files and then SFTPs them to Windows.

    ../_images/WinSCP-KeepLocalUpToDate.png

  3. Click ‘OK’. A PowerShell window will then open. This process will continue until Ctrl+C is entered.

    ../_images/WinSCP-MASSIVE-to-Windows.png

TIP: To monitor folders on both systems using WinSCP, first start ‘Keep Local Directory up to Date’ and then start ‘Keep Remote Directory up to Date’.

Globus#

Globus can be used to transfer data at high speed in and out of MASSIVE. The Globus software is widely used to transfer data around the world between different endpoints. Endpoints are set up at a wide range of institutions and can also be created on your personal computer.

Attention

Update: Globus v5 has been installed on the MASSIVE data transfer nodes. The login process is the same, but the endpoint names have changed. The previous endpoints (massive#massive, massive#mern) will be decommissioned in the future. The main endpoint is now known as ‘MASSIVE M3’. For researchers connected to the Monash eResearch Network, massive#mern will be redeployed as ‘MASSIVE on MeRN’.

Getting Started#

  1. Click on “Log In” to Globus: https://globus.org

    ../_images/globus-login.png
  2. Authenticate to Globus.

    There are a few options available.

    • if your organisation is in the list, select it and click continue (this works for Monash and others)

    • use your Google account (this works for Monash)

    • use your ORCiD iD

    • if you have a Globus ID, click on ‘Globus ID to sign in’

    Choose the most appropriate option and follow the displayed pages.

    ../_images/globus-auth.png
  3. After authenticating to Globus the File Manager will be presented.

    • Click on the icon as shown to ensure two panels are displayed.

    ../_images/globus-file-manager.png
  4. The next step is to connect to the MASSIVE endpoint.

    • Click in the Collection text box. It does not matter which one, left or right.

    • Enter “MASSIVE M3” to search for the MASSIVE endpoint.

    • Select “MASSIVE M3” from the list.

    ../_images/globus-fm-two-panel.png
  5. Click ‘Continue’

    ../_images/globus-fm-massive.png
  6. You will be redirected to a page at https://globusdev.massive.cloud.monash.edu/ which runs on a machine inside MASSIVE.

    • Use your MASSIVE username and password to ‘Sign In’.

    ../_images/globus-myproxy-oauth-massive.png
  7. After authenticating successfully, your MASSIVE home folder will be displayed.

    ../_images/globus-fm-massive-authenticated.png
  8. Now you need to add another endpoint to transfer data in and out of MASSIVE.

    • Using the other ‘Collection’ field, enter the name of the required endpoint. In the example shown below, ‘AARNet-Public-Test-Share’ has been used.

    ../_images/globus-fm-both-authenticated.png

Personal Globus Endpoint#

An Endpoint can be created on your personal computer. This allows you to transfer between your PC and any other endpoint, including MASSIVE.

To install, please choose the correct system and follow the Globus instructions:

Using the endpoint name you set up during installation, search for your endpoint in the Globus File Manager. The process is the same as steps 3 and 4 above.

For the Linux version, you may find the browsable directories are restricted for your Personal Globus Endpoint.

The file ~/.globusonline/lta/config-paths can be altered to fix this.

For full details please read this link: https://docs.globus.org/faq/globus-connect-endpoints/#how_do_i_configure_accessible_directories_on_globus_connect_personal_for_linux
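
As an illustration of what such a change might look like, the sketch below writes an example config-paths file to a temporary location instead of touching the real one. It assumes the format described in the Globus FAQ linked above, where each line is a path followed by a sharing flag and a read/write flag. The directory /data/project is hypothetical, so check the FAQ before applying anything similar to your own file.

```shell
# Sketch: write to a scratch copy so the real config is not modified.
# The real file lives at ~/.globusonline/lta/config-paths.
CONFIG_COPY="$(mktemp)"

# Assumed line format: <path>,<sharing flag>,<read/write flag>
#   sharing flag:    1 = sharing allowed, 0 = not allowed
#   read/write flag: 1 = read/write,      0 = read-only
# /data/project is a hypothetical directory used only for illustration.
cat > "$CONFIG_COPY" <<'EOF'
~/,0,1
/data/project,0,0
EOF

# Show the result: home is read/write, /data/project is read-only.
cat "$CONFIG_COPY"
```

After editing the real file, a restart of Globus Connect Personal is likely needed before the new directories become browsable.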

Transferring Data#

Attention

If your datasets contain symbolic links (symlinks), please be aware of how Globus handles them. Read this carefully: https://docs.globus.org/faq/transfer-sharing/#how_does_globus_handle_symlinks
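
To check whether a dataset contains symlinks before transferring it, one option is the standard find command. This is a small sketch rather than part of the Globus tooling: the helper name list_symlinks and the demo file names are made up for illustration.

```shell
# list_symlinks DIR: print every symbolic link under DIR, one per line.
list_symlinks() {
    find "$1" -type l
}

# Demonstration on a throwaway directory (hypothetical file names).
DEMO_DIR="$(mktemp -d)"
touch "$DEMO_DIR/data.txt"
ln -s "$DEMO_DIR/data.txt" "$DEMO_DIR/link_to_data"

# Prints the path of the one symlink created above.
list_symlinks "$DEMO_DIR"
```

Running list_symlinks on your own dataset directory gives a list of links to review against the Globus symlink behaviour described in the FAQ.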

This section assumes two endpoints have been connected to File Manager. If not, please refer to Getting Started above.

  • In the image below, the right-hand side is highlighted (dark blue menu). This means any actions will apply to files from the endpoint “AARNet-Public-Test-Share”.

  • possible file actions are highlighted in the middle of the screenshot.

    ../_images/globus-transfers.png
  • to apply file actions to the endpoint MASSIVE, click on the left.

    ../_images/globus-left.png

Transferring files from AARNet-Public-Test-Share to MASSIVE#

  1. Ensure the destination path is correct. e.g. ~/br76_scratch

  2. Click on the right hand side and select some files.

    ../_images/select-files.png
  3. Click on Start.

    ../_images/transfer-requested.png
  4. To view the transfer progress, click on ‘View Details’.

    ../_images/transfer-progress.png

To monitor all transfers click on ‘Activity’ on the left hand side.

Sharing Data with your Collaborators#

Note

By default, sharing via Globus is not enabled. To apply for access to share via Globus, log in to the HPC ID system and navigate to the Software section of your profile. Click on Add Software, locate GlobusAllow and apply for access. Once approved, you will be able to create a Globus collection by following the link below. You can also contact the MASSIVE Help Desk.

Please refer to the Globus documentation: https://docs.globus.org/how-to/share-files/ Note that sharing using Globus on MASSIVE is enabled for read-only access only.

Additional help#

This is a brief introduction on how to use Globus to transfer data. For additional information, please refer to https://docs.globus.org/how-to/

Globus - Command Line Interface#

The Globus CLI software has been installed on MASSIVE.

module load globus-cli/2.0.0

globus
    Usage: globus [OPTIONS] COMMAND [ARGS]...

  Interact with Globus from the command line

  All `globus` subcommands support `--help` documentation.

  Use `globus login` to get started!

  The documentation is also online at https://docs.globus.org/cli/

Options:
  -v, --verbose                  Control level of output
  -h, --help                     Show this message and exit.
  -F, --format [unix|json|text]  Output format for stdout. Defaults to text
  --jmespath, --jq TEXT          A JMESPath expression to apply to json
                                 output. Takes precedence over any specified '
                                 --format' and forces the format to be json
                                 processed by this expression

  --map-http-status TEXT         Map HTTP statuses to any of these exit codes:
                                 0,1,50-99. e.g. "404=50,403=51"


Commands:
  bookmark        Manage endpoint bookmarks
  config          Manage your Globus config file. (Advanced Users)
  delete          Submit a delete task (asynchronous)
  endpoint        Manage Globus endpoint definitions
  get-identities  Lookup Globus Auth Identities
  list-commands   List all CLI Commands
  login           Log into Globus to get credentials for the Globus CLI
  logout          Logout of the Globus CLI
  ls              List endpoint directory contents
  mkdir           Create a directory on an endpoint
  rename          Rename a file or directory on an endpoint
  rm              Delete a single path; wait for it to complete
  session         Manage your CLI auth session
  task            Manage asynchronous tasks
  transfer        Submit a transfer task (asynchronous)
  update          Update the Globus CLI to its  latest version
  version         Show the version and exit
  whoami          Show the currently logged-in identity

The CLI allows users to access Globus data movement functionality via the command line. For more details please refer to: https://docs.globus.org/cli/

For Globus CLI examples: https://docs.globus.org/cli/examples/

The examples at this link cover finding an endpoint, file manipulation, single-item transfers, batch transfers and deletions, sharing endpoints, and safe resubmissions.
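
As a rough sketch of how a few of the commands listed above fit together, the script below walks through login, endpoint search, listing and a recursive transfer. The endpoint UUIDs and the input/ directory are placeholders, not real values. The run helper is a made-up convenience that only prints each command unless DRY_RUN=0 is set, so the script is safe to try without a Globus login.

```shell
# On MASSIVE, load the CLI first: module load globus-cli/2.0.0

# Placeholders: replace with the UUIDs reported by `globus endpoint search`.
SRC_EP="SOURCE-ENDPOINT-UUID"
DST_EP="DESTINATION-ENDPOINT-UUID"

# run CMD...: print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# 1. Authenticate the CLI.
run globus login
# 2. Look up endpoint UUIDs by name.
run globus endpoint search "MASSIVE M3"
# 3. List the home directory on the source endpoint.
run globus ls "${SRC_EP}:/~/"
# 4. Submit an asynchronous, recursive directory transfer (input/ is hypothetical).
run globus transfer --recursive --label "example transfer" \
    "${SRC_EP}:/~/input/" "${DST_EP}:/~/input/"
```

Since transfer is asynchronous, the task subcommand listed above can then be used to check on the submitted transfer.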

Globus - Platform as a Service#

https://www.globus.org/platform

The Globus Platform enables developers to utilise Globus functionality within their own research data applications and services.

Globus Jupyter Notebooks#

https://github.com/globus/globus-jupyter-notebooks

The notebook Platform_Introduction_Native_App_Auth.ipynb contains examples of using the functionality available.

Transfer_API_Exercises.ipynb has examples of transferring files, checking for errors, cancelling transfers under certain conditions, and monitoring transfers.

Data Portals#

Sample research data portals, which illustrate what can be achieved with the Globus API, can be viewed here: https://docs.globus.org/modern-research-data-portal/

For further information on the functionality available through the Globus API please refer to: https://docs.globus.org/api/

Command Line Interface - Linux and OS X Users#

rsync#

Use rsync to synchronise file systems and to transfer large numbers of files, with the ability to stop and restart the file transfers. rsync will replicate all files in a folder from one location to another. It first analyses both file systems to find the differences and then transfers only the changes.

A typical command to synchronise files from a local folder to M3 is:

rsync -auv -e ssh adirectory username@m3-dtn.massive.org.au:~/destinationdirectory/

rsync is very powerful and has many options to help transfer data. For example, it can delete unwanted files (--delete), compress data before transfer (-z), or let you see what command options might do without actually executing them (--dry-run). For more information on rsync, try man rsync.
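
One cautious way to combine these options is to preview a transfer with --dry-run before running it for real. The sketch below builds both commands from the typical command shown earlier, adding -z for compression. The names adirectory, username and destinationdirectory are the same placeholders as above, and the script only prints the commands so that nothing is transferred by accident.

```shell
# Placeholders: replace username and the two directory names with your own.
SRC="adirectory"
DEST="username@m3-dtn.massive.org.au:~/destinationdirectory/"

# Preview: --dry-run reports what would be transferred without copying anything.
PREVIEW_CMD="rsync -auvz --dry-run -e ssh $SRC $DEST"
# Real run: the same options without --dry-run.
TRANSFER_CMD="rsync -auvz -e ssh $SRC $DEST"

echo "$PREVIEW_CMD"
echo "$TRANSFER_CMD"

# Note: adding --delete would remove files from the destination that are
# not present in the source, so preview with --dry-run before using it.
```

Run the first printed command, check its output, and then run the second once the file list looks right.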