Data Transfer - Experiment Data

This service is for users who need to transfer (large) volumes of experiment data to their home institute, company or organisation. Data can be transferred with two methods: SSH (scp/rsync) and Globus.

PSI Account / MFA

Access to the data transfer services requires a PSI account with MFA (multi-factor authentication) enabled. If you don't have an account, please follow this procedure. If your account is not yet enabled for MFA, please contact the IT Servicedesk during business hours via help@psi.ch (see https://www.psi.ch/en/computing/change-to-mfa).

 

IMPORTANT: During authentication you have to provide your username and password. After entering the password, you will receive a push notification in the MFA app (your mobile phone needs Internet access for this to work!). You have to open the app and approve the login attempt within 60 seconds. For technical reasons, no message about this step can be displayed on the login screen.

Authorisation

Make sure that you are authorised to use the data transfer service. You can request authorisation from your beamline manager or IT responsible (SwissFEL/Photonics group members only) by providing the following information:

  • your PSI account name
  • data identifier (Proposal ID or e-account used to collect the data) for the data you need to access

The following directories are accessible with the data transfer service:

Directory Name | Example | Comments
/sls | /sls/x10da/Data10/e15874 | Raw data from SLS
/das | /das/work/p15/p15874 | Working area of the Ra cluster; note the subdirectory structure p{AB}/p{ABCDE}, e.g. p15/p15874
/sf | /sf/alvra/data/p17502/{raw,res,work} | Raw data and working area for data taken at the SwissFEL facility
 | | e.g. to copy data from another facility, like LCLS
/muse | | Muse data

 

Please note that all file systems are read-only, with the single exception of /das/work!

SSH (scp / rsync)

Overview

SSH (scp / rsync) is a simple but powerful way to transfer data. SSH is available on all major operating systems and can usually be used out of the box.

rsync may need to be installed separately. Please check with your IT support or your operating system's documentation on how to install it.


Note: The following examples use ext-name, e15874 and mx as the PSI account, e-account and beamline name respectively; please replace them with your own details.

For questions or problems, please contact datatransfer@psi.ch.

Prerequisites

In addition to the prerequisites listed above, the following prerequisites apply:

  • SSH access to datatransfer.psi.ch from your machine/home organisation.
    • Please talk to your network/security team to allow this access in case your organisation is blocking outgoing SSH traffic!

Query / Retrieve Data

The hostname to use for ssh/scp/rsync data transfers is datatransfer.psi.ch.

 

Data can be queried (listed) as follows:

$ ssh <ext-name>@datatransfer.psi.ch "ls /sls/mx/Data10/e15874"

or listed recursively (this might be slow):

$ ssh <ext-name>@datatransfer.psi.ch "ls -R /sls/mx/Data10/e15874"

 

To transfer data, e.g. to copy the directory data_exchange/processing_output to DESTINATION_DIRECTORY on your computer, you can use:

  • rsync
$ rsync -av <ext-name>@datatransfer.psi.ch:/sls/mx/Data10/e15874/data_exchange/processing_output DESTINATION_DIRECTORY
  • scp
$ scp -r <ext-name>@datatransfer.psi.ch:/sls/mx/Data10/e15874/data_exchange/processing_output DESTINATION_DIRECTORY
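For large transfers it can help to re-run rsync automatically when a connection drops. The following is a minimal sketch, not a PSI tool: the function name fetch_with_retry, the default of 5 attempts and the 30-second pause are arbitrary assumptions. It adds rsync's --partial option, which keeps partially transferred files so that a re-run does not have to start them from scratch. Each attempt opens a new SSH connection, so without connection multiplexing (see below) every retry asks for a new MFA confirmation.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper, not a PSI tool): re-run rsync until it
# succeeds or the attempt limit is reached.
# Usage: fetch_with_retry <ext-name> <remote-path> <local-destination>
fetch_with_retry() {
  local user=$1 remote_path=$2 dest=$3
  local max_attempts=${MAX_ATTEMPTS:-5}   # assumption: 5 attempts by default
  local attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    # --partial keeps partially transferred files so a re-run can reuse them
    if rsync -av --partial "${user}@datatransfer.psi.ch:${remote_path}" "$dest"; then
      return 0
    fi
    echo "rsync attempt ${attempt}/${max_attempts} failed" >&2
    sleep "${RETRY_PAUSE:-30}"   # assumption: 30 s pause between attempts
  done
  return 1
}
```

Usage: `fetch_with_retry ext-name /sls/mx/Data10/e15874/data_exchange/processing_output DESTINATION_DIRECTORY`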

Upload Data (RA Storage Only!)

Uploading data is only possible to the /das/work directory!

Data must be uploaded to the disk space associated with your PI group on the offline storage, because the original storage locations on the online storage servers are not writable.

Use the following command to copy data back to the Ra cluster via rsync:

$ rsync -av PSI_DATA $EXT_ACCOUNT@datatransfer.psi.ch:/das/work/$PNN/$PGROUP/.

where PSI_DATA is the local file or directory to upload, $EXT_ACCOUNT is your PSI account name, $PGROUP is the PI group (e.g. p12345), and $PNN consists of the first three characters of the PI group (e.g. p12).

So, for example:

$ rsync -av ~/PSI_data/disk1 ext-meier_f@datatransfer.psi.ch:/das/work/p12/p12345/.
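The upload directory follows directly from the PI group: $PNN is the first three characters of $PGROUP. The helper below is a minimal sketch of that rule; the function name das_work_path is hypothetical, and it assumes PI-group names are "p" followed by digits (as in p12345 or p15874).

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): build the /das/work upload directory
# for a PI group, i.e. /das/work/$PNN/$PGROUP where $PNN is the first three
# characters of $PGROUP.
das_work_path() {
  local pgroup=$1
  # Assumption: PI-group names are "p" followed by digits, e.g. p12345
  if [[ ! $pgroup =~ ^p[0-9]{2,}$ ]]; then
    echo "unexpected PI-group name: $pgroup" >&2
    return 1
  fi
  local pnn=${pgroup:0:3}   # e.g. p12345 -> p12
  printf '/das/work/%s/%s\n' "$pnn" "$pgroup"
}
```

Usage: `rsync -av ~/PSI_data/disk1 "ext-meier_f@datatransfer.psi.ch:$(das_work_path p12345)/."`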

Sometimes you may experience file transfer issues (e.g. "disk quota exceeded" or "permission denied") with the above command. In this case, it can help to tell rsync explicitly which ownership the transferred files should have. This is done with the --chown option (available in rsync 3.1.0 and later):

$ rsync -rtv --chown=$EXT_ACCOUNT:$PGROUP PSI_DATA $EXT_ACCOUNT@datatransfer.psi.ch:/das/work/$PNN/$PGROUP/.

Note that the -a (archive) option, which tries to preserve the original ownership, has been replaced by -rt (recursive, preserve modification times). For the example above, this looks as follows:

$ rsync -rtv --chown=ext-meier_f:p12345 ~/PSI_data/disk1 ext-meier_f@datatransfer.psi.ch:/das/work/p12/p12345/.
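The upload command with --chown can be wrapped in a small function so that the account and PI group only have to be typed once. This is a minimal sketch; the function name upload_to_das and the DRY_RUN variable (which prints the command instead of running it) are hypothetical conveniences, not part of the PSI service.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): upload to /das/work with rsync -rtv
# and --chown=<account>:<pgroup>, as in the command above.
# Set DRY_RUN=1 to print the command instead of running it.
upload_to_das() {
  local src=$1 account=$2 pgroup=$3
  local pnn=${pgroup:0:3}   # first three characters, e.g. p12345 -> p12
  local dest="${account}@datatransfer.psi.ch:/das/work/${pnn}/${pgroup}/."
  local -a cmd=(rsync -rtv "--chown=${account}:${pgroup}" "$src" "$dest")
  if [[ -n ${DRY_RUN:-} ]]; then
    printf '%s\n' "${cmd[*]}"   # show the command, space-separated
  else
    "${cmd[@]}"
  fi
}
```

Usage: `DRY_RUN=1 upload_to_das ~/PSI_data/disk1 ext-meier_f p12345` to check the command first, then the same call without DRY_RUN=1 to upload.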

Connection Multiplexing

With SSH connection multiplexing, an established SSH connection that was authenticated via MFA can be reused for subsequent connections, without the need to authenticate (with MFA) again.

To enable it, add the following configuration to your local ~/.ssh/config file (replace ext-name with your PSI account name). Listing both datatransfer and datatransfer.psi.ch on the Host line makes the settings apply whether you connect with the short alias or with the full hostname, as the commands in this guide do:

$ cat ~/.ssh/config
Host datatransfer datatransfer.psi.ch
    ControlMaster auto
    ControlPath ~/.ssh/sockets/%r@%h-%p
    ControlPersist 86400
    HostName datatransfer.psi.ch
    User ext-name

Create the ~/.ssh/sockets directory (a one-time action):

$ mkdir ~/.ssh/sockets
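The two one-time setup steps (configuration entry and sockets directory) can also be scripted. This is a minimal sketch; the function name setup_datatransfer_ssh is hypothetical, and it assumes appending to the end of ~/.ssh/config is acceptable (ssh uses the first value it finds for each option, so an earlier "Host *" block could override these settings).

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): add the multiplexing settings for
# datatransfer.psi.ch to ~/.ssh/config and create the sockets directory.
# Usage: setup_datatransfer_ssh <ext-name>
setup_datatransfer_ssh() {
  local user=$1
  local ssh_dir="$HOME/.ssh"
  mkdir -p "$ssh_dir/sockets"
  chmod 700 "$ssh_dir" "$ssh_dir/sockets"
  # Append the block only once (grep -s: no error if config does not exist yet)
  if ! grep -qs '^Host datatransfer' "$ssh_dir/config"; then
    # Unquoted heredoc: $user expands; ~ and %r/%h/%p stay literal for ssh
    cat >> "$ssh_dir/config" <<EOF

Host datatransfer datatransfer.psi.ch
    ControlMaster auto
    ControlPath ~/.ssh/sockets/%r@%h-%p
    ControlPersist 86400
    HostName datatransfer.psi.ch
    User $user
EOF
    chmod 600 "$ssh_dir/config"
  fi
}
```

Running it twice is harmless: the second call finds the existing Host line and leaves the file unchanged.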

 

Before transferring or querying data, you first have to initialise the "master" connection (authenticate with your password and MFA):

$ ssh datatransfer.psi.ch "ls /sls/mx"
password:

 

Once the connection has been established successfully, all subsequent ssh commands run over this main connection without additional authentication. You can now use any of the commands listed above to query and transfer your data, for example:

$ ssh datatransfer.psi.ch "ls /sls/mx" (no password)

To check the state of the main connection:

$ ssh datatransfer.psi.ch -O check

To terminate the main connection:

$ ssh datatransfer.psi.ch -O exit

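The ControlPersist value in your ~/.ssh/config file determines how long the main connection stays open in the background after the last session using it has ended (86400 seconds, i.e. 24 hours, in the example configuration). Please note that very long-lived connections (more than a few days) will be terminated on our side!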
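Checking for and, if needed, opening the main connection can be combined into one step, e.g. at the top of a transfer script. This is a minimal sketch assuming the multiplexing configuration above is in place; the function name ensure_master is hypothetical. It uses `ssh -O check` (exit status 0 when a master is running) and `ssh -fN`, which goes to the background after the password/MFA prompt without running a remote command.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): open the main (master) connection only
# if none is running yet. Assumes ControlMaster/ControlPath/ControlPersist are
# configured for datatransfer.psi.ch in ~/.ssh/config.
ensure_master() {
  local host=${1:-datatransfer.psi.ch}
  # "ssh -O check" exits with status 0 if a master connection is running
  if ssh -O check "$host" 2>/dev/null; then
    return 0
  fi
  # -f: go to background after authentication (password + MFA push)
  # -N: do not run a remote command; ControlPersist keeps the master alive
  ssh -fN "$host"
}
```

After `ensure_master`, the ssh, scp and rsync commands above reuse the open connection.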

Troubleshooting

  • Public-key authentication is not supported on datatransfer.psi.ch!
  • In very urgent cases, if you have problems connecting to datatransfer.psi.ch, you can fall back to one of the backup servers datatransfer-01.psi.ch and/or datatransfer-02.psi.ch. However, do NOT use these hostnames in normal cases, as we might stop these servers at any time!

Globus

Overview

Globus is a web service that allows files to be transferred in an easy, managed way.

Globus has a number of built-in features, such as:

  • Automatic network optimisation
  • Parallel/multistream transfers (up to 4 transfers/streams)
  • Automatic retry in case of failure
  • Online task monitor
  • Summary email sent at the end of the transfer
  • Usually firewall-safe, as it only uses outgoing connections. If your firewall also blocks outgoing connections, special rules need to be set up; please contact your local IT support

For questions or problems, please contact globus@psi.ch.

Prerequisites

In addition to the prerequisites listed above, the following prerequisites apply:

  • A GlobusID account (free) or another account recognised by Globus (e.g. Google, XSEDE, US universities); see the detailed description
    • PSI staff can use the PSI account to login to Globus (select the PSI organization)
    • PSI "ext-" accounts can't be used to login to Globus
  • A Globus endpoint operated by your organisation, or the Globus Connect Personal client installed on your machine (available for Windows, Mac and Linux)
  • To access PSI's data collections you have to authenticate with your username, password and MFA on our OIDC server. At the time of writing, the OIDC server only accepts MFA verification if push notification via Microsoft Authenticator is enabled for the user (OTP tokens will not work!)

Query / Transfer Data

PSI provides multiple collections from which data can be transferred. Which collection you use depends on the PSI facility at which your data was collected.

To list all PSI data collections, log in to Globus, switch to the Collections tab and search for "Paul Scherrer Institute".

 

To access a collection you need to authenticate against our OIDC server. This requires that your account is MFA-enabled (important: push notification needs to be your default MFA method!). After entering the password, you will receive a push notification in your Microsoft Authenticator app, which you have to acknowledge in order to log in. Afterwards you can access the collection, browse to your data and initiate the data transfer.

[Figure: Globus Browse Collection]


For more information on how to use Globus, please refer to the Globus documentation.