Remote data services

1 General information

The remotely accessible SLS and SwissFEL data services include an offline high-performance computing facility (the Ra cluster) for data analysis and a facility for transferring large data volumes. A general overview of these services is provided here:

https://www.psi.ch/photon-science-data-services/photon-science-data-services

The following sections describe the general connection setup and then give details on the procedures specific to TOMCAT users.

1.1 Requesting remote access

To use any of the remote data services, a personal external PSI account (e.g.: "ext-meier_f") is required. If you do not already have one, please read the Getting Started Guide and follow the instructions on how to request such an account.

1.2 Data analysis service

The purpose of this service is to give external users access to high-performance computing resources for analyzing large data volumes acquired at PSI's large-scale facilities. Common software is available, but dedicated analysis algorithms must be provided by the users.

1.2.1 Configuring remote access

To set up your remote connection to PSI, a few initial steps are necessary. These include configuration to be performed at the PSI end by beamline staff and installation of the necessary software on the end-user side. Please refer to the Remote Interactive Access Guide for instructions and follow all steps carefully. Make sure to test your connection before continuing with the TOMCAT-specific procedures below.

1.2.2 The Ra cluster

The general capabilities and configuration of the Ra data analysis cluster are described in more detail here, including instructions on how to connect to Ra and how to enable and use different software packages.

More details are available on the PSI intranet (only available with a connection from within PSI or through the remote access servers):
https://intranet.psi.ch/Computing/Offline-computing-facility-for-sls-and-swissfel-data-analysis

1.3 Data transfer

Several options are available to transfer data to and from PSI file systems. General instructions on how to set up and use the file transfer facility are described here:

https://www.psi.ch/photon-science-data-services/slsswissfel-data-transfer

2 Remote data services for TOMCAT users

This section describes the TOMCAT-specific aspects of the remote data services that are not covered, or only partially covered, by the general manuals and guidelines mentioned above.

2.1 Data access and management

2.1.1 Access permissions to experimental data on and from Ra

Access to the experimental data of a specific experiment (more precisely, a specific eAccount) must be granted to each personal user account individually. Currently, while the Ra cluster is still in its testing phase, this access has to be configured by a member of the beamline staff. Let them know your personal account name (e.g.: "ext-meier_f") and which eAccount or proposal number you need to access.

In the near future, the granting of access rights will be delegated to the experiment's PI. Access rights will then be configured through the DUO without requiring intervention from the beamline staff.
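
Once access has been granted, you can verify it from a shell on Ra by checking that the eAccount directory on the online file system (see Section 2.1.2) is readable. The following is a minimal sketch, assuming bash; the function name check_eaccount_access is hypothetical, and the optional second argument only exists so that the base path can be overridden.

```shell
# Hypothetical helper (bash): check whether the current user can read the
# raw data of a given eAccount on the read-only online file system.
# Usage: check_eaccount_access EACCOUNT [BASE_DIR]
check_eaccount_access() {
    local eaccount="${1:-}"
    # Default base path from Section 2.1.2; overridable, e.g. for testing
    local base="${2:-/sls/X02DA/Data10}"
    local dir="${base}/${eaccount}"

    if [ -z "$eaccount" ]; then
        echo "usage: check_eaccount_access EACCOUNT [BASE_DIR]" >&2
        return 2
    fi
    # -r: readable, -x: directory can be entered (needed to list and open files)
    if [ -d "$dir" ] && [ -r "$dir" ] && [ -x "$dir" ]; then
        echo "OK: ${dir} is readable"
        return 0
    fi
    echo "NO ACCESS: ${dir} (ask the beamline staff to grant access)" >&2
    return 1
}
```

Example: check_eaccount_access e12345 prints an OK message if the directory can be read and reports missing access otherwise.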

2.1.2 Data locations on Ra

Compared with using the compute cluster at the beamline during your beamtime, the data locations on the Ra cluster are somewhat less intuitive. The Ra cluster provides direct access to the experimental data acquired during your beamtime ("online"). However, this is a read-only mount of the high-performance online beamline file system. This prevents you from accidentally deleting data, but also from modifying the original data sets or adding new files to them directly. Instead, a separate file system ("offline") attached to the Ra cluster serves as the work area. For the reconstruction scripts and other analysis tools, this means that you read the raw data from one location but save the results to another.

You should be able to see your original experimental data under the following path (online file system):
/sls/X02DA/Data10/$EACCOUNT/
where $EACCOUNT is the eAccount name (e.g.: e12345) used for data acquisition during the beamtime.

Any output from your analysis should go into the following location (or subfolders thereof):
/mnt/das-gpfs/work/$PGROUP
Here, $PGROUP represents the PI-Group associated with the proposal or beamtime (e.g.: p12345).
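
In an analysis script, it can help to derive both locations once from the eAccount and the PI-Group, so that raw data is always read from the online file system and results are always written to the work area. A minimal sketch, assuming bash; the function set_data_dirs and the variables RAW_DIR and OUT_DIR are hypothetical names.

```shell
# Hypothetical helper (bash): set RAW_DIR and OUT_DIR for a given eAccount
# and PI-Group, following the locations described in this section.
# Usage: set_data_dirs EACCOUNT PGROUP
set_data_dirs() {
    local eaccount="${1:-}" pgroup="${2:-}"
    if [ -z "$eaccount" ] || [ -z "$pgroup" ]; then
        echo "usage: set_data_dirs EACCOUNT PGROUP" >&2
        return 2
    fi
    # Raw data from the beamtime: "online" file system, mounted read-only
    RAW_DIR="/sls/X02DA/Data10/${eaccount}"
    # All analysis output: "offline" work area attached to Ra
    OUT_DIR="/mnt/das-gpfs/work/${pgroup}"
    export RAW_DIR OUT_DIR
}
```

For example, after set_data_dirs e12345 p12345, a script would read its input from a subfolder of "$RAW_DIR" and write its results to a subfolder of "$OUT_DIR" (e.g. mkdir -p "$OUT_DIR/recon", where recon is only an illustrative name).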

Some more in-depth information about the data locations and file systems can be found in the Data Section of the central documentation pages.

2.1.3 Transferring data from a remote location to Ra

Before transferring data back to the Ra cluster, please read the general information on data transfer in Section 1.3 above.

When transferring data from a remote location back onto the Ra cluster, the mount point of the work file system mentioned above (/mnt/das-gpfs/work/$PGROUP, which can be used e.g. for outgoing transfers) does not work. Instead, the following path must be used; it points to the same location on the file system:
/das/work/$PGROUP
where $PGROUP is again the PI-Group associated with the beamtime (e.g.: p12345).

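Use the following command to copy data back to the Ra cluster with rsync. Replace PSI_DATA with the local file or directory to transfer, and $USER with your personal PSI account name (e.g.: "ext-meier_f") if it differs from your local user name: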
rsync -av PSI_DATA $USER@ra-export.psi.ch:/das/work/$PGROUP/.
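
The rsync call above can be wrapped in a small function that always targets the correct /das/work path and allows a dry run first. This is a sketch, assuming bash and rsync on the remote machine; the function names ra_target and push_to_ra and the variable PSI_ACCOUNT are hypothetical, while -a, -v and -n (--dry-run) are standard rsync options.

```shell
# Hypothetical helpers (bash) for copying local data to the work area of a
# PI-Group on Ra, using the /das/work path required for incoming transfers.
# PSI_ACCOUNT (hypothetical) is your PSI account name, e.g. ext-meier_f;
# if it is not set, the local $USER is used, as in the command above.

# Print the rsync destination for a given PI-Group.
ra_target() {
    echo "${PSI_ACCOUNT:-$USER}@ra-export.psi.ch:/das/work/${1}/"
}

# Usage: push_to_ra [-n] PGROUP SOURCE...
#   -n  dry run: list what would be transferred without copying anything
push_to_ra() {
    local opts=(-av)
    if [ "${1:-}" = "-n" ]; then
        opts+=(-n)
        shift
    fi
    local pgroup="${1:-}"
    if [ -z "$pgroup" ] || [ "$#" -lt 2 ]; then
        echo "usage: push_to_ra [-n] PGROUP SOURCE..." >&2
        return 2
    fi
    shift
    rsync "${opts[@]}" "$@" "$(ra_target "$pgroup")"
}
```

Example: run push_to_ra -n p12345 ./results first to check the file list, then push_to_ra p12345 ./results to copy the data (the directory name results is illustrative).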

2.2 Software tools

2.2.1 Available software

The following software packages relevant for tomography data analysis are currently available on the Ra cluster:
  • TOMCAT reconstruction pipeline: So far, the pipeline is only available from the command line. We are currently working on making the beamline GUI compatible with the Ra cluster. Refer to Section 2.2.3 Using the pipeline below.
  • Python (Anaconda, Python 2.7): See Section 2.2.2 Configuring Python below.
  • Matlab (multiple versions): For example, to load and start Matlab 2016a, type the following at the command line:
    module load matlab/2016a
    matlab &
    
  • Compilers: Various common compilers
  • Fiji: Full Fiji installation including our custom DMP and hdf5-file plugins.
    module use unstable
    module load Fiji/201603
    fiji --system &
    
  • MeVisLab:
    module use unstable
    module load MeVisLab/2.7.1
    MeVisLab &
    
  • ParaView:
    module use unstable
    module add paraview 
    vglrun paraview 
    
    See https://intranet.psi.ch/Computing/DaasScientificApplications on the PSI intranet for more details on how to use and run ParaView.
  • Mathematica:
    module use unstable
    module add mathematica
    mathematica &
    
    See https://intranet.psi.ch/Computing/DaasScientificApplications on the PSI intranet for more details.
To see a full list of all available modules, type the following command in a terminal shell:
module use unstable
module avail
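
Since the Fiji, MeVisLab, ParaView and Mathematica modules above all require module use unstable first, the two steps can be combined in a small shell function. This is a sketch, assuming bash on Ra; load_unstable is a hypothetical name, and whether module load returns a non-zero status for an unknown module depends on the module system in use.

```shell
# Hypothetical helper (bash): make the "unstable" module tree available and
# load the given modules, e.g. load_unstable Fiji/201603
load_unstable() {
    if [ "$#" -eq 0 ]; then
        echo "usage: load_unstable MODULE..." >&2
        return 2
    fi
    # On Ra, "module" is provided by the environment module system; it is
    # normally not available on other machines.
    if ! command -v module >/dev/null 2>&1; then
        echo "error: 'module' command not found (are you logged in to Ra?)" >&2
        return 1
    fi
    module use unstable || return 1
    module load "$@"
}
```

Example: run load_unstable Fiji/201603 in your shell, then start the program as usual with fiji --system &.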


More general information about installed software can be found here: https://www.psi.ch/photon-science-data-services/offline-computing-facility-for-sls-and-swissfel-data-analysis#Software

If you require any other software packages for your data analysis that are currently not available on the cluster, please contact us. We will check with our IT support to see if the missing software can be installed/provided (subject to availability, compatibility, and licensing restrictions).

2.2.2 Configuring Python

All TOMCAT-specific scripts and tools written in Python were built and tested in conda environments for Python 2.7. To use these tools, you first need to configure access to the correct conda environment. Type the following commands in a shell to activate the TOMCAT pipeline environment:
export PATH=/opt/psi/Programming/psi-python27/2.3.0/bin:$PATH
export CONDARC=/afs/psi.ch/project/TOMCAT_pipeline/Anaconda/.condarc_py27ana230
source activate tomcatPipelineEnv 2>&1
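
If you activate this environment often, the three commands can be collected in a shell function, e.g. in your ~/.bashrc on Ra. A minimal sketch, assuming bash; the function name tomcat_env is hypothetical, and it sources the conda activate script by its full path (standard conda usage) instead of relying on the PATH lookup of source activate.

```shell
# Hypothetical wrapper (bash), e.g. for ~/.bashrc: runs the three commands
# above so the TOMCAT pipeline environment can be activated with tomcat_env.
tomcat_env() {
    local bindir=/opt/psi/Programming/psi-python27/2.3.0/bin
    # Prepend the PSI Python 2.7 Anaconda distribution to PATH (only once,
    # so calling the function repeatedly does not keep growing PATH)
    case ":${PATH}:" in
        *":${bindir}:"*) ;;
        *) export PATH="${bindir}:${PATH}" ;;
    esac
    # Point conda to the TOMCAT-specific configuration file
    export CONDARC=/afs/psi.ch/project/TOMCAT_pipeline/Anaconda/.condarc_py27ana230
    # The conda activate script only exists where this Anaconda
    # installation is reachable (e.g. on Ra)
    if [ ! -f "${bindir}/activate" ]; then
        echo "error: ${bindir}/activate not found" >&2
        return 1
    fi
    source "${bindir}/activate" tomcatPipelineEnv 2>&1
}
```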


2.2.3 Using the pipeline

To use the pipeline, one must first configure Python correctly (see above). Please contact us for further details.