# Remote data services

The remotely accessible SLS and SwissFEL data services include an offline high-performance computing facility (the Ra cluster) for data analysis and a facility for transferring large data volumes. A general overview of these services is provided here:

https://www.psi.ch/photon-science-data-services/photon-science-data-services

The following sections describe how to set up a connection in general and give details about the specific procedures useful for TOMCAT users.

### Requesting remote access

To use any of the remote data services, a personal external PSI account (e.g.: "ext-meier_f") is required. If you do not already have one, please read the Getting Started Guide and follow the instructions on how to request such an account.

### Data analysis service

This service offers external users access to high-performance computing resources for analyzing large data volumes acquired at the PSI large-scale facilities. Common software is available, but dedicated analysis algorithms must be provided by the users.

#### Configuring remote access

To set up your remote connection to PSI, a few initial steps are necessary. These include configuration performed at the PSI end by beamline staff and installation of the necessary software on the end-user side. Please refer to the Remote Interactive Access Guide for instructions and follow all steps carefully. Make sure to test your connection before continuing with the TOMCAT-specific procedures below.

#### The Ra cluster

The general capabilities and configuration of the Ra data analysis cluster are described in more detail here, including instructions on how to connect to Ra and how to enable and use different software packages.

More details are available on the PSI intranet (accessible only from within PSI or through the remote access servers):
https://intranet.psi.ch/Computing/Offline-computing-facility-for-sls-and-swissfel-data-analysis

#### Ra mailing list

To stay informed about operational issues, scheduled maintenance and downtimes, as well as troubleshooting progress, we strongly recommend that all users of the remote access services subscribe to the Ra mailing list, as described here. Typically, only a few emails are sent out per year.

### Data transfer

Several options are available to transfer data to and from PSI file systems. General instructions on how to set up and use the file transfer facility are described here:

https://www.psi.ch/photon-science-data-services/slsswissfel-data-transfer

The following sections describe the TOMCAT-related aspects of the remote data services that are not covered, or only partially covered, by the general manuals and guidelines mentioned above.

### Data access and management

#### Access permissions to experimental data on and from Ra

Access to the experimental data for a specific experiment (more precisely, a specific eAccount) needs to be granted to each personal user account individually. Currently, while the Ra cluster is still in the testing phase, this access needs to be configured by one of the beamline staff. Let them know your personal account name (e.g.: "ext-meier_f"), and which eAccount or proposal number you need to access.

In the near future, the granting of access rights will be delegated to the experiment's PI. Access rights will then be configured through the DUO without requiring intervention from the beamline staff.

#### Data locations on Ra

In contrast to using the compute cluster at the beamline during the beamtime, the location of your data is less straightforward when using the Ra cluster. The Ra cluster provides direct access to the experimental data acquired during the beamtime ("online"). However, this is a read-only mount of the high-performance online beamline file system. This protects your data from accidental deletion, but it also prevents you from modifying or adding to the original data sets directly. Instead, a separate file system ("offline") attached to the Ra cluster serves as a work area. For running the reconstruction scripts or other analysis tools, this means that you read the raw data from one location but save the results to another.

You should be able to see your original experimental data under the following path (online file system):

/sls/X02DA/Data10/$EACCOUNT/

where $EACCOUNT is the eAccount name (e.g.: e12345) used for data acquisition during the beamtime.

Any output from your analysis should go into the following location (or subfolders thereof):

/das/work/$PNN/$PGROUP


Here, $PGROUP represents the PI-Group associated with the proposal or beamtime (e.g.: p12345), and $PNN are the first three characters of the PI-Group (e.g.: p12). The numerical part of the PI-Group is always the same as for the eAccount.
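Because the numerical part of the PI-Group matches that of the eAccount, both locations can be derived from the eAccount name alone. The bash sketch below does this; it assumes eAccount names of the form "e" followed by digits (as in e12345), and the function name `tomcat_paths` is purely illustrative, not an existing PSI tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive the online (raw data, read-only) and offline
# (work area) locations on Ra from an eAccount name such as e12345.
# Assumes the PI-Group is "p" + the eAccount digits and $PNN is its first 3 characters.
tomcat_paths() {
  local eaccount="$1"
  # Reject anything that does not look like "e" followed by digits
  if [[ ! "$eaccount" =~ ^e[0-9]+$ ]]; then
    echo "unexpected eAccount name: $eaccount" >&2
    return 1
  fi
  local pgroup="p${eaccount#e}"   # e12345 -> p12345
  local pnn="${pgroup:0:3}"       # p12345 -> p12
  echo "/sls/X02DA/Data10/${eaccount}/"
  echo "/das/work/${pnn}/${pgroup}"
}
```

For example, `tomcat_paths e12345` prints /sls/X02DA/Data10/e12345/ and /das/work/p12/p12345 on two lines.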

Some more in-depth information about the data locations and file systems can be found in the Data Section of the central documentation pages.

### Transferring TOMCAT data

#### Transferring data from Ra to your home institution using ssh/rsync

As an example for TOMCAT data, you would use the following command to transfer the data:

rsync -av $EXT_ACCOUNT@ra-export.psi.ch:/sls/X02DA/Data10/$EACCOUNT/disk1 $DESTINATION_DIRECTORY

where $EXT_ACCOUNT is your external PSI account user name (e.g.: ext-meier_f), $EACCOUNT is the eAccount name (e.g.: e12345) used for data acquisition during the beamtime, and $DESTINATION_DIRECTORY is the directory on your machine or storage server to which the data should be transferred.
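Before starting a large transfer, it can be worth previewing what rsync would copy: its -n (--dry-run) option lists the files without transferring anything, and -P (--partial --progress) lets an interrupted transfer resume. Note also that without a trailing slash, rsync copies the disk1 directory itself into the destination; with a trailing slash (disk1/) it copies only its contents. The bash sketch below wraps the command above; the function name `ra_export` and the PRINT_ONLY variable are illustrative assumptions, not part of the PSI tooling.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the rsync export command shown above.
# Usage: ra_export EXT_ACCOUNT EACCOUNT SUBDIR DESTINATION_DIRECTORY [extra rsync options...]
# Set PRINT_ONLY=1 to print the command instead of running it.
ra_export() {
  local ext_account="$1" eaccount="$2" subdir="$3" dest="$4"
  shift 4
  local src="${ext_account}@ra-export.psi.ch:/sls/X02DA/Data10/${eaccount}/${subdir}"
  # Any extra arguments (e.g. --dry-run, -P) are passed through to rsync
  local cmd=(rsync -av "$@" "$src" "$dest")
  if [ "${PRINT_ONLY:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example (the destination /data/psi is a placeholder):
#   ra_export ext-meier_f e12345 disk1 /data/psi --dry-run   # preview only
#   ra_export ext-meier_f e12345 disk1 /data/psi -P          # real transfer
```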

#### Transferring data from a remote location back to Ra using ssh/rsync

Before transferring data back to the Ra cluster, please read the general information about data transfer mentioned above.

Note that the data needs to be transferred into the disk space associated with your PI-Group on the offline storage, since the original storage location on the online storage servers is read-only.

Use the following command to copy data back to the Ra cluster via rsync:

rsync -av PSI_DATA $EXT_ACCOUNT@ra-export.psi.ch:/das/work/$PNN/$PGROUP/.

where PSI_DATA is the local file or folder you want to copy, $EXT_ACCOUNT is your external PSI account user name (e.g.: ext-meier_f), $PGROUP represents the PI-Group (e.g.: p12345), and $PNN are the first three characters of the PI-Group (e.g.: p12). For example:

rsync -av ~/PSI_data/disk1 ext-meier_f@ra-export.psi.ch:/das/work/p12/p12345/.

Sometimes you may experience file transfer issues (disk quota exceeded, permission denied) when using the above command. In this case, it can help to tell rsync explicitly what the ownership of the transferred files should be. This is achieved by passing the --chown argument to rsync:

rsync -rtv --chown=$EXT_ACCOUNT:$PGROUP PSI_DATA $EXT_ACCOUNT@ra-export.psi.ch:/das/work/$PNN/$PGROUP/.

Note also that the -a option (archive mode, which tries to preserve the original ownership) has been replaced by -rt (recursive, preserving modification times). For the example above, the command looks as follows:

rsync -rtv --chown=ext-meier_f:p12345 ~/PSI_data/disk1 ext-meier_f@ra-export.psi.ch:/das/work/p12/p12345/.
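The upload with explicit ownership can be wrapped in the same way. This bash sketch derives $PNN from the PI-Group and builds the -rtv --chown command shown above; the function name `ra_import` and the PRINT_ONLY variable are hypothetical. The --chown option is only available in rsync 3.1.0 and later, so it is worth checking `rsync --version` on your side first.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for uploading data into the PI-Group work area on Ra,
# forcing the ownership of the copied files to EXT_ACCOUNT:PGROUP.
# Usage: ra_import EXT_ACCOUNT PGROUP LOCAL_PATH
# Set PRINT_ONLY=1 to print the command instead of running it.
ra_import() {
  local ext_account="$1" pgroup="$2" local_path="$3"
  local pnn="${pgroup:0:3}"   # e.g. p12345 -> p12
  local dest="${ext_account}@ra-export.psi.ch:/das/work/${pnn}/${pgroup}/."
  # -rt instead of -a, so that the original ownership is not preserved
  local cmd=(rsync -rtv "--chown=${ext_account}:${pgroup}" "$local_path" "$dest")
  if [ "${PRINT_ONLY:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example:
#   ra_import ext-meier_f p12345 ~/PSI_data/disk1
```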

#### Transferring data using the Globus Online application

Instructions on setting up and using Globus Online for data transfer from PSI are available here. For step-by-step instructions on how to connect to the TOMCAT resources, see below.

Open the Globus Online web application (https://app.globus.org/) in a browser, or start the local client software on your machine.

Your TOMCAT data should be accessible when using the PSI endpoint and selecting the following path (see the screenshot below):

/~/sls/X02DA/Data10/$EACCOUNT/

where $EACCOUNT is the eAccount name (e.g.: e12345) used for data acquisition during the beamtime.

To access the work area of your PI-Group, you need to use the following path (refer to the section about data locations on Ra above):

/~/das/work/$PNN/$PGROUP/


where $PGROUP represents the PI-Group (e.g.: p12345), and $PNN are the first three characters of the PI-Group (e.g.: p12).

#### Globus Online: Step-by-step instructions

To get started with Globus Online, log on to https://app.globus.org/ with a GlobusID account (free to create) or, if your institution provides direct access, with your institutional account. Once logged on, you should be presented with the File Manager view:

Now you need to connect to the PSI endpoint. Type "PSI" into the "Collection" field and select the corresponding endpoint from the list that appears:

To access the files on the PSI endpoint, click the "Continue" button and authenticate with your (external) PSI account (e.g.: "ext-user_n") and your PSI password.

Once authenticated, you will be presented with the main directory structure on the data transfer service of the PSI Globus Online endpoint.

You can now navigate to your data directories from here. The experimental data collected during the beamtime is stored in the following folder:

/~/sls/X02DA/Data10/$EACCOUNT/

where $EACCOUNT is the eAccount name (e.g.: e12345) used for data acquisition during the beamtime. The processed data, when working from the Ra cluster, is located in a different folder:

/~/das/work/$PNN/$PGROUP/


where $PGROUP represents the PI-Group (e.g.: p12345), and $PNN are the first three characters of the PI-Group (e.g.: p12).

Note that you can also bookmark these locations for future reference by clicking on the bookmark icon just to the right of the "Path" input field.

Now that you are connected to the PSI endpoint, you need to repeat the same procedure for the other endpoint of the data transfer (your institutional endpoint or "Globus Connect Personal" endpoint). In the image below, the connection has been established to a Globus Connect Personal endpoint called "asterix02".

You can now transfer files from PSI to the other endpoint simply by dragging and dropping the corresponding files or folders from the left pane into the desired target location in the right pane. Once the transfer is initiated, you can monitor the progress by clicking on the "Activity" item in the menu on the left.

### Running reconstructions on Ra

The procedure to reconstruct "standard" TOMCAT tomography data on Ra is essentially identical to running the reconstructions at the beamline during your experimental campaign. Follow these steps and instructions to get started.

#### Using the Fiji Reconstruction Manager

1. Open the Fiji software: From the start menu (the Egyptian eye symbol in the lower left corner of the screen) select TOMCAT > Fiji (TOMCAT)
2. In Fiji, open the Reco manager plugin: Plugins > TOMCAT > Reco Manager (latest, etc)
3. Click "Select a Dataset" and navigate to the location of the raw data (usually in the read-only eAccount, see above for details on the file locations for your data).
4. Proceed with the reconstruction just like you did when using the same interface at the beamline.
5. The results of the reconstruction will be automatically saved in the pGroup directory associated with the eAccount to which the raw data belongs.

#### Checking the status of the cluster queue

The cluster job scheduling system used on Ra is SLURM. The easiest way to check the status of the cluster queues is by using the Sview program. Launch the program by selecting the sview entry in the Ra start menu in the lower left corner:

The Sview program will list all running jobs in all queues. Through the menu, you can filter this list to only show a particular set of jobs, for example, just those that were launched by yourself. Right clicking on a job opens a context menu from which you are able to pause or cancel a job, etc.

As an alternative to the Sview GUI, you can use the following commands in a terminal to monitor the queue and manipulate individual jobs:

See the currently scheduled, running, and possibly stuck jobs in the queue:

squeue

This gives you the full list of all currently running and scheduled jobs by all users. To restrict the list to your own jobs, specify the user argument (-u). Additionally, you can use the watch command to automatically update the display every few seconds:

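watch squeue -u ext-meier_f

The output looks similar to this. The ST column shows the job state (e.g. PD = pending, R = running), and the last column shows either the node a job runs on or, for pending jobs, the reason it is waiting:

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
1888967_[1-3]   day phrec_MI ext-meie PD       0:00      1 (Dependency)
1888966_2       day fltp_MI0 ext-meie  R       3:48      1 ra-c-031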
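For a quick overview instead of the full list, the job states can be tallied. This bash sketch reads squeue output in the default column layout (JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON), so ST is the fifth column) and counts the lines per state; the function name is hypothetical. It counts lines, not tasks: a pending job array such as 1888967_[1-3] appears as a single line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count squeue lines per job state (PD = pending, R = running, ...).
# Expects the default squeue column layout on stdin:
# JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
summarize_job_states() {
  # Skip the header line (if present) and any short or blank lines
  awk '$1 != "JOBID" && NF >= 5 { count[$5]++ }
       END { for (state in count) print state, count[state] }' | sort
}

# Example on Ra (-h suppresses the header, -u limits the list to your jobs):
#   squeue -h -u "$USER" | summarize_job_states
```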


Delete a job from the queue:

scancel <Job-ID>

where <Job-ID> is the job ID shown in the first column of the squeue output. For example:

scancel 1888966_2
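scancel can also select jobs by user and state instead of by ID, which is handy for clearing out all of your own pending jobs at once. The bash sketch below uses scancel's --user and --state options; the function name and the PRINT_ONLY variable are hypothetical. Since this cancels every matching job, it is worth printing the command first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: cancel all of your own jobs in a given Slurm state
# (default: PENDING). Set PRINT_ONLY=1 to show the command without running it.
cancel_my_jobs() {
  local state="${1:-PENDING}"
  local cmd=(scancel "--user=${USER}" "--state=${state}")
  if [ "${PRINT_ONLY:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example:
#   PRINT_ONLY=1 cancel_my_jobs    # check what would be run
#   cancel_my_jobs                 # cancel all of your pending jobs
```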

Note that, depending on how many users are using the cluster, your reconstruction jobs may wait in the queue for some time before they are executed. Remember that this is a shared resource.

#### Disk space on Ra

The disk space available in your work area on Ra (i.e., under /das/work/$PNN/$PGROUP/) is currently limited to 4TB. As soon as you exceed that limit, you will not be able to run any more reconstructions. In this case, you should transfer the results already calculated to a storage location at your institution to make room for further calculations. You can also periodically run a cleanup script that deletes unnecessary temporary data. Navigate to the folder you want to clean (note that all of its subfolders are cleaned as well!) and then run the following command:

/sls/X02DA/bin/cleanup.sh

For example, to clean up any temporary data files in the Data10/disk1 folder of your work area, including all of its subfolders, run:

cd /das/work/p12/p12345/Data10/disk1
/sls/X02DA/bin/cleanup.sh
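Before cleaning up or transferring results away, it can help to see which subfolders take up the most space. This bash sketch uses the standard du, sort and head tools; the function name is hypothetical, and on a large work area du can take a while to finish.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the largest subdirectories of a folder
# (sizes in kilobytes, largest first).
# Usage: largest_dirs DIRECTORY [COUNT]
largest_dirs() {
  local dir="${1:-.}" count="${2:-10}"
  # -s: one total per subdirectory, -k: sizes in KB; errors (e.g. permissions) are hidden
  du -sk -- "$dir"/*/ 2>/dev/null | sort -rn | head -n "$count"
}

# Example:
#   largest_dirs /das/work/p12/p12345/Data10 5
```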

You can see your currently available quota by calling the following command:

/das/support/users/space_usage/pgroup_info p12345

You will then see an output similar to the one below:

Name:        p12345
Unit:        TOMCAT
Kind:        external
Used:        1.2 GB
Members:    ext-meier_t,muster_f
Quota:        4 TB
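To keep an eye on how close you are to the limit, the Used and Quota lines of this output can be turned into a percentage. This bash/awk sketch assumes the output format shown above (a number followed by a unit such as GB or TB) and treats the units as binary multiples (1 TB = 1024 GB); both are assumptions, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the used fraction of the quota (in percent)
# from pgroup_info output read on stdin.
# Assumes lines like "Used:  1.2 GB" and "Quota:  4 TB" and 1024-based units.
quota_percent() {
  awk '
    function to_gb(value, unit) {
      if (unit == "TB") return value * 1024
      if (unit == "GB") return value
      if (unit == "MB") return value / 1024
      if (unit == "KB") return value / (1024 * 1024)
      return value / (1024 * 1024 * 1024)   # assume plain bytes otherwise
    }
    $1 == "Used:"  { used  = to_gb($2, $3) }
    $1 == "Quota:" { quota = to_gb($2, $3) }
    END { if (quota > 0) printf "%.1f\n", 100 * used / quota }
  '
}

# Example on Ra:
#   /das/support/users/space_usage/pgroup_info p12345 | quota_percent
```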

In case you really require more disk space for your data analysis, you should contact your local support staff and request a quota increase. An extension of the quota can be granted for a limited period of time (usually 3 months) before the quota is automatically set back to its original value. Please specify and justify both the total amount of storage space you need as well as the duration for the requested extension.

#### Common reconstruction problems and some solutions

##### Fiji does not display images anymore

When working on the Ra cluster, Fiji occasionally stops displaying images and instead shows a gray window where the image should be. In this case, please try restarting Fiji. In very rare cases, it might also be necessary to log out of your Ra session completely (using the "Logout..." command from the main desktop menu in the lower left corner of the screen) and reconnect.

##### Wrong center of rotation

For most data sets, the reconstruction software should automatically detect the appropriate location of the rotation center. However, this automatic determination occasionally fails; phase reconstructions are particularly prone to this problem, as are data sets with inherently low image contrast.

A wrong value for the rotation center usually has a quite obvious effect on the reconstructed image: it produces "half-moon" or "C"-shaped features throughout the whole image plane, pointing either to the left or to the right, as shown in the image below.

These artefacts can be fixed by manually adjusting the value used for the rotation center. The direction of the half-circle features indicates whether the center needs to be made higher or lower, as shown in the graphic below. The required change in the center value with respect to the current value is equal to the radius of the half-circle feature detected in the image.

##### Setting the minimum and maximum intensity values for the reconstruction

The reconstructed images displayed by the reconstruction manager consist of 32-bit floating-point values. For the final reconstructed data sets, these are almost always converted to 8-bit or 16-bit integer TIFF files. This conversion from floating-point to integer values needs to be specified explicitly: you need to determine which floating-point value corresponds to 0 in the final data set (lowest intensity value or lower histogram edge), and which corresponds to 255 or 65535 for 8-bit and 16-bit data, respectively (highest intensity value or upper histogram edge). Running a histogram command on the reconstructed slice (Ctrl-h in Fiji) can help to determine a sensible set of values. Test these values on a few slices at different heights through the sample to make sure they are valid for the whole 3D data set.
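Assuming the usual linear mapping with clipping (the exact rounding used by the TOMCAT pipeline is not documented here), the conversion of a reconstructed floating-point value $f$ to an $n$-bit integer $I$, given the chosen lower and upper histogram edges $f_{\min}$ and $f_{\max}$, can be written as:

$$
I = \operatorname{round}\!\left( (2^{n}-1)\,\frac{\min\bigl(\max(f,\, f_{\min}),\, f_{\max}\bigr) - f_{\min}}{f_{\max} - f_{\min}} \right), \qquad n \in \{8, 16\}
$$

Values at or below $f_{\min}$ become 0 and values at or above $f_{\max}$ become $2^{n}-1$ (255 or 65535). A range that is too narrow therefore saturates real features of the sample, while a range that is too wide spreads the sample over fewer grey levels and reduces contrast.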

### Software tools

#### Available software

The following software packages relevant for tomography data analysis are currently available on the Ra cluster:

• TOMCAT reconstruction pipeline: The pipeline is available via the reconstruction manager GUI from Fiji (see Fiji below). Simply open Fiji and go to Plugins > TOMCAT > RecoManagerRA. This should open the GUI.
• Python (anaconda, python 3.8): See Configuring Python below.
• Matlab (multiple versions): Use module load matlab/2016a at the command line to activate the 2016a version of Matlab:
module load matlab/2016a
matlab &

• Compilers: Various common compilers
• Fiji: The full Fiji installation, including our custom DMP and hdf5-file plugins, is available from the launcher when working in an NX session with a remote graphical user interface. Just open the start menu > TOMCAT > Fiji. From the command line, issue the following command instead:
/sls/X02DA/bin/fiji &

• ParaView:
module use unstable
vglrun paraview

See https://intranet.psi.ch/Computing/DaasScientificApplications on the PSI intranet for more details on how to use and run ParaView.
• Mathematica:
module use unstable
mathematica &


See https://intranet.psi.ch/Computing/DaasScientificApplications on the PSI intranet for more details.

To see a full list of all available modules, type the following command in a terminal shell:

module use unstable
module avail


More general information about installed software can be found here: https://www.psi.ch/photon-science-data-services/offline-computing-facility-for-sls-and-swissfel-data-analysis#Software

If you require any other software packages for your data analysis that are currently not available on the cluster, please contact us. We will check with our IT support to see if the missing software can be installed/provided (subject to availability, compatibility, and licensing restrictions).

#### Configuring Python

All TOMCAT-specific scripts and tools written in Python were built and tested in conda environments for Python 3.8. To use these tools, you first need to activate the correct conda environment. Type the following command in a shell to activate the TOMCAT Python environment:

source /sls/X02DA/applications/py_envs/tomcat_daq/activate_py38.sh


Note that you need to run all commands that should use this environment from the same terminal in which you activated it. When opening a new shell, you always need to run the above command again to activate the environment for that session.
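To confirm that the environment is active in the current shell, you can check which Python interpreter is picked up and its version. This bash sketch assumes the activation script puts a python3 of the expected version first on the PATH; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed only if python3 on the PATH has the expected
# major.minor version (3.8 for the TOMCAT environment).
check_python_version() {
  local expected="${1:-3.8}" actual
  actual="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')" || return 1
  # Show which interpreter was found, so a wrong PATH is easy to spot
  echo "python3 is $(command -v python3) (version $actual)"
  [ "$actual" = "$expected" ]
}

# Example, after sourcing activate_py38.sh in the same shell:
#   check_python_version 3.8 || echo "TOMCAT Python environment is not active" >&2
```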