Usage of the MCC Computer Cluster

Accounts

Cluster Login:

To log in to the cluster, open a shell/console and type

ssh username@mcc


or, if you have an account on MCC2

ssh username@mcc200




You will then be prompted for your password.


Depending on your account, there are differences in the directory (folder) from which you can run your jobs and in the nodes you can access and use. These differences are discussed below.
Should you have any questions or problems regarding your account, contact E. Rantsiou.

Common Account l_mc01 for LDM/LNS users

This is a shared account with many users, accessible only to LDM and LNS group members.
Users of this account submit their jobs and store their files, executables, etc., within the /home/l_mc01/mpi directory. Each user with access to this account has his/her own subdirectory in /home/l_mc01/mpi and/or access to existing 'project' directories there.
Jobs submitted under this account can make use of the LDM/LNS part of the cluster, i.e., nodes mcc01-mcc06.

ATTENTION: Interactive usage of the cluster is allowed only on nodes mcc05 and mcc06. The other four nodes (mcc01-mcc04) can only be used through a batch submission system, i.e., a job scheduler: users submit their jobs via a script, and the scheduler places the jobs in a queue and runs them as resources become available.

What does this mean in practice if you are a user of the l_mc01 account?
- If you wish to run your jobs interactively, you have two nodes at your disposal (mcc05 and mcc06), each with 72 cores. Remember that you have to log in to MCC and launch your jobs from there onto mcc05 and mcc06.
- If you would like to use the batch system, you will need to ask for a personal account (contact E. Rantsiou). You will then have access to nodes mcc01-mcc04, each with 24 cores.
NOTE: For further information on the usage of the batch system, read the sections Submission Script Basics, Partitions -- for LNS/LDM members, Submitting a Job, and Monitoring Your Jobs and the Cluster.

Individual Account l_yourlastname for LDM/GFA users

These are individual accounts (one user per account) for LDM and GFA group members.
Users of these accounts submit their jobs and store their files, executables, etc., within the /home/l_yourlastname directory. Users can create their own folders within that directory and submit jobs from anywhere within /home/l_yourlastname.
Jobs submitted under these accounts can make use of nodes mcc20-mcc37 and mcc201-mcc212, the new LDM/GFA part of the cluster. Read section Running Parallel Jobs for details.

Software

A variety of software and compilers are available on MCC. Some of them are available as modules, which means that you have to 'load' them before you can use them:

module avail            lists the available modules
module load software    loads the desired software ('software' is one of the items listed by module avail)
module unload software  unloads that software
module list             shows the software that is currently loaded

Information regarding the available software and their use on the MCC and other clusters

McStas

How to use McStas @ PSI:
Four versions are installed on the MCC cluster: v1.12c, v2.0, v2.1 and v2.2a. The default version is 1.12c. To use one of the other versions, you have to load it as a module; for example, for version 2.0:

module load mcstas/mcstas2.0

McStas can be started from the command line (mcstas *.instr) or, preferably, through the graphical user interface by typing mcgui in a terminal. NOTE: when using version 2.0, use mcgui-2.0 and mcstas-2.0 instead.

The official McStas website has online documentation and tutorials that can help you with using, running and understanding McStas.

MCNPX

The following information is only relevant to users with access to the MCNP source code. Users who simply have an MCNP executable can skip forward to Running Parallel Jobs.
Since MCNP comes with individual (personal) licenses, each user who obtains an MCNP license has to install his/her own version locally in their account.
The software comes with installation instructions, and if you have been granted access to the source code, you most likely already know how to compile and install it. However, here are some basic instructions for users who wish to compile their own version of MCNP on MCC:
Installation instructions for MCNPX

Running Parallel Jobs

As of July 2012, the LDM/GFA part of MCC is equipped with the batch system SLURM (v14.11.7).
This means that all runs on nodes mcc20-mcc37 and mcc201-mcc212 must be submitted using a submission script.

Submission Script Basics

The submission script used to start a run should look something like this: example submission script
(if you are a McStas or MCNPX user, make sure to read section "Submission Scripts for McStas and MCNPX").
You can copy this file and use it as your submission script (you can rename it to anything you like) after making the necessary changes.
For that, read the instructions below as well as the explanations provided within the file.
Of all the lines in the submission script, five must be altered to match your current run:

#SBATCH -J job_name
job_name should be replaced by the name you want your run to have when it appears in the queue list

#SBATCH -N 6
6 should be replaced with the number of nodes you wish your run to occupy (each batch node has 24 cores).

#SBATCH --time=00:30:00
The duration of your run in hh:mm:ss. Always request enough time for your run, plus a safety margin, so that the batch system does not terminate your job before it completes.

#SBATCH --partition=short
short should be replaced with the appropriate partition name. Read below for a description of the available partitions.

mpirun -np #_of_cores your_executable
This is the actual "run" command, where #_of_cores should be replaced with the number of cores you wish your run to use (see below for the limits on the number of cores per run in each partition).
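Put together, a minimal submission script based on these five lines might look like the following sketch. It is not a copy of the linked example script: it assumes bash as the shell and 24 cores per node (as the partition limits below imply), so 6 nodes provide 144 cores; job_name and your_executable are placeholders. Note that sbatch only reads #SBATCH lines that appear before the first command in the script.

```shell
#!/bin/bash
# Sketch of a SLURM submission script (assumes 24 cores per node).
# Name shown in the queue list
#SBATCH -J job_name
# Number of nodes
#SBATCH -N 6
# Maximum run duration, hh:mm:ss
#SBATCH --time=00:30:00
# Partition to run in (see the partition descriptions)
#SBATCH --partition=short

# 6 nodes x 24 cores = 144 MPI processes; your_executable is a placeholder
mpirun -np 144 your_executable
```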

Partitions -- for LDM/GFA users

There are currently four available partitions (typing "sinfo" in your terminal will give you the following information):

short Partition

The short partition has a maximum run duration of 1 hour (no upper limit on the number of cores).

medium Partition

The medium partition has a maximum run duration of 2 days. The maximum number of cores for a medium run is 240 (= 10 nodes).

long Partition

The long partition has a maximum run duration of 7 days. The maximum number of cores for a long run is 168 (= 7 nodes).

test Partition

The test partition has a maximum run duration of 2 hours. The maximum number of cores for a test run is 24 (= 1 node).
The test partition is reserved for short tests only (e.g., performance tests).
Test jobs always run on node mcc37.

Choosing Partition for Your Run

Based on the above characteristics of the partitions, choose the partition that is appropriate for your job and specify it in your submission script.

As an example, suppose your run needs 3 hours to complete on 48 cores. This exceeds the 1-hour limit of the short partition but fits within the limits of the medium partition, and 48 cores correspond to 2 nodes. The relevant lines in the submission script should therefore be altered to:

#SBATCH -N 2
#SBATCH --time=03:00:00
#SBATCH --partition=medium
mpirun -np 48 your_executable
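The choice above can be sketched as a small shell function that applies the LDM/GFA partition limits and prints the lines to put into your submission script. This is a hypothetical helper for illustration (ldm_gfa_lines is not a tool installed on MCC); it assumes the run time is given in whole hours (at least 1), 24 cores per node, and it leaves out the test partition, which is reserved for tests.

```shell
#!/bin/sh
# Hypothetical helper: print the #SBATCH and mpirun lines for a run of
# $1 whole hours (>= 1) on $2 cores, using the LDM/GFA partition limits.
# Assumes 24 cores per node; the test partition is deliberately skipped.
ldm_gfa_lines() {
    hours=$1
    cores=$2
    if [ "$hours" -le 1 ]; then
        partition=short            # max 1 hour, no core limit
    elif [ "$hours" -le 48 ] && [ "$cores" -le 240 ]; then
        partition=medium           # max 2 days, max 240 cores
    elif [ "$hours" -le 168 ] && [ "$cores" -le 168 ]; then
        partition=long             # max 7 days, max 168 cores
    else
        echo "no LDM/GFA partition fits ${hours} h on ${cores} cores" >&2
        return 1
    fi
    nodes=$(( (cores + 23) / 24 ))   # round up to whole 24-core nodes
    echo "#SBATCH -N $nodes"
    printf '#SBATCH --time=%02d:00:00\n' "$hours"
    echo "#SBATCH --partition=$partition"
    echo "mpirun -np $cores your_executable"
}

ldm_gfa_lines 3 48
```

Called as ldm_gfa_lines 3 48, it prints the same four lines as the example above. Shorter partitions are tried first, so a run that fits in medium is never sent to long.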

Partitions -- for LNS/LDM members

There are currently four available partitions (typing "sinfo" in your terminal will give you the following information):

ll_short Partition

The ll_short partition has a maximum run duration of 2 hours (no upper limit on the number of cores).

ll_medium Partition

The ll_medium partition has a maximum run duration of 12 hours. The maximum number of cores for an ll_medium run is 48 (= 2 nodes).

ll_long Partition

The ll_long partition has a maximum run duration of 1 day. The maximum number of cores for an ll_long run is 48 (= 2 nodes).

ll_verylong Partition

The ll_verylong partition has a maximum run duration of 2 days. The maximum number of cores for an ll_verylong run is 24 (= 1 node).

Choosing Partition for Your Run

Based on the above characteristics of the partitions, choose the partition that is appropriate for your job and specify it in your submission script.

As an example, suppose your run needs 3 hours to complete on 48 cores. This exceeds the 2-hour limit of ll_short but fits within the 12-hour and 48-core limits of ll_medium, and 48 cores correspond to 2 nodes. The relevant lines in the submission script should therefore be altered to:

#SBATCH -N 2
#SBATCH --time=03:00:00
#SBATCH --partition=ll_medium
mpirun -np 48 your_executable
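For the LNS/LDM partitions, the same reasoning can be sketched as a shell function that tries the partitions from shortest to longest and prints the lines for your submission script. This is a hypothetical helper for illustration (lns_ldm_lines is not a tool installed on MCC); it assumes the run time is given in whole hours (at least 1) and 24 cores per node (mcc01-mcc04).

```shell
#!/bin/sh
# Hypothetical helper: print the #SBATCH and mpirun lines for a run of
# $1 whole hours (>= 1) on $2 cores, using the LNS/LDM partition limits.
# Assumes 24 cores per node.
lns_ldm_lines() {
    hours=$1
    cores=$2
    if [ "$hours" -le 2 ]; then
        partition=ll_short         # max 2 hours, no core limit
    elif [ "$hours" -le 12 ] && [ "$cores" -le 48 ]; then
        partition=ll_medium        # max 12 hours, max 48 cores
    elif [ "$hours" -le 24 ] && [ "$cores" -le 48 ]; then
        partition=ll_long          # max 1 day, max 48 cores
    elif [ "$hours" -le 48 ] && [ "$cores" -le 24 ]; then
        partition=ll_verylong      # max 2 days, max 24 cores
    else
        echo "no LNS/LDM partition fits ${hours} h on ${cores} cores" >&2
        return 1
    fi
    nodes=$(( (cores + 23) / 24 ))   # round up to whole 24-core nodes
    echo "#SBATCH -N $nodes"
    printf '#SBATCH --time=%02d:00:00\n' "$hours"
    echo "#SBATCH --partition=$partition"
    echo "mpirun -np $cores your_executable"
}

lns_ldm_lines 3 48
```

Called as lns_ldm_lines 3 48, it reproduces the example above. A 36-hour run on 48 cores is rejected, since only ll_verylong allows more than one day and it is limited to 24 cores.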

Submitting a Job

Once your submission script is ready (say you have named it "submission_script"), you can submit the job with:

sbatch submission_script

Submission Scripts for McStas and MCNPX

The submission scripts provided here are identical to the one given above, with one difference: the actual "run" command. Both MCNPX and McStas require a few more parameters in the "mpirun" command, other than -np and the name of the executable to run.

MCNPX

When running mcnpx, one needs to specify an input and an output file.

Something like this:

mpirun -np 36 mcnpx i=input n=output.

Here's an example submission script for MCNPX.

McStas

In order to run your something.instr file in McStas, you need to create the executable first, which is usually named something.out.

a) As a first step, you need to translate your instrument file into C,

mcstas -I /full/path/of/directory/with/instrument/file -t -o something.c something.instr

b) and then create the executable of the C code:

mpicc -O2 -w -ax -o something.out something.c -lm -DUSE_MPI

c) You can now use the mpirun command to start the run:

mpirun -np 24 something.out --ncount=1000000 lambda_min=3 lambda_max=10

where --ncount is the number of neutrons to simulate, and in place of lambda_min=3 lambda_max=10 you should list all the parameters of your instrument file.

Note:
It is up to you whether steps a) and b) above are included in your submission script. For example, if you already have the .out file from a previous compilation, there is no need to create it again. Alternatively, you can execute those two steps from your login shell before submitting, in which case your submission script only needs to include step c).

The example script provided contains all three steps for completeness.

Here's an example submission script for McStas.

IMPORTANT: If you are submitting McStas jobs as an l_mc01 user, make sure to use the full path for the mpirun command (/afs/psi.ch/project/sinq/sl6-64/bin/mpirun) in your submission script (as it is done in the example file given above). This will save you from serious trouble.
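Combining steps a) to c), a McStas submission script might look roughly like the sketch below. It is not the linked example script: the job name, node count, time, partition, the instrument directory, something.instr and the parameters lambda_min/lambda_max are placeholders, and the mpirun path follows the note above for l_mc01 users. The compile flags are copied unchanged from step b). set -e stops the job if translation or compilation fails, so the run is not started with a missing or outdated executable.

```shell
#!/bin/bash
#SBATCH -J mcstas_run
#SBATCH -N 1
#SBATCH --time=01:00:00
# Replace with your partition (e.g. ll_short for LNS/LDM members)
#SBATCH --partition=short

# Stop at the first failing step
set -e

# Placeholder: directory containing your instrument file
INSTR_DIR=/full/path/of/directory/with/instrument/file

# l_mc01 users: use the full path given above; others can use plain "mpirun"
MPIRUN=/afs/psi.ch/project/sinq/sl6-64/bin/mpirun

# a) translate the instrument file into C
mcstas -I "$INSTR_DIR" -t -o something.c something.instr

# b) compile the C code into an MPI executable
mpicc -O2 -w -ax -o something.out something.c -lm -DUSE_MPI

# c) run on 24 cores (1 node); replace the parameters with your instrument's
"$MPIRUN" -np 24 something.out --ncount=1000000 lambda_min=3 lambda_max=10
```

If you compile once from your login shell instead, drop steps a) and b) and keep only the run line.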

Monitoring Your Jobs and the Cluster

In order to monitor the status and progress of your jobs, here's a list of commands that you can use in your log-in shell:

sinfo Information on the available partitions

squeue Gives a list of all jobs currently running or waiting to run. By using squeue -u your_username you can get a list that will include only your jobs.

scontrol scontrol will give you a prompt that will look like:
scontrol:
  • Typing show jobs will give a detailed list of all the jobs.
  • Typing instead show job ID, where ID is the ID number of the job, will give you detailed information on a specific job.
  • Typing show partitions will give a detailed description of the available partitions (more detailed than what sinfo gives).
  • Type exit or quit to exit scontrol.

scancel ID will cancel the specified job ID, whether it's already running or pending.

You can also get a graphical view of the cluster status (present and past) by logging in to mcc10, launching a web browser, and visiting: /ganglia

FAQs
