====== Quick start guide for VSC-5 ======

**Status: 2022/04** This page is under construction.

===== Connecting =====

<code>
ssh <user>@vsc5.vsc.ac.at
</code>

or alternatively use a specific login node:

<code>
ssh <user>@l5[0-6].vsc.ac.at
</code>

===== Data storage =====

VSC-5 provides the same directories as VSC-4, since both clusters use the same IBM Spectrum Scale (GPFS) storage. Consequently, you have access to your data in ''$HOME'' and ''$DATA'' just as on VSC-4. There is, however, no BINF filesystem available.

===== Loading Modules & Spack Environments =====

Different CPUs need differently compiled packages, so we use the spack feature ''environments'' to make sure the right packages are chosen. On login, the default ''spack environment'' (zen3) is loaded automatically, so only modules that run on the AMD processors are visible with ''spack find''. On VSC-5 no default modules are loaded; please load them yourself using ''spack load <module>'' or ''module load <module>''.

Find the official SPACK documentation at https://spack.readthedocs.io/

==== List Spack Environments ====

Type ''spack env list'' to see which environments are available and which one is active:

<code>
$ spack env list
==> 2 environments
    cascadelake
    zen3
</code>

The current ''spack environment'' is also shown in your prompt:

<code>
(zen3) [myname@l55 ~]#
</code>

Mind that if your prompt is changed later, for example when loading a ''python environment'' using ''conda'', the active ''spack environment'' might no longer be shown correctly in your prompt.

When a spack environment is activated, the command ''spack find -l'' lists the packages available in that environment. The command ''module avail'' will also show only those modules that are compatible with the active spack environment.

==== Change Spack Environment ====

If you want to look for a package that belongs to another architecture, first change the spack environment:

<code>
$ spacktivate <myenv>
$ spacktivate cascadelake
</code>

Only then will ''spack find'' show the modules of the newly activated environment (e.g. ''cascadelake'').

==== Save Spack Environment ====

The following creates a load script for your current spack environment with all loaded modules:

<code>
$ spack env loads -r
</code>

This creates a file called ''loads'' in the environment directory. Sourcing that file in bash will make the environment available to the user; the ''source loads'' command can also be included in ''.bashrc'' files. The loads file may also be copied out of the environment, renamed, etc.

==== Load a Module ====

Please always use spack, see [[doku:spack|SPACK - a package manager for HPC systems]].

===== Compile Code =====

A program should be compiled on the same type of hardware it will later run on. Programs compiled for VSC-4 will run on the ''cascadelake_0384'' partition, but not necessarily on the default partition, which uses AMD processors!

==== AMD: Zen3 ====

Most nodes of VSC-5 are based on AMD processors, including the login nodes. The spack environment ''zen3'' is loaded automatically, so you can use ''spack load <mypackage>'' to load what you need, compile your program, and submit a job via slurm (see the sketch at the end of this section).

The nodes have 2x AMD Epyc CPUs (Milan architecture), each equipped with 64 cores. In total there are 128 physical cores (core-id 0-127) and 256 virtual cores per node. The A100 GPU nodes have 512GB RAM, and each NVIDIA A100 card has 40GB of memory.
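A minimal sketch of this load-and-compile workflow on a login node follows below. The package names, versions and the source file are assumptions; check ''spack find'' for what is actually installed on VSC-5.

<code>
$ spack find -l openmpi                  # list the OpenMPI builds available in the zen3 environment (name is an assumption)
$ spack load gcc                         # load a compiler; pick an explicit version if several are installed
$ spack load openmpi                     # load an MPI library built with that compiler
$ mpicc -O2 -o my_program my_program.c   # compile for the AMD (zen3) nodes
</code>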
At the moment, 40 GPU nodes are installed. Example ''nvidia-smi'' output on a GPU node:

<code>
$ nvidia-smi
Tue Apr 26 15:42:00 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01    Driver Version: 510.39.01    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  Off  | 00000000:01:00.0 Off |                  Off |
| N/A   40C    P0    35W / 250W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  Off  | 00000000:81:00.0 Off |                  Off |
| N/A   37C    P0    37W / 250W |      0MiB / 40960MiB |     40%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
</code>

==== Intel: Cascadelake ====

If you have programs already compiled for VSC-4, they will also run on the ''cascadelake_0384'' partition of VSC-5. Otherwise, log in on a ''cascadelake_0384'' node (or any VSC-4 node), compile your program there, and then submit a job to the slurm partition ''cascadelake_0384''.

There are 48 nodes with 2x Intel Cascadelake CPUs, 48 cores each. In total 96 physical cores (core-id 0-95) and 192 virtual cores are available per node. Each node has 384GB RAM.

===== SLURM =====

The following partitions are currently available:

<code>
$ sinfo -o %P
PARTITION
gpu_a100_dual*    -> currently the default partition; AMD CPU nodes with 2x AMD Epyc (Milan), 2x NVIDIA A100 and 512GB RAM
cascadelake_0384  -> Intel CPU nodes with 2x Intel Cascadelake and 384GB RAM
zen3_0512         -> AMD CPU nodes with 2x AMD Epyc (Milan) and 512GB RAM
zen3_1024         -> AMD CPU nodes with 2x AMD Epyc (Milan) and 1TB RAM
zen3_2048         -> AMD CPU nodes with 2x AMD Epyc (Milan) and 2TB RAM
</code>

==== QoS ====

During the friendly user test phase, the QoS ''goodluck'' can be used for all partitions.

==== Submit a Job ====

Submit a job in ''slurm'' using a job script like this (minimal) example:

<file sh defjob.sh>
#!/bin/bash
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
#SBATCH --gres=gpu:2

./my_program
</file>

This will submit a job to the default partition (''gpu_a100_dual'') using the default QoS (''gpu_a100_dual'').
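The script can be submitted and monitored with the standard slurm commands; the file name ''defjob.sh'' simply refers to the example above.

<code>
$ sbatch defjob.sh      # submit the job script shown above
$ squeue -u $USER       # list your pending and running jobs
$ scancel <job id>      # cancel a job if necessary
</code>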
To submit a job to the cascadelake nodes:

<file sh cascjob.sh>
#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
#SBATCH --partition=cascadelake_0384
#SBATCH --qos goodluck

./my_program
</file>

Job scripts for the AMD CPU nodes:

<file sh zen3_0512.sh>
#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
#SBATCH --partition=zen3_0512
#SBATCH --qos goodluck

./my_program
</file>

<file sh zen3_1024.sh>
#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
#SBATCH --partition=zen3_1024
#SBATCH --qos goodluck

./my_program
</file>

<file sh zen3_2048.sh>
#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
#SBATCH --partition=zen3_2048
#SBATCH --qos goodluck

./my_program
</file>

Example job script using both GPUs of a GPU node:

<file sh twogpujob.sh>
#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
#SBATCH --partition=gpu_a100_dual
#SBATCH --qos goodluck
#SBATCH --gres=gpu:2

./my_program
</file>

Example job script using only one GPU of a GPU node:

<file sh onegpujob.sh>
#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH --partition=gpu_a100_dual
#SBATCH --qos goodluck
#SBATCH --gres=gpu:1

./my_program
</file>

Your job will then be constrained to one GPU and will not interfere with a second job on the node. It is not possible to access the other GPU card, which is not assigned to your job.

More at [[doku:slurm|Submitting batch jobs (SLURM)]], but bear in mind that the partitions on VSC-5 differ from those on VSC-4! Official Slurm documentation: https://slurm.schedmd.com

===== Intel MPI =====

When **using Intel MPI on the AMD nodes with mpirun**, please set the following environment variable in your job script to allow for correct process pinning:

<code>
export I_MPI_PIN_RESPECT_CPUSET=0
</code>
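As an illustration, a job script combining this setting with an Intel MPI run on the AMD nodes could look like the following sketch. The file name, the loaded package and the process count are assumptions and must be adapted to your setup.

<file sh intelmpijob.sh>
#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
#SBATCH --partition=zen3_0512
#SBATCH --qos goodluck

# allow correct process pinning with Intel MPI and mpirun on the AMD nodes
export I_MPI_PIN_RESPECT_CPUSET=0

# package name is an assumption -- check `spack find` for the Intel MPI installed on VSC-5
spack load intel-oneapi-mpi

# 128 processes = one per physical core of a zen3 node
mpirun -np 128 ./my_program
</file>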