# VSC Wiki

## Systems

### VSC 3

The original VSC3 has been decommissioned. The extension VSC3+, the Bioinformatics Nodes and the GPU Nodes are still operational.

Decommissioned

### Parallel computing

##### VSC-3:
doku:performance_tests

## Performance Tests

This section presents the results of several performance tests.

### Matrix diagonalisation

#### Libraries

The following MPI versions and libraries were used:

qlogic:

• vsc1: qlogicmpi_intel-0.1.0 (compiled with intel ifort)
• vsc2: qlogicmpi_intel-3.0.1 (compiled with intel ifort)

mvapich2:

• vsc1: mvapich2_intel_qlc-1.6 (compiled with intel ifort for qlogic)
• vsc2: mvapich2_1.8a2_intel_limic (compiled with ifort and limic module)

impi:

• vsc2: impi_4.0.1.007 (Intel MPI)

sca:

• ScaLAPACK 2.0.0, compiled using Intel ifort with GotoBLAS2 and lapack-3.3.1

mkl:

• vsc1: Intel mkl libraries, version 11.1/046
• vsc2: Intel mkl libraries, version 2011_sp1.9.293

elpa:

ELPA was compiled using sca from above together with the mkl libraries; when only the mkl libraries were used, BLACS errors occurred.

#### Timings Blocksize 64

In a small test program, a matrix of size N x N with N = 512, 1024, 2048, 4096 was set up randomly and diagonalized using PZHEEVX from ScaLAPACK and solve_evp_complex_2stage from ELPA. The timings cover only the diagonalization part.

For each number of cores, all possible processor row/column combinations with rows, cols = 1, 2, 4, 8, 16, 32, 64 were run. The plotted data show only the lowest time obtained for each number of cores.
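The enumeration of process grids described above can be sketched as follows. This is only an illustration, not the original test driver: the function names `process_grids` and `best_grid` are hypothetical, and the timings passed to `best_grid` in the usage lines are placeholder values, not measurements.

```python
# Row/column counts considered in the test, as listed in the text.
ALLOWED = (1, 2, 4, 8, 16, 32, 64)


def process_grids(cores, allowed=ALLOWED):
    """Return all (rows, cols) pairs from `allowed` whose product equals `cores`."""
    return [(r, c) for r in allowed for c in allowed if r * c == cores]


def best_grid(timings):
    """Return the (grid, time) entry with the lowest time from a {grid: time} mapping."""
    return min(timings.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # All grids that would be run for 16 cores.
    print(process_grids(16))

    # Placeholder timings (NOT measured data), only to show the selection step.
    example = {(1, 16): 3.0, (4, 4): 1.0, (16, 1): 2.5}
    print(best_grid(example))
```

For 16 cores this yields five grids, from 1 x 16 to 16 x 1; only the fastest of them would enter the plots.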

Absolute timings of the different subroutines:

Scaling of the runtimes relative to the calculation with 16 cores:

#### Timings Blocksize optimized

For qlogic MPI we also tested the influence of different blocksizes on VSC-1 and VSC-2. The runs were performed as above, but with blocksizes = 2, 4, 8, 16, 32, 64. The data in the plots and tables are the lowest timings obtained for a given matrix size and number of cores.
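To see why the blocksize can matter, it may help to look at how a block-cyclic distribution spreads one matrix dimension over a row (or column) of processes. The sketch below is an assumption-laden illustration: `owner` and `local_count` are hypothetical helper names, the distribution is assumed to start at process 0, and `local_count` follows the logic of ScaLAPACK's NUMROC routine rather than calling it.

```python
def owner(global_index, blocksize, nprocs):
    """Process coordinate that owns a 0-based global row/column index.

    Assumes a block-cyclic distribution starting at process 0.
    """
    return (global_index // blocksize) % nprocs


def local_count(n, blocksize, proc, nprocs):
    """Number of rows/columns of a dimension of size n stored on `proc`.

    Follows the logic of ScaLAPACK's NUMROC with source process 0.
    """
    full_blocks = n // blocksize
    count = (full_blocks // nprocs) * blocksize  # complete rounds over all processes
    extra = full_blocks % nprocs                 # leftover full blocks
    if proc < extra:
        count += blocksize                       # one more full block
    elif proc == extra:
        count += n % blocksize                   # the trailing partial block, if any
    return count


if __name__ == "__main__":
    n, nprocs = 512, 16
    for nb in (64, 2):
        counts = [local_count(n, nb, p, nprocs) for p in range(nprocs)]
        print(f"blocksize {nb:2d}: {counts}")
```

With N = 512 and 16 processes along one dimension, a blocksize of 64 leaves half of the processes without any rows, whereas a blocksize of 2 gives every process 32 rows. This load imbalance is one plausible reason, though not necessarily the only one, why small matrices can favour small blocksizes.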

Data obtained from VSC-1 with qlogic MPI:

| Matrix size | Cores | Time SCA | Time ELPA | Blocksize SCA | Blocksize ELPA |
|---|---|---|---|---|---|
| 512 | 16 | 0.081 | 0.072 | 16 | 8 |
| 512 | 32 | 0.087 | 0.059 | 32 | 4 |
| 512 | 64 | 0.085 | 0.049 | 32 | 2 |
| 512 | 128 | 0.093 | 0.043 | 4 | 4 |
| 512 | 256 | 0.114 | 0.040 | 32 | 8 |
| 1024 | 16 | 0.320 | 0.402 | 16 | 2 |
| 1024 | 32 | 0.274 | 0.263 | 32 | 2 |
| 1024 | 64 | 0.245 | 0.187 | 32 | 4 |
| 1024 | 128 | 0.249 | 0.153 | 32 | 8 |
| 1024 | 256 | 0.273 | 0.120 | 32 | 2 |
| 2048 | 16 | 1.699 | 2.565 | 16 | 2 |
| 2048 | 32 | 1.148 | 1.498 | 32 | 4 |
| 2048 | 64 | 0.856 | 0.907 | 32 | 4 |
| 2048 | 128 | 0.749 | 0.613 | 32 | 4 |
| 2048 | 256 | 0.666 | 0.442 | 32 | 8 |
| 4096 | 16 | 11.921 | 17.662 | 32 | 8 |
| 4096 | 32 | 6.559 | 9.710 | 32 | 16 |
| 4096 | 64 | 4.101 | 5.549 | 16 | 2 |
| 4096 | 128 | 2.837 | 3.264 | 32 | 16 |
| 4096 | 256 | 2.136 | 2.066 | 16 | 4 |

Data obtained from VSC-2 with qlogic MPI:

| Matrix size | Cores | Time SCA | Time ELPA | Blocksize SCA | Blocksize ELPA |
|---|---|---|---|---|---|
| 512 | 16 | 0.101 | 0.097 | 16 | 4 |
| 512 | 32 | 0.096 | 0.077 | 16 | 2 |
| 512 | 64 | 0.090 | 0.066 | 8 | 4 |
| 512 | 128 | 0.109 | 0.058 | 16 | 4 |
| 512 | 256 | 0.126 | 0.054 | 4 | 4 |
| 1024 | 16 | 0.423 | 0.525 | 16 | 4 |
| 1024 | 32 | 0.312 | 0.341 | 16 | 4 |
| 1024 | 64 | 0.249 | 0.254 | 16 | 4 |
| 1024 | 128 | 0.266 | 0.189 | 8 | 4 |
| 1024 | 256 | 0.251 | 0.148 | 8 | 8 |
| 2048 | 16 | 2.448 | 3.264 | 32 | 4 |
| 2048 | 32 | 1.460 | 1.974 | 16 | 4 |
| 2048 | 64 | 0.987 | 1.173 | 16 | 16 |
| 2048 | 128 | 0.848 | 0.777 | 16 | 8 |
| 2048 | 256 | 0.671 | 0.545 | 4 | 4 |
| 4096 | 16 | 19.075 | 22.678 | 32 | 2 |
| 4096 | 32 | 10.114 | 12.827 | 32 | 8 |
| 4096 | 64 | 5.705 | 7.059 | 32 | 8 |
| 4096 | 128 | 3.463 | 4.288 | 16 | 16 |
| 4096 | 256 | 2.461 | 2.624 | 16 | 2 |
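
The scaling relative to 16 cores can be recomputed directly from the tabulated timings. The sketch below uses the VSC-1 values for matrix size 4096; the function name `relative_scaling` and the efficiency column (speedup divided by the increase in core count) are additions for illustration and are not part of the original evaluation.

```python
# (cores, time) pairs copied from the VSC-1 table, matrix size 4096.
SCA_4096 = [(16, 11.921), (32, 6.559), (64, 4.101), (128, 2.837), (256, 2.136)]
ELPA_4096 = [(16, 17.662), (32, 9.710), (64, 5.549), (128, 3.264), (256, 2.066)]


def relative_scaling(series, base_cores=16):
    """Return (cores, speedup, efficiency) relative to the run with `base_cores`.

    speedup    = time(base_cores) / time(cores)
    efficiency = speedup / (cores / base_cores)
    """
    base_time = dict(series)[base_cores]
    result = []
    for cores, time in series:
        speedup = base_time / time
        efficiency = speedup / (cores / base_cores)
        result.append((cores, speedup, efficiency))
    return result


if __name__ == "__main__":
    for name, series in (("SCA", SCA_4096), ("ELPA", ELPA_4096)):
        print(name)
        for cores, speedup, efficiency in relative_scaling(series):
            print(f"  {cores:4d} cores: speedup {speedup:5.2f}, efficiency {efficiency:4.2f}")
```

For this matrix size ELPA starts from a higher 16-core time but reaches a larger speedup at 256 cores than SCA, which matches the crossover visible in the last row of the VSC-1 table.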