VSC-4 – user test phase

VSC-4 is the most powerful supercomputer ever installed in Austria, reaching a performance (Rmax) of 2.7 PFlop/s, with a theoretical peak performance (Rpeak) of 3.7 PFlop/s. The new system consists of 790 water-cooled nodes (Lenovo SD650), each with two Intel Skylake Platinum 8174 processors with 24 cores each, interconnected with 100 Gbit/s OmniPath.

VSC-4 was installed in summer 2019 at the Arsenal TU building (Objekt 214) in Vienna by EDV-Design.

The new VSC-4 system consists of 790 directly water-cooled nodes (Lenovo SD650; the picture in the lower-right corner shows one tray with two nodes on it). Each node has 2 Intel Skylake Platinum 8174 processors with 24 cores each, which gives 48 physical cores per node and a total of 37,920 cores for the whole system.

The installed Intel Skylake Platinum 8174 processors are a variant of the Intel® Xeon® Platinum 8168 processor with a higher clock rate (a nominal base frequency of 3.1 GHz and a maximum turbo frequency of 3.9 GHz).

The 700 standard nodes each have 96 GByte of main memory. In addition, there are 78 fat nodes with 384 GByte each and 12 very fat nodes with 768 GByte each, for a total of 106,368 GByte of main memory.
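The node, core, and memory totals quoted above follow directly from the per-class figures. The short sketch below recomputes them; the dictionary layout and variable names are illustrative choices, not part of any VSC tooling.

```python
# Node classes as stated in the text: (node count, main memory per node in GByte).
NODE_CLASSES = {
    "standard": (700, 96),
    "fat": (78, 384),
    "very fat": (12, 768),
}

# 2 processors per node, 24 cores per processor.
CORES_PER_NODE = 2 * 24

total_nodes = sum(count for count, _ in NODE_CLASSES.values())
total_memory_gbyte = sum(count * mem for count, mem in NODE_CLASSES.values())
total_cores = total_nodes * CORES_PER_NODE

print(total_nodes, total_cores, total_memory_gbyte)  # 790 37920 106368
```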

Each node is also equipped with an SSD device of 480 GByte, available as temporary storage during the runtime of a job.

A node reaches about 250 points in the SPECrate2017 Floating Point benchmark, which is about 5 times the performance of a VSC-3 node (16 cores). In the Linpack benchmark, a single VSC-4 node reaches about 3 TFlop/s, about 10 times the performance of a VSC-3 node, which reaches slightly less than 300 GFlop/s. A large part of this floating-point performance comes from the 2 AVX-512 units per core, each able to execute one fused multiply-add on 8 double-precision values per cycle. Since a fused multiply-add counts as 2 floating-point operations, this permits (in an optimized situation) 32 floating-point operations per cycle per core, albeit at a slightly reduced processor frequency.
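The per-core figure of 32 and the system peak can be checked with a rough calculation. The sketch below assumes double precision (8 values per 512-bit vector), counts a fused multiply-add as 2 floating-point operations, and uses the nominal base frequency of 3.1 GHz as the clock. The last point is a simplifying assumption, since the text notes that AVX-512 code runs at a slightly reduced frequency.

```python
# Per-core throughput: 2 AVX-512 units, 8 doubles per 512-bit vector,
# and a fused multiply-add counted as 2 floating-point operations.
AVX512_UNITS_PER_CORE = 2
DOUBLES_PER_VECTOR = 512 // 64
FLOPS_PER_FMA = 2

flops_per_cycle_per_core = AVX512_UNITS_PER_CORE * DOUBLES_PER_VECTOR * FLOPS_PER_FMA

# System size from the text; the clock is the nominal base frequency (assumption).
CORES_PER_NODE = 48
NODES = 790
CLOCK_GHZ = 3.1

node_peak_gflops = CORES_PER_NODE * flops_per_cycle_per_core * CLOCK_GHZ
system_peak_pflops = node_peak_gflops * NODES / 1e6

# Ratio of the quoted Linpack result (Rmax) to the quoted peak (Rpeak).
linpack_efficiency = 2.7 / 3.7

print(flops_per_cycle_per_core)
print(round(node_peak_gflops, 1), round(system_peak_pflops, 2))
print(round(linpack_efficiency, 2))
```

Under these assumptions the estimate comes out close to the quoted Rpeak of 3.7 PFlop/s, and the quoted Rmax corresponds to roughly three quarters of that peak.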

The compute nodes are directly water-cooled, which allows the use of primary cooling water with a temperature in excess of 43℃ and permits year-round free cooling. Up to 90% of the heat is removed by this high-temperature loop, with the remainder removed by air cooling. This keeps the energy footprint very reasonable.

The system is complemented with 10 login nodes and parallel file systems.

Compute nodes, login nodes and file system nodes are interconnected with a high-speed 100 Gbit/s OmniPath network. The OmniPath network has a two-level fat-tree topology with a blocking factor of 2:1. Of its 48 ports, an edge switch uses 32 to connect compute nodes, giving non-blocking access to 1,536 cores. The remaining 16 ports connect via optical fiber cables to the 16 core switches that form the second level of the fat-tree. In addition, a 10 Gbit/s Ethernet management network is available.
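The 2:1 blocking factor and the 1,536-core figure both follow from how the 48 ports of an edge switch are split. A minimal sketch of that arithmetic, with illustrative variable names:

```python
from fractions import Fraction

# Port split of one edge switch, as described in the text.
PORTS_PER_EDGE_SWITCH = 48
NODE_PORTS = 32          # downlinks to compute nodes
CORES_PER_NODE = 48

# The ports not used for nodes are uplinks, one to each core switch.
uplink_ports = PORTS_PER_EDGE_SWITCH - NODE_PORTS

# Blocking factor: downlink bandwidth divided by uplink bandwidth
# (all links run at the same 100 Gbit/s, so port counts suffice).
blocking_factor = Fraction(NODE_PORTS, uplink_ports)

# Cores that share one edge switch and can talk to each other without blocking.
cores_per_edge_switch = NODE_PORTS * CORES_PER_NODE

print(uplink_ports, blocking_factor, cores_per_edge_switch)  # 16 2 1536
```

Jobs that fit within the 32 nodes of a single edge switch never cross the oversubscribed uplinks, which is why the text describes access to those 1,536 cores as non-blocking.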

 

VSC-4 in the TOP500 list (Rank 82 in June 2019)

 

Press coverage on VSC-4