DHPC: The TU Delft supercomputer

DHPC is the supercomputer at TU Delft, designed to meet researchers' increasing need for extensive computing power to solve complex problems in physics, mechanics and dynamics. The system is characterized by flexibility, speed and user-friendliness. In its final phase it will offer 20,000 CPU cores in over 400 compute nodes. It incorporates a high-speed parallel storage subsystem based on BeeGFS. All compute nodes and the storage system are interconnected with HDR100 InfiniBand for high-throughput, low-latency inter-node communication.

Description of the DHPC system

The solution is built on Fujitsu hardware and makes use of: the latest generation of processors, with over 20,000 cores in total; the highest available memory throughput (2933 MHz); the latest generation of InfiniBand interconnect (HDR/HDR100) for all nodes; and a high-throughput I/O subsystem.

Compute nodes

The compute nodes are built with the latest Intel Cascade Lake Refresh processors, offering high performance and power efficiency. The Rpeak for the system is 1.05 PFlops in Phase 1 and 1.88 PFlops in Phase 2.

The cluster consists of three different types of compute nodes:

  • Standard compute nodes
  • Fat (large-memory) compute nodes, in two memory configurations
  • GPU compute nodes equipped with NVIDIA Tesla cards
Node types

Node category        Number  Cores  CPU / GPU                             Memory   SSD
-------------------  ------  -----  ------------------------------------  -------  ------
Standard (phase 1)      218     48  2x Intel Xeon Gold 6248R 24C 3.0GHz   192 GB   480 GB
Standard (phase 2)      398     48  2x Intel Xeon Gold 6248R 24C 3.0GHz   192 GB   480 GB
Fat type-a                6     48  2x Intel Xeon Gold 6248R 24C 3.0GHz   768 GB   480 GB
Fat type-b                4     48  2x Intel Xeon Gold 6248R 24C 3.0GHz   1536 GB  480 GB
GPU                      10     48  2x AMD EPYC 7402 24C 2.80GHz +        256 GB
                                    4x NVIDIA Tesla V100S 32GB

Summary and performance

                                                          Phase 1  Phase 2
CPU total
  Compute nodes                                               238      418
  CPUs                                                        476      836
  Compute cores                                             11424    20064
  Rpeak (theoretical max. performance, PFlops)               1.05     1.88
GPU total
  GPU nodes                                                    10       10
  GPUs                                                         40       40
  Tensor cores                                              25600    25600
  CUDA cores                                               204800   204800
  Double precision (theoretical max. performance, TFlops)     328      328
  Single precision (theoretical max. performance, TFlops)     656      656
  Deep learning (theoretical max. performance, PFlops)        5.2      5.2
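The totals in the table above can be cross-checked with a few lines of arithmetic. The per-node core count follows from the node table (2 sockets of 24 cores); the per-GPU figures used below (5120 CUDA cores, 640 Tensor cores, 8.2/16.4/130 TFlops) are NVIDIA's published Tesla V100S numbers, not values taken from this document:

```python
# Cross-check of the summary table: 2 sockets x 24 cores per compute node,
# 10 GPU nodes with 4 NVIDIA V100S cards each.
CORES_PER_NODE = 2 * 24

for phase, nodes in (("Phase 1", 238), ("Phase 2", 418)):
    # Each node holds 2 CPUs; total cores = nodes x 48.
    print(f"{phase}: {nodes} nodes, {2 * nodes} CPUs, {nodes * CORES_PER_NODE} cores")

gpus = 10 * 4
# Per-GPU V100S figures: 5120 CUDA cores, 640 Tensor cores,
# 8.2 TFlops DP, 16.4 TFlops SP, 130 TFlops for tensor (deep learning) ops.
print(f"GPUs: {gpus}, CUDA cores: {gpus * 5120}, Tensor cores: {gpus * 640}")
print(f"DP: {gpus * 8.2:.0f} TFlops, SP: {gpus * 16.4:.0f} TFlops, "
      f"DL: {gpus * 130 / 1000:.1f} PFlops")
```

This reproduces the table exactly: 11,424 and 20,064 cores, 204,800 CUDA cores, 25,600 Tensor cores, 328/656 TFlops and 5.2 PFlops.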

Front-end nodes

The cluster provides a number of front-end nodes as the entry point for end users and administrators.

  • Login nodes
  • Interactive/visualization nodes equipped with NVIDIA Quadro RTX cards
Front-end nodes

Node category              Number  Cores  CPU / GPU                              Memory
-------------------------  ------  -----  -------------------------------------  ------
Login                           4     32  2x Intel Xeon Gold 6226R 16C 2.9GHz    384 GB
Interactive/visualization       2     32  2x Intel Xeon Gold 6226R 16C 2.9GHz +  192 GB
                                          1x NVIDIA Quadro RTX 4000

Highlights/details:

  • The login nodes are the main access point for all end users of the system. Because they are expected to host many competing user sessions, they are configured with 384 GB of memory. In addition, these nodes are configured as HA pairs, so a single node failure does not cut off user access to the cluster.
  • The interactive/visualization nodes can be used for interactive tasks that need a high-end graphics card for visualization, or for workloads that are not suitable for the batch cluster. Each is equipped with an NVIDIA Quadro RTX 4000: 2304 CUDA cores, 288 Tensor cores, 36 RT cores and 8 GB GDDR6 memory.
  • Two file-transfer nodes are included specifically to provide an optimal data flow between DHPC and the central research storage of TU Delft. These nodes can run pre-staging or post-staging jobs just before or after a computational job.

Interconnect

HPC applications communicate frequently between nodes while calculating their results. To maintain application efficiency even when scaling over a large number of nodes, the interconnect must minimise overhead and deliver messages at high speed. DHPC is therefore equipped with a high-performance InfiniBand network built from Mellanox InfiniBand products, forming an efficient, low-overhead transport fabric. This set-up not only enables efficient delivery of MPI messages but also gives applications high-speed access to the temporary storage area, which is likewise reachable over the InfiniBand fabric.

Highlights/details:

  • Mellanox InfiniBand HDR100/HDR interconnect configured in a Full Bisectional Bandwidth (FBB) non-blocking network fabric.
  • Fat tree topology
  • 100 Gbps HDR100 InfiniBand HCAs in every server

Storage

To enable efficient I/O throughput for computational jobs, the configuration includes a high-speed file system with 696 TB usable storage space and a throughput of at least 20 GB/s. This storage subsystem consists of:

  • 6x I/O servers: 2 metadata servers and 4 storage servers
  • 1x NetApp all-flash storage/controller shelf for metadata storage
  • 4x NetApp storage/controller shelves for file data storage

Highlights/details:

  • A NetApp all-flash subsystem handling the metadata requirements
  • A set of NetApp high-capacity, high-throughput disk subsystems providing redundancy at all levels to avoid single points of failure
  • High-speed connectivity:
  • Multiple HDR100 InfiniBand links from each storage server to the IB network
  • Multiple high-speed 32 Gbps Fibre Channel connections between the storage servers and the storage devices
  • File system built on the BeeGFS parallel file system
  • Dynamic per-job parallel file system: BeeOND lets a user build a parallel file system from the local disks of the nodes on which their job is running. Each compute node incorporates a 480 GB SSD in addition to the OS boot device, and these SSDs can be aggregated into a job-local parallel file system.
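As a sketch of how BeeOND might be used inside a batch job, the snippet below starts a per-job file system on the allocated nodes' local SSDs and tears it down afterwards. The SSD path (/local) and mount point (/mnt/beeond) are assumptions for illustration, not DHPC specifics; consult the BeeGFS/BeeOND documentation for the exact invocation on this system.

```shell
#!/bin/sh
# Hypothetical BeeOND sketch -- paths below are assumptions, not DHPC values.

# Write the list of nodes allocated to this Slurm job to a file.
scontrol show hostnames "$SLURM_JOB_NODELIST" > "$HOME/nodefile.$SLURM_JOB_ID"

# Start BeeOND across those nodes: use the local SSD (assumed mounted at
# /local) as backing storage and mount the parallel FS at /mnt/beeond.
beeond start -n "$HOME/nodefile.$SLURM_JOB_ID" -d /local -c /mnt/beeond

# ... run the I/O-intensive part of the job against /mnt/beeond ...

# Unmount and delete the per-job file system at the end of the job.
beeond stopall -n "$HOME/nodefile.$SLURM_JOB_ID" -L -d
```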

HPC Management solution

Bright Cluster Manager (BCM) is used to deploy and manage the cluster and its HPC software environment. It includes a complete set of HPC libraries, tools, and management and reporting facilities.

Operating System

Red Hat Enterprise Linux 8

Job Management

DHPC is equipped with the Slurm workload manager, which provides access to resources based on the fair-share principle.
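A minimal Slurm batch script for a standard compute node might look as follows. The partition name and program name are hypothetical placeholders; the core and memory figures come from the node table above.

```shell
#!/bin/sh
#SBATCH --job-name=example        # name shown in the queue
#SBATCH --partition=compute       # hypothetical partition name
#SBATCH --nodes=2                 # two standard compute nodes
#SBATCH --ntasks-per-node=48      # one task per core (2x 24-core CPUs)
#SBATCH --time=01:00:00           # wall-clock limit (hh:mm:ss)
#SBATCH --mem-per-cpu=3G          # roughly 192 GB / 48 cores per node

# Launch the program on all allocated cores; "my_app" is a placeholder.
srun ./my_app
```

Submitted with `sbatch job.sh`, the job is then scheduled according to the fair-share policy relative to other users' workloads.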

End user Portal

Traditionally, an HPC system provides its users with SSH and FTP access to cluster resources. Our goal, however, is to encourage use of the HPC facilities through a modern, easy-to-use portal; DHPC is therefore equipped with the Open OnDemand portal.