Flexi is designed to exploit the parallel, CPU-based architectures of modern HPC systems. It has demonstrated perfect weak and strong scaling on up to 260,000 cores and is routinely used for production runs on 100,000 cores.
The simulation framework is parallelized with MPI. The domain is decomposed along a space-filling curve that connects the mesh elements, so that each core receives a contiguous segment of the curve. Because DG methods have a small communication footprint, and because latency is hidden through non-blocking, overlapping message patterns, optimal strong scaling is achieved down to one element per core for moderate polynomial orders.
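The idea of partitioning along a space-filling curve can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from Flexi's source: it uses a Morton (Z-order) curve as a simple stand-in for whichever curve the framework actually employs, it assumes elements carry integer grid indices, and the names `morton_key` and `partition_along_curve` are hypothetical.

```python
def morton_key(ix, iy, iz, bits=10):
    """Interleave the bits of three integer indices into one Z-order key.

    Assumption: a Morton curve stands in for the framework's actual curve.
    """
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def partition_along_curve(element_coords, n_ranks):
    """Sort elements along the curve and cut the result into contiguous chunks.

    Returns one list of element ids per rank; chunk sizes differ by at most one.
    """
    # Element ids ordered by their position on the curve.
    order = sorted(range(len(element_coords)),
                   key=lambda e: morton_key(*element_coords[e]))
    base, extra = divmod(len(order), n_ranks)
    parts, start = [], 0
    for rank in range(n_ranks):
        # The first `extra` ranks take one additional element.
        size = base + (1 if rank < extra else 0)
        parts.append(order[start:start + size])
        start += size
    return parts


# Hypothetical usage: a 4 x 4 x 4 block of elements split across 8 ranks.
coords = [(i, j, k) for i in range(4) for j in range(4) for k in range(4)]
parts = partition_along_curve(coords, 8)
```

In this toy case each rank ends up with a compact 2 x 2 x 2 block of elements, which hints at why curve-based partitions tend to keep the number of neighboring ranks, and hence the number of messages, small.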
The framework uses the HDF5 library for fast parallel I/O. Flexi itself has a very low memory footprint and is generally CPU-bound, which makes it well suited to modern CPU-based HPC platforms.
Among others, the framework has run successfully on the following HPC systems:
- Cray XC40 “Hazel Hen” (HLRS, 7.4 PFlops)
- Cray XC40 “Hornet” (HLRS, 3.8 PFlops)
- IBM BlueGene/Q “JuQueen” (JSC Juelich, 5.9 PFlops)
- Cray XE6 “Hermit” (HLRS, 1 PFlops)
- IBM BlueGene/P “Jugene” (JSC Juelich, 1 PFlops)
Domain Decomposition: optimal number of neighbors by space-filling curve
Strong Scaling on Hornet: perfect granular strong scaling for N>5
Communication Strategy: latency hiding through non-blocking send/receive