
EUROfusion

Abstract of OPT-DG Project

The high-order 3D Discontinuous Galerkin code Fluxo is an open-source project (github.com/project-fluxo) that has been developed over the last 4 years at IPP. Fluxo solves the 3D MHD equations, including nonlinear and resistive terms, uses explicit time integration, and runs on unstructured hexahedral meshes. The code aims to improve the scalability of 3D nonlinear MHD simulations of fusion plasmas.

The code is parallelized with pure MPI, and production runs on O(10,000) MPI ranks are possible. Fluxo exhibits ideal weak and strong scaling on current NUMA architectures. This follows from the Discontinuous Galerkin scheme with explicit time integration, which has dense local operations, communication only between direct neighbors, and low memory consumption. Point-to-point non-blocking MPI communication is employed, so that communication overlaps with local computations.
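As a reminder of what "ideal" weak and strong scaling means, the two parallel efficiencies can be written down in a few lines. This is a minimal sketch using the standard textbook definitions; the function names and all timings are hypothetical and chosen for illustration only; they are not Fluxo measurements.

```python
def strong_scaling_efficiency(t_ref, n_ref, t_n, n):
    """Fixed total problem size: ideally the runtime drops by n_ref / n."""
    return (t_ref * n_ref) / (t_n * n)


def weak_scaling_efficiency(t_ref, t_n):
    """Fixed problem size per rank: ideally the runtime stays constant."""
    return t_ref / t_n


# Hypothetical timings in seconds (illustration only, not measured data).
strong = strong_scaling_efficiency(t_ref=100.0, n_ref=1000, t_n=12.5, n=8000)
weak = weak_scaling_efficiency(t_ref=100.0, t_n=100.0)
print(strong, weak)
```

An efficiency of 1.0 in both measures corresponds to ideal scaling; values below 1.0 indicate that communication or load imbalance is eating into the speedup.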

In the first assessment project carried out by HLST, the MPI parallelization of Fluxo was analyzed in detail on the Marconi-Skylake architecture, and the communication/computation overlap was confirmed. In addition, a two-level communication infrastructure separating intra-node and inter-node communication was implemented. This has already led to a reduced communication cost, and potential improvements for increasing the overlap were identified.
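The idea behind separating intra-node and inter-node communication can be illustrated with a small, MPI-free sketch. It assumes that ranks are placed on nodes in contiguous blocks and that each node hosts a fixed number of ranks; the helper names `node_of` and `split_neighbors`, the rank numbers, and the value of `ranks_per_node` are hypothetical and do not describe how Fluxo actually implements this.

```python
def node_of(rank, ranks_per_node):
    """Node index of a rank, assuming contiguous block placement (an assumption)."""
    return rank // ranks_per_node


def split_neighbors(rank, neighbors, ranks_per_node):
    """Partition the neighbor ranks of `rank` into same-node and other-node lists."""
    home = node_of(rank, ranks_per_node)
    intra = [r for r in neighbors if node_of(r, ranks_per_node) == home]
    inter = [r for r in neighbors if node_of(r, ranks_per_node) != home]
    return intra, inter


# Hypothetical example: rank 47 with five mesh neighbors, 48 ranks per node.
intra, inter = split_neighbors(rank=47, neighbors=[45, 46, 48, 49, 95],
                               ranks_per_node=48)
print(intra, inter)
```

With such a split, messages to ranks on the same node (which travel through shared memory) and messages to ranks on other nodes (which cross the network) can be scheduled separately. In an MPI code, the same grouping is typically obtained from `MPI_Comm_split_type` with `MPI_COMM_TYPE_SHARED` rather than from rank arithmetic.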

Future supercomputing hardware will be based on many-core architectures with large vector units and low clock frequencies, such as the Intel Knights Landing (KNL) available on Marconi. Large-scale runs of Fluxo are already conducted on KNL, but with lower performance than on Skylake. This project therefore aims to improve the overall performance on such hardware. First, the parallel performance assessment of the previous project will be extended to KNL. Second, single-core performance will be analyzed with regard to vectorization with the AVX-512 instruction set, in order to identify hotspots and propose changes, which then need to be tested. The performance on Skylake is expected to benefit from such optimizations as well.