Abstract of OPT-DG2 Project

Fluxo is an open-source, high-order 3D Discontinuous Galerkin code (github.com/project-fluxo) that has been developed over the last 5 years at IPP. Fluxo solves the 3D MHD equations, including nonlinear and resistive terms, uses explicit time integration and runs on unstructured hexahedral meshes. The code aims to improve the scalability of 3D nonlinear MHD simulations of fusion plasmas.

The code is parallelized purely with MPI, and production runs on O(10,000) MPI ranks are possible. Fluxo exhibits excellent weak and strong scaling on current NUMA architectures. This follows from the Discontinuous Galerkin scheme with explicit time integration, which has dense local operations, communication only between direct neighbors and low memory consumption. Non-blocking point-to-point MPI communication is employed and overlaps with local computations.
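For reference, the weak and strong scaling mentioned above are commonly quantified as parallel efficiencies. The following is a minimal sketch of these standard definitions, not taken from Fluxo; the function names and the timings in the usage lines are hypothetical and serve only as illustration.

```python
def strong_scaling_efficiency(t_ref, n_ref, t_n, n):
    """Strong scaling: fixed total problem size, increasing rank count.

    t_ref, n_ref: wall time and rank count of the reference run
    t_n, n:       wall time and rank count of the scaled run
    An efficiency of 1.0 means the run time drops in proportion to the ranks added.
    """
    return (t_ref * n_ref) / (t_n * n)


def weak_scaling_efficiency(t_ref, t_n):
    """Weak scaling: fixed problem size per rank, increasing rank count.

    An efficiency of 1.0 means the wall time stays constant as ranks are added.
    """
    return t_ref / t_n


# Hypothetical timings in seconds, for illustration only (not measured Fluxo data).
print("strong:", strong_scaling_efficiency(t_ref=100.0, n_ref=1, t_n=12.5, n=8))
print("weak:  ", weak_scaling_efficiency(t_ref=10.0, t_n=12.5))
```

Values close to 1.0 in both measures correspond to what the text calls excellent scaling.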

In the first assessment project (2017) carried out by HLST, the MPI parallelization of Fluxo was analyzed in detail on the Marconi-Skylake architecture, and the communication/computation overlap was confirmed. In addition, a two-level communication infrastructure separating intra- and inter-node communication was implemented. This already led to a reduced communication cost, and potential improvements for increasing the overlap were identified.

In the previous HLST project (OPT-DG), the potential for vectorization was assessed in detail with performance tools (Intel VTune, Advisor, Likwid) as the basis for a roofline analysis. The main hotspots were identified, and a first assessment of simplified code parts regarding vectorization with AVX-512 instructions showed potential for further code optimizations. However, due to the complexity of the equation system and the data structure of the 3D high-order discretization, exploiting this potential requires substantial code changes that were beyond the scope of the project. We therefore apply for an extension of the project, in order to build on the findings and insights gained and to refactor the computationally heavy parts so that the vectorization potential can be fully exploited.
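As background to the roofline analysis mentioned above, the basic roofline model bounds the attainable performance of a kernel by the smaller of the compute peak and the product of memory bandwidth and arithmetic intensity. The sketch below illustrates this textbook model only; the machine numbers and intensities are hypothetical placeholders and are neither Marconi-Skylake specifications nor measured Fluxo values.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity_flop_per_byte):
    """Roofline bound: min(compute peak, memory bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity_flop_per_byte)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (flop/byte) at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine parameters, chosen only to make the example readable.
PEAK_GFLOPS = 1000.0    # assumed peak with full vector units
BANDWIDTH_GBS = 100.0   # assumed sustained memory bandwidth

for intensity in (0.5, 10.0, 50.0):  # hypothetical kernel intensities
    bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, intensity)
    if intensity < ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS):
        regime = "memory-bound"
    else:
        regime = "compute-bound"
    print(f"intensity {intensity:5.1f} flop/byte -> bound {bound:7.1f} GFLOP/s ({regime})")
```

In this model, vectorization raises the compute ceiling and therefore pays off mainly for kernels whose arithmetic intensity lies at or beyond the ridge point, which is one reason the dense local operations of the scheme are the natural targets for refactoring.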