Abstract of MPI3-DG Project

The high-order 3D Discontinuous Galerkin (DG) code Flexi was developed during the coordinator's PhD at the University of Stuttgart and has been further developed at IPP over the last two years. It solves the full 3D MHD equations, including nonlinear and resistive terms, uses explicit time integration, and operates on unstructured hexahedral meshes. The code aims to improve the scalability of 3D nonlinear MHD simulations of fusion plasmas.

The code is fully MPI-parallelized, and production runs with O(10,000) MPI ranks are possible. Flexi exhibits ideal weak and strong scaling on current NUMA architectures. This is owed to the properties of the DG scheme with explicit time integration: dense local operations, communication only between direct neighbors, and low memory consumption.

The current MPI parallelization uses a single global MPI communicator in which each core is one MPI rank, so it does not distinguish between ranks on the same node and ranks on other nodes. This creates communication overhead both within each node, where data is exchanged through intra-node messages rather than shared memory, and across nodes, where many small messages are sent rather than fewer large ones.

Future supercomputing hardware will be based on many-core architectures. The new Intel Xeon Phi (Knights Landing), a processor with 64 to 72 cores, demonstrates this trend. On such architectures, the efficiency of a pure MPI implementation is questionable. One way to avoid the overhead is a hybrid OpenMP/MPI implementation. An attractive alternative, however, is to exploit the shared-memory capabilities of the new MPI-3 standard directly: a two-level communicator is introduced in which intra-node communication is replaced by shared memory access. This approach depends only on the MPI-3 standard. The aim of this project is to assess it with respect to its feasibility, the required changes in code design, and its scaling behavior, focusing on the Intel Xeon Phi Knights Landing nodes of the Marconi cluster.