Kokkos support for complex data discretization and unstructured meshes
Level of qualifications required: PhD or equivalent
Function: Temporary scientific engineer
Level of experience: From 3 to 5 years
About the research centre or Inria department
The Inria Saclay-Île-de-France Research Centre was established in 2008. It has developed as part of the Saclay site in partnership with Paris-Saclay University and with the Institut Polytechnique de Paris.
The centre has 40 project teams, 27 of which operate jointly with Paris-Saclay University and the Institut Polytechnique de Paris. Its activities involve over 600 people, including scientists and research and innovation support staff, representing 44 nationalities.
Context
PEPR NumPEx & KOKTAILS
The transition to Exascale computing architectures requires a renewal of programming paradigms to efficiently leverage accelerators (GPUs, TPUs, and others). This transition presents a significant challenge for existing application codes, as a complete rewrite is often a massive undertaking.
The KOKTAILS project aims to address these challenges by proposing an advanced programming environment that facilitates the porting of codes to heterogeneous architectures while ensuring performance portability. The project develops a sovereign software stack tailored for GPU-based Exascale supercomputing. It addresses the critical challenges of software portability and performance optimization across diverse hardware architectures, ensuring seamless adaptation of scientific applications to future computing infrastructures. By integrating and enhancing existing open‑source frameworks, KOKTAILS will provide a robust middleware layer enabling French and European applications to fully exploit Exascale resources while reducing dependence on foreign software ecosystems. This project aligns fully with the PEPR NumPEx strategy, closely interacting with Exa‑SofT's work on software tool evolution and Exa‑DI to ensure integration into application demonstrators.
By ensuring the sustainability of software developments and facilitating their adoption by a wide range of applications, KOKTAILS will directly contribute to France's digital sovereignty and to scientific and technological excellence in HPC.
Internationally, the United States has invested significantly in Exascale software development through initiatives like the Exascale Computing Project (ECP), which has focused on co‑design efforts between hardware, software, and applications. Kokkos, an open‑source C++ parallel programming model, has emerged as a leading solution for portable performance across heterogeneous architectures and is widely adopted in supercomputing centers worldwide. Europe has made progress in HPC software development through programs like EuroHPC and PEPR NumPEx, and needs to ensure that a production‑ready software stack is available for the Exascale architectures that will be deployed in member states. Although the Kokkos ecosystem is mature, it lacks several key features needed to fully address the needs of the European computing communities. Porting legacy codes with complex data structures remains a significant challenge, and although Kokkos is well‑suited for GPUs, its use relies on advanced meta‑programming, which makes adoption difficult for some scientists.
Assignment
Unstructured and high‑dimensional meshes pose challenges for GPU optimization due to irregular memory access, load imbalance, and inefficient parallelism. Techniques like Reverse Cuthill‑McKee (RCM) reordering, optimal loop ordering, and hierarchical memory use aim to improve performance. Adaptive mesh partitioning based on connectivity strength also helps reduce load imbalance in domain decomposition. However, these strategies depend heavily on mesh topology, numerical methods, and hardware, so no one‑size‑fits‑all solution exists. Profiling and adaptive tuning are essential to find optimal configurations. Libraries like GMlib, OP2, and TNL offer support for unstructured meshes on GPUs but lack tools for selecting the best optimization strategies.
Future work should focus on auto‑tuning frameworks integrated with portability layers like Kokkos to provide scalable, efficient solutions for Exascale computing.
The KOKTAILS project will address these limitations by:
- Extending Kokkos with enhanced support for European architectures, ensuring its applicability in the French and European HPC landscape.
- Improving data structures in the Kokkos ecosystem to support specific meshes required in key French and European applications.
- Improving automatic code translation and transformation tools, to facilitate the migration of legacy scientific codes to modern GPU‑optimized frameworks such as the Kokkos ecosystem.
- Addressing challenges in Python‑Kokkos interoperability, enabling domain scientists to use Kokkos through a Python interface and also enabling seamless integration of Python codes and AI models into C++ HPC codes for efficient execution on heterogeneous architectures.
Main activities
Efficient mesh management is crucial for many scientific applications. We propose to develop optimized Kokkos data structures for high‑dimensional or unstructured meshes. These data structures aim to reduce computational costs by leveraging optimized memory management for modern GPU‑based architectures. The innovation lies in designing mesh data structures that are both portable and adaptable to the specific constraints of Exascale architectures, ensuring scalability and optimal efficiency.
- Extend Kokkos views to support high‑dimensional data structures (such as 6D/7D) for applications in plasma physics, quantum simulations, and turbulence modeling, while preserving performance portability. Key efforts include native support for 6D/7D views with optimized memory layout and indexing for GPUs, improved memory access for efficiency across architectures, and validation through benchmarks and demonstrators.
- Develop a flexible API for optimizing unstructured mesh algorithms on GPUs. It will support both static and dynamic strategies, including mesh reordering (RCM, Morton, Hilbert) for better cache locality, loop restructuring for optimized data access, hierarchical parallelism using shared memory and registers, load balancing via connectivity‑aware mesh partitioning, and race‑condition management through partition coloring and atomics. The API will allow switching between strategies based on code‑specific patterns to maximize GPU efficiency.
- Create a Kokkos‑based library for unstructured mesh processing. It will offer predefined mesh structures (e.g., edge shells, ball of points), parallel execution schemes for vectorized operations and efficient memory use, and multi‑architecture support (AMD, Intel, NVIDIA) via Kokkos back‑ends (CUDA, HIP, SYCL, OpenMP). The library, building on work from Exa‑DI, will provide a scalable, portable solution for scientific code adaptation to GPU‑based Exascale systems.
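As an illustration of the mesh reordering mentioned above, here is a minimal sketch of a Morton (Z‑order) reordering in plain C++. It is not part of Kokkos or of any KOKTAILS deliverable. The names `Cell`, `morton2d`, and `morton_permutation` are hypothetical, and the sketch assumes 2D cells whose centroids have already been quantized to 16‑bit integer grid coordinates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Spread the lower 16 bits of v so that a zero bit separates each original bit.
inline std::uint32_t spread_bits_2d(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// 2D Morton key: bits of ix land in even positions, bits of iy in odd positions.
// Assumption: both coordinates fit in 16 bits.
inline std::uint32_t morton2d(std::uint32_t ix, std::uint32_t iy) {
    assert(ix <= 0xFFFFu && iy <= 0xFFFFu);
    return spread_bits_2d(ix) | (spread_bits_2d(iy) << 1);
}

// Hypothetical cell record: quantized centroid coordinates on an integer grid.
struct Cell {
    std::uint32_t ix;
    std::uint32_t iy;
};

// Returns a permutation where perm[k] is the index of the cell that should be
// stored at position k so that cells are sorted by increasing Morton key.
inline std::vector<std::size_t> morton_permutation(const std::vector<Cell>& cells) {
    std::vector<std::size_t> perm(cells.size());
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::stable_sort(perm.begin(), perm.end(),
                     [&cells](std::size_t a, std::size_t b) {
                         return morton2d(cells[a].ix, cells[a].iy) <
                                morton2d(cells[b].ix, cells[b].iy);
                     });
    return perm;
}
```

Applying the resulting permutation to the cell arrays (and renumbering the connectivity accordingly) tends to place spatially close cells close together in memory, which is the cache‑locality effect the reordering strategies above aim for. RCM or Hilbert orderings would slot into the same permutation‑based interface.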
Skills
Strong scientific programming skills, particularly in modern C++.
Experience in parallel computing, including one or more of the following models: MPI, OpenMP, CUDA, HIP, SYCL, or accelerated scientific libraries like Kokkos and RAJA.
Knowledge of modern HPC architectures, including GPU systems (NVIDIA, AMD), many‑core architectures and complex memory hierarchies. Experience with performance optimization and portability.
Understanding of numerical methods for PDEs (Finite Volumes, Finite Elements, implicit solvers) and their efficient implementation on parallel architectures.
Benefits package
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage
Remuneration
Remuneration depends on professional experience.
- Theme/Domain: Distributed and High Performance Computing – Scientific computing (BAP E)
EEO Statement
As part of its diversity policy, all Inria positions are accessible to people with disabilities.