Presentation
Who's Winning the Performance Portability Race on GPU Platforms?
Description
Keeping scientific software development productive requires developing and maintaining a single codebase that runs efficiently across a range of accelerator-based supercomputing platforms. This in turn requires writing the compute kernels with performance portability layers such as OpenMP, RAJA, Kokkos, and SYCL. In this talk, I will present the results of a comprehensive study of proxy applications implemented in the major programming models suitable for GPU-based platforms. We collect and analyze performance results on NVIDIA and AMD GPU hardware currently deployed in leadership-class computing facilities, using a representative set of scientific codes and several programming models: CUDA, HIP, Kokkos, RAJA, OpenMP, OpenACC, and SYCL. Based on the specific characteristics of the applications tested, we offer recommendations to developers on how to choose the right programming model for their code. These results show the extent to which each programming model for heterogeneous systems provides true performance portability in real-world usage.