Presentation
Unified Communication Optimization Strategies for Sparse Triangular Solver on CPU and GPU Clusters
SessionLinear Algebra I
DescriptionThis paper presents a unified framework for reducing communication costs of sparse triangular solvers (SpTRSV) on CPU and GPU clusters. The proposed framework builds upon a 3D communication-avoiding process layout that distributes a sparse triangular matrix into a 3D layout consisting of 2D grids. This work significantly reduces inter-process communication by replicating computation and using sparse allreduce operations across the 2D grids. This also allows for integration of a number of communication-optimized 2D SpTRSV algorithms including binary communication tree-based CPU algorithms and one-sided GPU communication (e.g., NVSHMEM)-based algorithms. With all these communication reduction schemes, the resulting SpTRSV exhibits significantly better scalability than existing works on leadership CPU and CPU clusters such as Cori, Perlmutter and Crusher.