

An Autonomous Execution Model for GPUs: When CPUs Take a Back Seat
Description: In conventional multi-GPU configurations, the host CPU manages execution, kernel launches, communication, and synchronization, incurring unnecessary overhead. To mitigate this, we present a CPU-free model that delegates control to the devices themselves, which especially benefits communication-intensive applications. Using techniques such as persistent kernels, specialized thread blocks, and device-initiated communication, we create autonomous multi-GPU code that drastically reduces communication overhead. We demonstrate the approach with popular solvers, including 2D/3D Jacobi stencils and Conjugate Gradient (CG). We are currently developing compiler technology and debugging/profiling tools for the model, and applying it to a broader set of applications.
Time: Monday, 13 November 2023, 2:30pm - 3pm MST
Large Scale Systems
Middleware and System Software
Programming Frameworks and System Software