BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240116T191701Z
LOCATION:405
DTSTART;TZID=America/Denver:20231113T083000
DTEND;TZID=America/Denver:20231113T170000
UID:submissions.supercomputing.org_SC23_sess242_tut140@linklings.com
SUMMARY:Efficient Distributed GPU Programming for Exascale
DESCRIPTION:Tutorial\n\nAndreas Herten (Forschungszentrum Jülich; Jülich S
 upercomputing Centre (JSC), Institute for Advanced Simulation); Simon Garc
 ia de Gonzalo (Sandia National Laboratories); Jiri Kraus and Markus Hrywni
 ak (NVIDIA Corporation); and Chelsea Maria John (Forschungszentrum Jülic
 h, Jülich Supercomputing Centre)\n\nOver the past decade, GPUs have beco
 me ubiquitous in HPC installations around the world, delivering the maj
 ority of the performance of some of the largest supercomputers (e.g. Sum
 mit, Sierra, JUWELS Booster). This trend continues in the recently deplo
 yed and upcoming Pre-Exascale and Exascale systems (JUPITER, LUMI, Leona
 rdo; El Capitan, Frontier, Perlmutter): GPUs are chosen as the core comp
 uting devices to enter this next era of HPC. To take advantage of futur
 e GPU-accelerated systems with tens of thousands of devices, applicatio
 n developers need the proper skills and tools to understand, manage, an
 d optimize distributed GPU applications.\n\nIn this tutorial, participan
 ts will learn techniques to efficiently program large-scale multi-GPU sy
 stems. Programming multiple GPUs with MPI is explained in detail, and ad
 vanced tuning techniques and complementary programming models such as N
 CCL and NVSHMEM are also presented. Analysis tools are demonstrated an
 d used to motivate and implement performance optimizations. The tutoria
 l teaches fundamental concepts that apply to GPU-accelerated systems i
 n general, taking the NVIDIA platform as an example. It combines lectur
 es and hands-on exercises, using one of Europe’s fastest supercomputers
 , JUWELS Booster, for interactive learning and discovery.\n\nTag: Accele
 rators, Exascale, Heterogeneous Computing, Performance Optimization\n\nR
 egistration Category: Tutorial Reg Pass
END:VEVENT
END:VCALENDAR
