BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240116T191702Z
LOCATION:303
DTSTART;TZID=America/Denver:20231113T133000
DTEND;TZID=America/Denver:20231113T170000
UID:submissions.supercomputing.org_SC23_sess234_tut137@linklings.com
SUMMARY:Scalable Big Data Processing on High Performance Computing Systems
DESCRIPTION:Tutorial\n\nDhabaleswar K. (DK) Panda, Aamir Shafi, and Jingha
 n Yao (Ohio State University)\n\nThere are several popular Big Data proces
 sing frameworks including Apache Spark and Dask. These frameworks cannot
  exploit high-speed, low-latency networks such as InfiniBand, Omni-Path,
  and Slingshot. In the High Performance Computing (HPC
 ) community, the Message Passing Interface (MPI) libraries are widely adop
 ted to tackle this issue by executing scientific and engineering applicati
 ons on parallel hardware connected via fast interconnects.\n\nThis
  tutorial introduces MPI4Spark and MPI4Dask, enhanced versions of the
  Spark and Dask frameworks, respectively, that can use MPI for
  communication in parallel and distributed settings on HPC systems.
  MPI4Spark can launch the S
 park ecosystem using MPI launchers to utilize MPI communication. It also m
 aintains isolation for application execution by forking new processes usin
 g Dynamic Process Management (DPM). MPI4Spark also provides portability an
 d performance benefits as it can utilize popular HPC interconnects.
  MPI4Dask is an MPI-based custom Dask framework targeted at modern HPC
  clusters built with CPUs and NVIDIA GPUs.\n\nThis tutorial provides a
  detai
 led overview of the design, implementation, and evaluation of MPI4Spark an
 d MPI4Dask on state-of-the-art HPC systems. Later, we also cover writing, 
 running, and demonstrating user Big Data applications on HPC systems.\n\nT
 ag: Architecture and Networks, Data Movement and Memory, Message Passing\n
 \nRegistration Category: Tutorial Reg Pass
END:VEVENT
END:VCALENDAR
