BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240116T191659Z
LOCATION:710
DTSTART;TZID=America/Denver:20231112T120600
DTEND;TZID=America/Denver:20231112T123000
UID:submissions.supercomputing.org_SC23_sess419_ws_prot101@linklings.com
SUMMARY:Extra-Deep: Automated Empirical Performance Modeling for Distribu
 ted Deep Learning
DESCRIPTION:Workshop\n\nMarcus Ritter and Felix Wolf (Technical University
  of Darmstadt)\n\nWith the rapidly increasing size and complexity of DNNs,
  equally sophisticated methods are needed to train them efficiently, inclu
 ding distributed training and various model/hybrid parallelism approaches.
  Even though developers heavily rely on state-of-the-art frameworks such a
 s PyTorch and TensorFlow, these provide little insight into an application
 's training behavior at scale, leading to latent performance bottlenecks a
 nd inefficient training configurations. We propose Extra-Deep, an automate
 d empirical performance modeling approach for distributed deep learning. W
 e leverage the created models to analyze a training task's performance, sc
 alability, efficiency, and cost. Using an efficient sampling strategy that
  reduces the profiling time for the required empirical measurements by, on
  average, about 94.9%, we can identify cost-effective training configurati
 ons even for large-scale applications. We evaluated our approach on three 
 parallelization strategies, with four DNN models and five datasets. The re
 sults show that Extra-Deep has an average prediction accuracy of 93.6% whe
 n compared to empirical results.\n\nTag: Performance Measurement, Modeling
 , and Tools, Programming Frameworks and System Software\n\nRegistration Ca
 tegory: Workshop Reg Pass\n\nSession Chairs: David Boehme (Lawrence Liverm
 ore National Laboratory (LLNL)); Anthony Danalis (University of Tennessee)
 ; and Josef Weidendorfer (Leibniz Supercomputing Centre, Technical Univers
 ity of Munich)
END:VEVENT
END:VCALENDAR
