BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240116T191657Z
LOCATION:DEF Concourse
DTSTART;TZID=America/Denver:20231114T100000
DTEND;TZID=America/Denver:20231114T170000
UID:submissions.supercomputing.org_SC23_sess289_spostg116@linklings.com
SUMMARY:Scaling Infrastructure to Support Multi-Trillion Parameter LLM Tra
 ining
DESCRIPTION:ACM Student Research Competition: Graduate Poster, ACM Stude
 nt Research Competition: Undergraduate Poster, Posters\n\nMikhail Isaev
  (Georgia Institute of Technology)\n\nThis poster discusses efficient sy
 stem designs for scaling Large Language Models (LLMs) to as many as 128
  trillion parameters. We use a comprehensive analytical performance mod
 el to analyze how such models could be trained on current systems while
  maintaining 75% Model FLOPS Utilization (MFU). We first show how tenso
 r offloading alone can dramatically increase the size of trainable LLMs.
  We then analyze performance bottlenecks when scaling to systems with u
 p to 16,384 GPUs and models with up to 128T parameters. Our findings su
 ggest that current H100 GPUs with 80 GiB of HBM, combined with 512 GiB
  of tensor offloading capacity, allow scaling to 11T-parameter LLMs; re
 aching 128T parameters requires 120 GiB of HBM and 2 TiB of offloading
  memory, yielding 75%+ MFU, a level that is uncommon even when training
  much smaller LLMs today.\n\nRegistration Category: Tech Program Reg Pas
 s, Exhibits Reg Pass
END:VEVENT
END:VCALENDAR
