Presentation
An End-to-End HPC Framework for Dynamic Power Objectives
Session
The 1st International Workshop on the Environmental Sustainability of High-Performance Software
Description
High-Performance Computing (HPC) centers demand a lot of power and continue to grow through the exascale era. This work establishes the need for a multi-tiered, feedback-driven power management framework that follows dynamic power objectives while maximizing job performance, and it highlights the need to respond to both external factors (e.g., power constraints) and internal factors (e.g., performance variation). We present a practical implementation of this framework on a real-world cluster and conduct simulations for larger data centers. We accurately track a moving power target for demand response while reacting to incomplete or inaccurate prior knowledge about job power and performance properties. We demonstrate that online performance feedback from a job runtime enables a cluster power management policy to recover most of the performance degradation introduced by job-type misclassification.