Presentation
PM100: A Job Power Consumption Dataset of a Large-Scale Production HPC System
Session
The 1st International Workshop on the Environmental Sustainability of High-Performance Software
Description
The power requirements of modern High-Performance Computing (HPC) systems pose environmental and financial challenges, given their carbon emissions and the strain they place on power grids. Optimizing power consumption together with system performance has thus become crucial. Because the jobs running on a system contribute to its overall power usage, predicting their power requirements before execution would make it possible to forecast overall power consumption and to apply techniques such as power capping. Such predictive studies need quality data, which is scarce because of the inherent complexity of collecting structured data in a production system. This paper aims to address the lack of resources for job power prediction by providing (i) a methodology to create a job power consumption dataset from workload manager data and node power metric logs, and (ii) a novel dataset comprising around 230K jobs and their corresponding power consumption values. The dataset is derived from M100, a holistic dataset extracted from a production supercomputer.