

An Analysis of Change Point Detection in High Performance Computing
Description
As high-performance computing approaches the exascale era, analyzing the vast amount of monitoring data generated by supercomputers has become increasingly challenging for data analysts. Change point detection plays a critical role in anomaly detection, performance optimization, and root cause analysis of problems and failures, but the volume of data has grown beyond human capacity for manual review. To address this issue, we focus on developing an effective model capable of identifying anomalous behavior, and to that end we introduce an online adaptive sampling algorithm. We evaluate the model's performance across various use cases by testing it on complex datasets to detect change points. Overall, we observe that the model successfully captures key features of normal behavior, and we believe it opens promising avenues for further research, particularly in assisting with tasks related to anomaly detection and performance optimization in high-performance computing environments.
Event Type
Time
Monday, 13 November 2023, 2:09pm - 2:12pm MST
State of the Practice
Registration Categories