Presentation
Lightning Talk: Datastates for Debugging – Using Productive Checkpointing for Improved Debugging
Description
Debuggers are powerful and productive tools for understanding and fixing correctness issues in parallel programs. Tools like `rr` and `gdb` even offer "reverse time" debugging, which lets the user step backward through the states of the program. However, these tools were not designed to scale, impose high overhead, or cannot be used with distributed MPI applications. In this talk, we discuss ongoing extensions to MPIGDB, a scalable open-source debugger for MPI programs, that integrate it with an application checkpointing framework to allow more scalable reverse-time debugging for MPI applications as a use case of productive checkpointing. Specifically, we propose: (1) extending the debugger to allow "fast-forwarding" and "rewinding" to checkpoints interactively; (2) allowing the user to "diff" states between checkpoints to understand how the program evolved; and (3) leveraging the knowledge of an application-level checkpointing framework to reduce the storage and memory overheads of reverse-time debugging. We will also cover the kinds of extensions checkpointing libraries need to support this use case and the performance implications of the different approaches to reverse-time debugging, and we will showcase some debugging challenges that this capability simplifies.
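To make proposals (1) and (2) concrete, the following is a minimal Python sketch of what "rewinding", "fast-forwarding", and "diffing" over checkpoints could look like. It is not MPIGDB's interface or that of any checkpointing library: the function names `rewind_target`, `fast_forward_target`, and `diff_checkpoints`, and the representation of a checkpoint as a plain dictionary of named values keyed by step number, are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical model: each checkpoint is a dict of named application state,
# stored under the step (e.g. iteration) at which it was taken.


def rewind_target(checkpoint_steps, current_step):
    """Return the latest checkpointed step strictly before current_step, or None."""
    steps = sorted(checkpoint_steps)
    i = bisect_left(steps, current_step)
    return steps[i - 1] if i > 0 else None


def fast_forward_target(checkpoint_steps, current_step):
    """Return the earliest checkpointed step strictly after current_step, or None."""
    steps = sorted(checkpoint_steps)
    i = bisect_right(steps, current_step)
    return steps[i] if i < len(steps) else None


def diff_checkpoints(old, new):
    """Report which named values were added, removed, or changed between two checkpoints."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}


# Illustrative data only: three checkpoints of a made-up iterative solver.
checkpoints = {
    0: {"iter": 0, "residual": 1.0, "grid": [0.0, 0.0]},
    10: {"iter": 10, "residual": 0.5, "grid": [0.0, 0.25]},
    20: {"iter": 20, "residual": 0.5, "grid": [0.1, 0.25], "halo": [1.0]},
}

back = rewind_target(checkpoints, 15)
forward = fast_forward_target(checkpoints, 15)
delta = diff_checkpoints(checkpoints[back], checkpoints[forward])
print(back, forward, delta)
```

A real implementation would restore each MPI rank from the selected checkpoint and then replay forward to the exact point of interest. The sketch only shows the checkpoint selection and the name-level comparison, which is where application-level knowledge of the checkpointed variables would come into play.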