Presentation
Toward a Peer-to-Peer Data Distribution Layer for Efficient and Collaborative Resource Optimization of Distributed Dataflow Applications
Description
Optimizing the underlying cluster configurations of distributed data processing frameworks can be complex and often requires performance modeling techniques because of the multitude of performance-affecting factors. These techniques are not always applicable, since they need substantial training data. At the same time, data analytics jobs often share common characteristics, such as algorithm implementations, which suggests the potential for collaborative performance modeling. Current collaborative approaches, however, mainly assume a centralized storage infrastructure, which comes with its own potential drawbacks, e.g., with regard to data privacy, storage costs, or system maintenance. We envision a peer-to-peer-based data distribution layer that facilitates data sovereignty, failure resilience, and means of ad-hoc collaboration, thereby fostering cross-context resource optimization approaches for big data analytics.