Presentation
What Operations Can Be Performed Directly on Compressed Arrays and with What Error?
Session: The 9th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD-9)
Description: In response to the rapidly escalating data-movement costs of computing with large matrices and tensors, several lossy compression methods have been developed to reduce the volume of data moved. Unfortunately, all of these methods require the data to be decompressed before it can be operated on. In this work, we develop a lossy compressor called PyBlaz that supports a dozen operations directly on compressed data while also offering good compression ratios. PyBlaz is built on the PyTorch framework and can therefore run on CPUs or GPUs without any code changes. We evaluate PyBlaz on datasets originating from three non-trivial applications: shallow-water simulation, MRI segmentation, and plutonium fission. Our results demonstrate that PyBlaz's compressed-domain operations scale well while incurring errors well within acceptable limits. To our knowledge, this is the first lossy compressor for scientific datasets that supports compressed-domain operations.
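The description does not say how PyBlaz is implemented, so the following is only a toy illustration of the general idea of a compressed-domain operation, not PyBlaz's algorithm or API. It assumes a hypothetical compressor that applies an orthonormal transform (here a DCT-II) and keeps a fixed set of low-frequency coefficients. All function names are made up for this sketch. Under that assumption, addition and dot products can be computed on the stored coefficients alone, and the only error relative to the uncompressed result comes from the coefficients that were discarded.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis: n rows, each a basis vector of length n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows


def compress(x, keep):
    """Lossy step (assumed scheme): keep only the first `keep` coefficients."""
    basis = dct_matrix(len(x))
    return [sum(b * v for b, v in zip(basis[k], x)) for k in range(keep)]


def decompress(coeffs, n):
    """Inverse transform, treating the discarded coefficients as zero."""
    basis = dct_matrix(n)
    return [sum(c * basis[k][i] for k, c in enumerate(coeffs))
            for i in range(n)]


def add_compressed(ca, cb):
    """Addition in the compressed domain: the transform is linear."""
    return [a + b for a, b in zip(ca, cb)]


def dot_compressed(ca, cb):
    """Dot product in the compressed domain: the basis is orthonormal."""
    return sum(a * b for a, b in zip(ca, cb))


# Hypothetical input data for the demonstration.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [2.0, 0.0, 1.0, 3.0, 5.0, 4.0, 6.0, 9.0]
ca, cb = compress(a, 4), compress(b, 4)

# The sum is computed without ever decompressing the operands.
approx_sum = decompress(add_compressed(ca, cb), len(a))
exact_sum = [x + y for x, y in zip(a, b)]
max_error = max(abs(p - q) for p, q in zip(approx_sum, exact_sum))
```

Here `max_error` measures how far the compressed-domain sum is from the uncompressed sum. In this toy scheme it shrinks to floating-point rounding as `keep` approaches the array length.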