Presentation
Modeling Data Locality of Sparse Matrix-Vector Multiplication on the A64FX
DescriptionOne of the novel features of the Fujitsu A64FX CPU is the sector cache. This feature enables hardware-supported partitioning of the L1 and L2 caches and allows the programmer control of which partition is used to place data in. This paper performs an in-depth study of how to apply the sector cache to a frequently used sparse matrix-vector multiplication (SpMV) kernel. A performance model based on reuse analysis is used to better understand situations where the sector cache leads to improved reuse and to predict the cache behavior. The model correctly predicts the number of L2 cache misses within 2–3 % for sequential and parallel SpMV with 48 threads using a collection of 490 sparse matrices. Further experiments show the effect of various sector cache configurations on performance. A median speedup of about 1.05× is achieved, whereas the maximum speedup is about 1.6×.