CSX: an extended compression format for SpMV on shared memory systems
The Sparse Matrix-Vector multiplication (SpMV) kernel scales poorly on shared memory
systems with multiple processing units due to the streaming nature of its data access pattern …
Optimizing sparse matrix-vector multiplication using index and value compression
Previous research work has identified memory bandwidth as the main bottleneck of the
ubiquitous Sparse Matrix-Vector Multiplication kernel. To attack this problem, we aim at …
Performance evaluation of the sparse matrix-vector multiplication on modern architectures
In this paper, we revisit the performance issues of the widely used sparse matrix-vector
multiplication (SpMxV) kernel on modern microarchitectures. Previous scientific work reports …
Fast conjugate gradients with multiple GPUs
The limiting factor for efficiency of sparse linear solvers is the memory bandwidth. In this
work, we describe a fast Conjugate Gradient solver for unstructured problems, which runs on …
Understanding the performance of sparse matrix-vector multiplication
In this paper we revisit the performance issues of the widely used sparse matrix-vector
multiplication (SpMxV) kernel on modern microarchitectures. Previous scientific work reports …
Performance analysis and optimization of sparse matrix-vector multiplication on modern multi- and many-core processors
This paper presents a low-overhead optimizer for the ubiquitous sparse matrix-vector
multiplication (SpMV) kernel. Architectural diversity among different processors together with …
Learning sparse matrix row permutations for efficient SpMM on GPU architectures
Achieving peak performance on sparse operations is challenging. The distribution of the
non-zero elements and underlying hardware platform affect the execution efficiency. Given the …
Bringing Order to Sparsity: A Sparse Matrix Reordering Study on Multicore CPUs
Many real-world computations involve sparse data structures in the form of sparse matrices.
A common strategy for optimizing sparse matrix operations is to reorder a matrix to improve …
Structured matrices and their application in neural networks: A survey
Modern neural network architectures are becoming larger and deeper, with increasing
computational resources needed for training and inference. One approach toward handling …
Improving the performance of multithreaded sparse matrix-vector multiplication using index and value compression
The sparse matrix-vector multiplication kernel exhibits limited potential for taking advantage
of modern shared memory architectures due to its large memory bandwidth requirements …