Analysis and Performance Estimation of the Conjugate Gradient Method on multiple GPUs

Parallel Computing 38, 2012 (pdf)

In our paper we propose an efficient Block Compressed Sparse Row (BCSR) implementation for sparse matrices processed on one or more GPUs using CUDA. With the proposed layout, Sparse Matrix-Vector Multiplication (SPMV) can be performed efficiently on modern GPUs, which accelerates the Conjugate Gradient method and other related Krylov subspace methods. We show that the performance of the SPMV operation is close to the limits of the hardware used. Since Krylov subspace methods consist of a number of vector-vector and matrix-vector operations, their overall performance is determined by the individual performance of these operations. Taking the Conjugate Gradient method as an example, we approximate its performance given the size of the problem (dimension and sparsity of the matrix), the configuration of the sparse-matrix layout, and the number of GPUs used. Given this rough approximation, one can choose the number of GPUs for solving a particular problem.
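To make the two building blocks concrete, the following is a minimal CPU-only sketch in pure Python of a BCSR layout, an SPMV over it, and a Conjugate Gradient loop that performs one SPMV plus a few vector-vector operations per iteration. It is not the CUDA implementation from the paper: every name here (dense_to_bcsr, bcsr_spmv, conjugate_gradient, laplacian_1d) is hypothetical, and the choice of square blocks stored in row-major order is an assumption made only for illustration.

```python
# Sketch only: a pure-Python BCSR matrix and a Conjugate Gradient solver on top of it.
# All names are hypothetical and are not taken from the authors' software.

def dense_to_bcsr(dense, bs):
    """Convert a square dense matrix (list of rows) into BCSR with bs x bs blocks."""
    n = len(dense)
    assert n % bs == 0, "dimension must be a multiple of the block size"
    row_ptr, col_idx, blocks = [0], [], []
    for bi in range(n // bs):
        for bj in range(n // bs):
            block = [dense[bi * bs + r][bj * bs + c]
                     for r in range(bs) for c in range(bs)]
            if any(v != 0.0 for v in block):   # store only non-zero blocks
                col_idx.append(bj)             # block-column index
                blocks.append(block)           # bs*bs values, row-major (assumed)
        row_ptr.append(len(col_idx))           # end of this block row
    return row_ptr, col_idx, blocks, bs


def bcsr_spmv(A, x):
    """Compute y = A x for a BCSR matrix A."""
    row_ptr, col_idx, blocks, bs = A
    y = [0.0] * ((len(row_ptr) - 1) * bs)
    for bi in range(len(row_ptr) - 1):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            bj, block = col_idx[k], blocks[k]
            for r in range(bs):
                acc = 0.0
                for c in range(bs):
                    acc += block[r * bs + c] * x[bj * bs + c]
                y[bi * bs + r] += acc
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A; returns (x, iterations)."""
    x = [0.0] * len(b)
    r = list(b)                # residual b - A x for the initial guess x = 0
    p = list(r)                # first search direction
    rr = dot(r, r)
    for it in range(max_iter):
        if rr ** 0.5 <= tol:   # stop once the residual norm is small enough
            return x, it
        Ap = bcsr_spmv(A, p)   # the single SPMV of this iteration
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter


def laplacian_1d(n):
    """Dense tridiagonal (-1, 2, -1) matrix, a standard SPD test case."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]
```

For example, dense_to_bcsr(laplacian_1d(8), 2) stores only the non-zero 2x2 blocks of the tridiagonal matrix, and conjugate_gradient can then be called on that structure with any right-hand side of length 8. Because each iteration is one SPMV, two dot products and three vector updates, timing those operations individually is what allows the cost of the whole solver to be approximated.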

Our software provides an easy-to-use interface for creating and using sparse matrices on the GPU. It implements a number of Krylov subspace methods, which can be run on several GPUs in parallel. A parallel CPU version is also provided.

The software can be obtained using:

git clone https://code.google.com/p/palinso/

The source can also be browsed online:

https://code.google.com/p/palinso/source/browse/