Energy-efficient multigrid smoothers and grid transfer operators on multi-core and GPU clusters

Bibliographic Details
Main Authors: Wlotzka, Martin (Author), Heuveline, Vincent (Author)
Format: Article (Journal)
Language: English
Published: 2017
In: Journal of parallel and distributed computing
Year: 2016, Volume: 100, Pages: 181-192
DOI: 10.1016/j.jpdc.2016.05.006
Online Access: Publisher, full text: http://dx.doi.org/10.1016/j.jpdc.2016.05.006
Publisher, full text: http://www.sciencedirect.com/science/article/pii/S0743731516300363
Author Notes: Martin Wlotzka, Vincent Heuveline
Description
Summary: We investigate time and energy to solution for the CPU- and GPU-based execution of the compute-intensive smoother and grid transfer operators in a geometric multigrid linear solver. We use a hybrid parallel implementation for both shared and distributed memory multi-core host systems comprising CUDA-capable devices. Our numerical experiments are designed to assess the effect of combining an MPI-parallel multigrid framework with OpenMP host threads or CUDA accelerators instead of MPI-only CPU computations for various parallel setups. We present runtime and energy measurements from a quad-CPU test system equipped with two GPUs. We find that using an accelerated asynchronous smoother can yield substantial savings of time and energy to solution over using a host-only Jacobi smoother in small and medium-sized host systems with one or two multi-core CPUs. The acceleration of the grid transfer operators also yields a benefit, though a smaller one than the acceleration of the smoother. For large host systems, a hybrid MPI-OpenMP parallelization turns out to be most beneficial with respect to energy consumption, although it is not the fastest option.
Item Description: Viewed on 01.10.2018
Physical Description: Online Resource