An implementation of tensor product patch smoothers on GPUs

In this paper, a task-scheduling approach to efficiently calculating sparse symmetric matrix-vector products and designed to run on graphics processing units (GPUs) is presented. The main premise is that, for many sparse symmetric matrices occurring in common applications, it is possible to obtain s...

Bibliographic details
Main authors: Cui, Cu (author), Große-Bley, Paul (author), Kanschat, Guido (author), Strzodka, Robert (author)
Document type: Article (Journal)
Language: English
Published: 2025
In: SIAM journal on scientific computing
Year: 2025, Volume: 47, Issue: 2, Pages: B280-B307
ISSN: 1095-7197
DOI: 10.1137/24M1642706
Online access: Publisher, license required, full text: https://doi.org/10.1137/24M1642706
Publisher, license required, full text: https://epubs.siam.org/doi/10.1137/24M1642706
Statement of responsibility: Cu Cui, Paul Grosse-Bley, Guido Kanschat, and Robert Strzodka

MARC

LEADER 00000naa a2200000 c 4500
001 1932956034
003 DE-627
005 20250811150500.0
007 cr uuu---uuuuu
008 250811s2025 xx |||||o 00| ||eng c
024 7 |a 10.1137/24M1642706  |2 doi 
035 |a (DE-627)1932956034 
035 |a (DE-599)KXP1932956034 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 27  |2 sdnb 
100 1 |a Cui, Cu  |e VerfasserIn  |0 (DE-588)1373554339  |0 (DE-627)1932956395  |4 aut 
245 1 3 |a An implementation of tensor product patch smoothers on GPUs  |c Cu Cui, Paul Grosse-Bley, Guido Kanschat, and Robert Strzodka 
264 1 |c 2025 
300 |b Illustrationen 
300 |a 28 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Gesehen am 11.08.2025 
520 |a In this paper, a task-scheduling approach to efficiently calculating sparse symmetric matrix-vector products and designed to run on graphics processing units (GPUs) is presented. The main premise is that, for many sparse symmetric matrices occurring in common applications, it is possible to obtain significant reductions in memory usage and improvements in performance when the matrix is prepared in certain ways prior to computation. The preprocessing proposed in this paper employs task scheduling to overcome the difficulties that have suppressed the development of methods taking advantage of the symmetry of sparse matrices. The performance of the proposed task-scheduling method is verified using a Kepler (Tesla K40c) graphics accelerator, and is compared to the performance of cuSPARSE library functions on a GPU and to functions from the Intel MKL on central processing units (CPUs) executed in the parallel mode. The obtained results indicate that the proposed approach for sparse symmetric matrix-vector products results in up to a 40% reduction in memory usage, as compared to nonsymmetric matrix storage formats, while retaining good throughput. Compared to cuSPARSE and Intel MKL functions for sparse symmetric matrices, the proposed TSMV approach allowed us to achieve a significant speedup (of over one order of magnitude). 
700 1 |a Große-Bley, Paul  |e VerfasserIn  |0 (DE-588)1373554649  |0 (DE-627)1932956832  |4 aut 
700 1 |a Kanschat, Guido  |e VerfasserIn  |0 (DE-588)102535334X  |0 (DE-627)72215612X  |0 (DE-576)175755949  |4 aut 
700 1 |a Strzodka, Robert  |d 1973-  |e VerfasserIn  |0 (DE-588)122745264  |0 (DE-627)487567145  |0 (DE-576)293403473  |4 aut 
773 0 8 |i Enthalten in  |a Society for Industrial and Applied Mathematics  |t SIAM journal on scientific computing  |d Philadelphia, Pa. : SIAM, 1993  |g 47(2025), 2, Seite B280-B307  |h Online-Ressource  |w (DE-627)266885292  |w (DE-600)1468391-X  |w (DE-576)078589967  |x 1095-7197  |7 nnas 
773 1 8 |g volume:47  |g year:2025  |g number:2  |g pages:B280-B307  |g extent:28  |a An implementation of tensor product patch smoothers on GPUs 
856 4 0 |u https://doi.org/10.1137/24M1642706  |x Verlag  |x Resolving-System  |z lizenzpflichtig  |3 Volltext 
856 4 0 |u https://epubs.siam.org/doi/10.1137/24M1642706  |x Verlag  |z lizenzpflichtig  |3 Volltext 
951 |a AR 
992 |a 20250811 
993 |a Article 
994 |a 2025 
998 |g 122745264  |a Strzodka, Robert  |m 122745264:Strzodka, Robert  |d 700000  |d 720000  |e 700000PS122745264  |e 720000PS122745264  |k 0/700000/  |k 1/700000/720000/  |p 4  |y j 
998 |g 102535334X  |a Kanschat, Guido  |m 102535334X:Kanschat, Guido  |d 700000  |d 708000  |e 700000PK102535334X  |e 708000PK102535334X  |k 0/700000/  |k 1/700000/708000/  |p 3 
998 |g 1373554649  |a Große-Bley, Paul  |m 1373554649:Große-Bley, Paul  |d 700000  |d 720000  |e 700000PG1373554649  |e 720000PG1373554649  |k 0/700000/  |k 1/700000/720000/  |p 2 
998 |g 1373554339  |a Cui, Cu  |m 1373554339:Cui, Cu  |d 700000  |d 708000  |e 700000PC1373554339  |e 708000PC1373554339  |k 0/700000/  |k 1/700000/708000/  |p 1  |x j 
999 |a KXP-PPN1932956034  |e 4756356362 
BIB |a Y 
SER |a journal 
JSO |a {"physDesc":[{"extent":"28 S.","noteIll":"Illustrationen"}],"person":[{"given":"Cu","family":"Cui","role":"aut","display":"Cui, Cu"},{"family":"Große-Bley","given":"Paul","display":"Große-Bley, Paul","role":"aut"},{"role":"aut","display":"Kanschat, Guido","family":"Kanschat","given":"Guido"},{"display":"Strzodka, Robert","role":"aut","given":"Robert","family":"Strzodka"}],"title":[{"title_sort":"implementation of tensor product patch smoothers on GPUs","title":"An implementation of tensor product patch smoothers on GPUs"}],"relHost":[{"disp":"Society for Industrial and Applied MathematicsSIAM journal on scientific computing","language":["eng"],"type":{"media":"Online-Ressource","bibl":"periodical"},"titleAlt":[{"title":"Journal on scientific and statistical computing"},{"title":"Journal on scientific computing"}],"part":{"volume":"47","year":"2025","extent":"28","text":"47(2025), 2, Seite B280-B307","pages":"B280-B307","issue":"2"},"corporate":[{"role":"aut","display":"Society for Industrial and Applied Mathematics"}],"title":[{"title":"SIAM journal on scientific computing","title_sort":"SIAM journal on scientific computing"}],"pubHistory":["14.1993 -"],"id":{"zdb":["1468391-X"],"issn":["1095-7197"],"eki":["266885292"]},"recId":"266885292","name":{"displayForm":["Society for Industrial and Applied Mathematics"]},"physDesc":[{"extent":"Online-Ressource"}],"origin":[{"publisher":"SIAM","dateIssuedDisp":"1993-","dateIssuedKey":"1993","publisherPlace":"Philadelphia, Pa."}],"note":["Gesehen am 02.07.2021"]}],"origin":[{"dateIssuedKey":"2025","dateIssuedDisp":"2025"}],"note":["Gesehen am 11.08.2025"],"id":{"doi":["10.1137/24M1642706"],"eki":["1932956034"]},"language":["eng"],"name":{"displayForm":["Cu Cui, Paul Grosse-Bley, Guido Kanschat, and Robert Strzodka"]},"type":{"media":"Online-Ressource","bibl":"article-journal"},"recId":"1932956034"} 
SRT |a CUICUGROSSIMPLEMENTA2025