Multilevel interior penalty methods on GPUs

We present a matrix-free multigrid method for high-order Discontinuous Galerkin (DG) finite element methods with GPU acceleration. A performance analysis is conducted, comparing various data and compute layouts. Smoother implementations are optimized through localization and fast diagonalization tec...

Bibliographic details
Main authors:	Cui, Cu (Author), Kanschat, Guido (Author)
Document type:	Article (Journal)
Language:	English
Published:	September 2025
In:	ACM transactions on mathematical software, Year: 2025, Volume: 51, Issue: 3, Pages: 1-27
ISSN:	1557-7295
DOI:	10.1145/3765616
Online access:	Publisher, free of charge, full text: https://doi.org/10.1145/3765616; Publisher, free of charge, full text: https://dl.acm.org/doi/10.1145/3765616
Statement of responsibility:	Cu Cui, Guido Kanschat

MARC


LEADER	00000caa a2200000 c 4500
001	1950218376
003	DE-627
005	20260127105827.0
007	cr uuu---uuuuu
008	260126s2025 xx |||||o 00| ||eng c
024	7		|a 10.1145/3765616 |2 doi
035			|a (DE-627)1950218376
035			|a (DE-599)KXP1950218376
040			|a DE-627 |b ger |c DE-627 |e rda
041			|a eng
084			|a 28 |2 sdnb
100	1		|a Cui, Cu |e VerfasserIn |0 (DE-588)1373554339 |0 (DE-627)1932956395 |4 aut
245	1	0	|a Multilevel interior penalty methods on GPUs |c Cu Cui, Guido Kanschat
264		1	|c September 2025
300			|b Diagramme
300			|a 27
336			|a Text |b txt |2 rdacontent
337			|a Computermedien |b c |2 rdamedia
338			|a Online-Ressource |b cr |2 rdacarrier
500			|a Gesehen am 26.01.2026
520			|a We present a matrix-free multigrid method for high-order Discontinuous Galerkin (DG) finite element methods with GPU acceleration. A performance analysis is conducted, comparing various data and compute layouts. Smoother implementations are optimized through localization and fast diagonalization techniques. Leveraging conflict-free access patterns in shared memory, arithmetic throughput of up to 40% of the peak performance on NVIDIA A100 GPUs are achieved. Experimental results affirm the effectiveness of mixed-precision approaches and Message Passing Interface (MPI) parallelization in accelerating algorithms. Furthermore, an assessment of solver efficiency and robustness is provided across both two and three dimensions, with applications to Poisson problems.
700	1		|a Kanschat, Guido |e VerfasserIn |0 (DE-588)102535334X |0 (DE-627)72215612X |0 (DE-576)175755949 |4 aut
773	0	8	|i Enthalten in |a Association for Computing Machinery |t ACM transactions on mathematical software |d New York, NY : ACM, 1975 |g 51(2025), 3 vom: Sept., Artikel-ID 19, Seite 1-27 |h Online-Ressource |w (DE-627)320454134 |w (DE-600)2006421-4 |w (DE-576)09088986X |x 1557-7295 |7 nnas
773	1	8	|g volume:51 |g year:2025 |g number:3 |g month:09 |g elocationid:19 |g pages:1-27 |g extent:27 |a Multilevel interior penalty methods on GPUs
856	4	0	|u https://doi.org/10.1145/3765616 |x Verlag |x Resolving-System |z kostenfrei |3 Volltext |7 0
856	4	0	|u https://dl.acm.org/doi/10.1145/3765616 |x Verlag |z kostenfrei |3 Volltext |7 0
951			|a AR
992			|a 20260126
993			|a Article
994			|a 2025
998			|g 102535334X |a Kanschat, Guido |m 102535334X:Kanschat, Guido |d 700000 |d 708000 |e 700000PK102535334X |e 708000PK102535334X |k 0/700000/ |k 1/700000/708000/ |p 2 |y j
998			|g 1373554339 |a Cui, Cu |m 1373554339:Cui, Cu |d 700000 |d 708000 |e 700000PC1373554339 |e 708000PC1373554339 |k 0/700000/ |k 1/700000/708000/ |p 1 |x j
999			|a KXP-PPN1950218376 |e 4860819632
BIB			|a Y
SER			|a journal
JSO			|a {"relHost":[{"note":["Gesehen am 16.06.20"],"origin":[{"dateIssuedKey":"1975","publisherPlace":"New York, NY","dateIssuedDisp":"1975-","publisher":"ACM"}],"physDesc":[{"extent":"Online-Ressource"}],"recId":"320454134","pubHistory":["1.1975 -"],"id":{"zdb":["2006421-4"],"eki":["320454134"],"issn":["1557-7295"]},"title":[{"title_sort":"ACM transactions on mathematical software","title":"ACM transactions on mathematical software","subtitle":"a publication of the Association for Computing Machinery"}],"part":{"year":"2025","volume":"51","extent":"27","text":"51(2025), 3 vom: Sept., Artikel-ID 19, Seite 1-27","pages":"1-27","issue":"3"},"corporate":[{"display":"Association for Computing Machinery","role":"aut"}],"disp":"Association for Computing MachineryACM transactions on mathematical software","type":{"bibl":"periodical","media":"Online-Ressource"},"language":["eng"],"titleAlt":[{"title":"TOMS"},{"title":"ACM TOMS"},{"title":"Transactions on mathematical software"}]}],"person":[{"family":"Cui","given":"Cu","display":"Cui, Cu","role":"aut"},{"family":"Kanschat","given":"Guido","role":"aut","display":"Kanschat, Guido"}],"title":[{"title":"Multilevel interior penalty methods on GPUs","title_sort":"Multilevel interior penalty methods on GPUs"}],"physDesc":[{"noteIll":"Diagramme","extent":"27 S."}],"note":["Gesehen am 26.01.2026"],"origin":[{"dateIssuedDisp":"September 2025","dateIssuedKey":"2025"}],"id":{"eki":["1950218376"],"doi":["10.1145/3765616"]},"recId":"1950218376","name":{"displayForm":["Cu Cui, Guido Kanschat"]},"type":{"media":"Online-Ressource","bibl":"article-journal"},"language":["eng"]}
SRT			|a CUICUKANSCMULTILEVEL2025
