Analyzing GPU-controlled communication with dynamic parallelism in terms of performance and energy

Graphic Processing Units (GPUs) are widely used in high performance computing, due to their high computational power and high performance per Watt. However, one of the main bottlenecks of GPU-accelerated cluster computing is the data transfer between distributed GPUs. This not only affects performan...

Bibliographic Details
Main Authors: Oden, Lena (Author), Klenk, Benjamin (Author), Fröning, Holger (Author)
Document Type: Article (Journal)
Language: English
Published: 29 March 2016
In: Parallel computing
Year: 2016, Volume: 57, Pages: 125-134
ISSN: 1872-7336
DOI: 10.1016/j.parco.2016.02.005
Online Access: Publisher, full text: http://dx.doi.org/10.1016/j.parco.2016.02.005
Publisher, full text: http://www.sciencedirect.com/science/article/pii/S0167819116300011
Statement of Responsibility: Lena Oden, Benjamin Klenk, Holger Fröning

MARC

LEADER 00000caa a2200000 c 4500
001 1560390727
003 DE-627
005 20220813185924.0
007 cr uuu---uuuuu
008 170703s2016 xx |||||o 00| ||eng c
024 7 |a 10.1016/j.parco.2016.02.005  |2 doi 
035 |a (DE-627)1560390727 
035 |a (DE-576)490390722 
035 |a (DE-599)BSZ490390722 
035 |a (OCoLC)1340976501 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 28  |2 sdnb 
100 1 |a Oden, Lena  |e VerfasserIn  |0 (DE-588)1069864110  |0 (DE-627)822870797  |0 (DE-576)429555911  |4 aut 
245 1 0 |a Analyzing GPU-controlled communication with dynamic parallelism in terms of performance and energy  |c Lena Oden, Benjamin Klenk, Holger Fröning 
264 1 |c 29 March 2016 
300 |a 10 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Gesehen am 03.07.2017 
520 |a Graphic Processing Units (GPUs) are widely used in high performance computing, due to their high computational power and high performance per Watt. However, one of the main bottlenecks of GPU-accelerated cluster computing is the data transfer between distributed GPUs. This not only affects performance, but also power consumption. The most common way to utilize a GPU cluster is a hybrid model, in which the GPU is used to accelerate the computation, while the CPU is responsible for the communication. This approach always requires a dedicated CPU thread, which consumes additional CPU cycles and therefore increases the power consumption of the complete application. In recent work we have shown that the GPU is able to control the communication independently of the CPU. However, there are several problems with GPU-controlled communication. The main problem is intra-GPU synchronization, since GPU blocks are non-preemptive. Therefore, the use of communication requests within a GPU can easily result in a deadlock. In this work we show how dynamic parallelism solves this problem. GPU-controlled communication in combination with dynamic parallelism allows keeping the control flow of multi-GPU applications on the GPU and bypassing the CPU completely. Using other in-kernel synchronization methods results in massive performance losses, due to the forced serialization of the GPU thread blocks. Although the performance of applications using GPU-controlled communication is still slightly worse than the performance of hybrid applications, we will show that performance per Watt increases by up to 10% while still using commodity hardware. 
650 4 |a Communication 
650 4 |a Data transfer 
650 4 |a Dynamic parallelism 
650 4 |a Energy efficiency 
650 4 |a GPUs 
650 4 |a Infiniband 
700 1 |a Klenk, Benjamin  |d 1989-  |e VerfasserIn  |0 (DE-588)1073185729  |0 (DE-627)828640025  |0 (DE-576)434708488  |4 aut 
700 1 |a Fröning, Holger  |d 1976-  |e VerfasserIn  |0 (DE-588)133209466  |0 (DE-627)538678658  |0 (DE-576)299696189  |4 aut 
773 0 8 |i Enthalten in  |t Parallel computing  |d Amsterdam [u.a.] : North-Holland, Elsevier Science, 1984  |g 57(2016), Seite 125-134  |h Online-Ressource  |w (DE-627)265784115  |w (DE-600)1466340-5  |w (DE-576)074890999  |x 1872-7336  |7 nnas  |a Analyzing GPU-controlled communication with dynamic parallelism in terms of performance and energy 
773 1 8 |g volume:57  |g year:2016  |g pages:125-134  |g extent:10  |a Analyzing GPU-controlled communication with dynamic parallelism in terms of performance and energy 
856 4 0 |u http://dx.doi.org/10.1016/j.parco.2016.02.005  |x Verlag  |x Resolving-System  |3 Volltext 
856 4 0 |u http://www.sciencedirect.com/science/article/pii/S0167819116300011  |x Verlag  |3 Volltext 
951 |a AR 
992 |a 20170703 
993 |a Article 
994 |a 2016 
998 |g 133209466  |a Fröning, Holger  |m 133209466:Fröning, Holger  |d 700000  |d 720000  |e 700000PF133209466  |e 720000PF133209466  |k 0/700000/  |k 1/700000/720000/  |p 3  |y j 
998 |g 1073185729  |a Klenk, Benjamin  |m 1073185729:Klenk, Benjamin  |d 700000  |d 720000  |e 700000PK1073185729  |e 720000PK1073185729  |k 0/700000/  |k 1/700000/720000/  |p 2 
998 |g 1069864110  |a Oden, Lena  |m 1069864110:Oden, Lena  |p 1  |x j 
999 |a KXP-PPN1560390727  |e 2973050529 
BIB |a Y 
SER |a journal 
JSO |a {"relHost":[{"origin":[{"publisher":"North-Holland, Elsevier Science ; North-Holland","dateIssuedKey":"1984","dateIssuedDisp":"1984-","publisherPlace":"Amsterdam [u.a.] ; Amsterdam [u.a.]"}],"id":{"issn":["1872-7336"],"eki":["265784115"],"zdb":["1466340-5"]},"note":["Gesehen am 31.05.23"],"language":["eng"],"physDesc":[{"extent":"Online-Ressource"}],"title":[{"title_sort":"Parallel computing","title":"Parallel computing"}],"recId":"265784115","type":{"bibl":"periodical","media":"Online-Ressource"},"pubHistory":["1.1984 - 40.2014; Vol. 41.2015 -"],"part":{"pages":"125-134","volume":"57","text":"57(2016), Seite 125-134","year":"2016","extent":"10"},"disp":"Analyzing GPU-controlled communication with dynamic parallelism in terms of performance and energyParallel computing"}],"type":{"bibl":"article-journal","media":"Online-Ressource"},"person":[{"given":"Lena","role":"aut","family":"Oden","display":"Oden, Lena"},{"family":"Klenk","display":"Klenk, Benjamin","role":"aut","given":"Benjamin"},{"given":"Holger","role":"aut","display":"Fröning, Holger","family":"Fröning"}],"recId":"1560390727","id":{"eki":["1560390727"],"doi":["10.1016/j.parco.2016.02.005"]},"origin":[{"dateIssuedKey":"2016","dateIssuedDisp":"29 March 2016"}],"name":{"displayForm":["Lena Oden, Benjamin Klenk, Holger Fröning"]},"note":["Gesehen am 03.07.2017"],"physDesc":[{"extent":"10 S."}],"language":["eng"],"title":[{"title":"Analyzing GPU-controlled communication with dynamic parallelism in terms of performance and energy","title_sort":"Analyzing GPU-controlled communication with dynamic parallelism in terms of performance and energy"}]} 
SRT |a ODENLENAKLANALYZINGG2920