cCUDA: effective co-scheduling of concurrent kernels on GPUs

While GPUs are meantime omnipresent for many scientific and technical computations, they still continue to evolve as processors. An important recent feature is the ability to execute multiple kernels concurrently via queue streams. However, experiments show that different parameters including the be...

Bibliographic details
Main authors:	Shekofteh, S. Kazem (Author), Noori, Hamid Reza (Author), Naghibzadeh, Mahmoud (Author), Fröning, Holger (Author), Yazdi, Hadi Sadoghi (Author)
Document type:	Article (Journal)
Language:	English
Published:	2020
In:	IEEE transactions on parallel and distributed systems Year: 2020, Volume: 31, Issue: 4, Pages: 766-778
ISSN:	1558-2183
DOI:	10.1109/TPDS.2019.2944602
Online access:	Publisher, license required: https://doi.org/10.1109/TPDS.2019.2944602
Statement of responsibility:	S.-Kazem Shekofteh, Hamid Noori, Mahmoud Naghibzadeh, Holger Fröning, Hadi Sadoghi Yazdi

MARC


LEADER	00000caa a2200000 c 4500
001	169526844X
003	DE-627
005	20220818045515.0
007	cr uuu---uuuuu
008	200421s2020 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1109/TPDS.2019.2944602 \|2 doi
035			\|a (DE-627)169526844X
035			\|a (DE-599)KXP169526844X
035			\|a (OCoLC)1341315979
040			\|a DE-627 \|b ger \|c DE-627 \|e rda
041			\|a eng
084			\|a 28 \|2 sdnb
100	1		\|a Shekofteh, S. Kazem \|d 1986- \|e VerfasserIn \|0 (DE-588)1189154854 \|0 (DE-627)1667883631 \|4 aut
245	1	0	\|a cCUDA \|b effective co-scheduling of concurrent kernels on GPUs \|c S.-Kazem Shekofteh, Hamid Noori, Mahmoud Naghibzadeh, Holger Fröning, Hadi Sadoghi Yazdi
264		1	\|c 2020
300			\|a 13
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date of Publication: 30 September 2019
500			\|a Gesehen am 21.04.2020
520			\|a While GPUs are meantime omnipresent for many scientific and technical computations, they still continue to evolve as processors. An important recent feature is the ability to execute multiple kernels concurrently via queue streams. However, experiments show that different parameters including the behavior of kernels, the order of kernel launches and other execution configurations, e.g., the number of concurrent thread blocks, may result in different execution time for concurrent kernel execution. Since kernels may have different resource requirements, they can be classified into different classes, which are traditionally assumed as either memory-bound or compute-bound. However, a kernel may belong to the different classes on different hardware according to the hardware resources. In this paper, the definition of kernel mix intensity is introduced. Based on this, a scheduling framework called concurrent CUDA (cCUDA) is proposed to co-schedule the concurrent kernels more efficiently. It first profiles and ranks kernels with different execution behaviors and then takes the kernel resource requirements into account to partition thread blocks of different kernels and overlap them to better utilize the GPU resources. Experimental results on real hardware demonstrate performance improvement in terms of execution time of up to 1.86x, and an average speedup of 1.28x for a wide range of kernels. cCUDA is available at https://github.com/kshekofteh/cCUDA.
650		4	\|a Analytical models
650		4	\|a Benchmark testing
650		4	\|a concurrent kernel execution
650		4	\|a Graphics processing units
650		4	\|a Hardware
650		4	\|a Kernel
650		4	\|a resource management
650		4	\|a scheduling
650		4	\|a Scheduling
650		4	\|a stream
700	1		\|a Noori, Hamid Reza \|d 1982- \|e VerfasserIn \|0 (DE-588)136385877 \|0 (DE-627)58206872X \|0 (DE-576)300993609 \|4 aut
700	1		\|a Naghibzadeh, Mahmoud \|e VerfasserIn \|4 aut
700	1		\|a Fröning, Holger \|d 1976- \|e VerfasserIn \|0 (DE-588)133209466 \|0 (DE-627)538678658 \|0 (DE-576)299696189 \|4 aut
700	1		\|a Yazdi, Hadi Sadoghi \|e VerfasserIn \|4 aut
773	0	8	\|i Enthalten in \|a Institute of Electrical and Electronics Engineers \|t IEEE transactions on parallel and distributed systems \|d New York, NY : IEEE, 1990 \|g 31(2020), 4, Seite 766-778 \|h Online-Ressource \|w (DE-627)324490127 \|w (DE-600)2027774-X \|w (DE-576)094111006 \|x 1558-2183 \|7 nnas
773	1	8	\|g volume:31 \|g year:2020 \|g number:4 \|g pages:766-778 \|g extent:13 \|a cCUDA effective co-scheduling of concurrent kernels on GPUs
856	4	0	\|u https://doi.org/10.1109/TPDS.2019.2944602 \|x Verlag \|z lizenzpflichtig
951			\|a AR
992			\|a 20200421
993			\|a Article
994			\|a 2020
998			\|g 133209466 \|a Fröning, Holger \|m 133209466:Fröning, Holger \|d 700000 \|d 720000 \|e 700000PF133209466 \|e 720000PF133209466 \|k 0/700000/ \|k 1/700000/720000/ \|p 4
999			\|a KXP-PPN169526844X \|e 3627648188
BIB			\|a Y
SER			\|a journal
JSO			\|a {"recId":"169526844X","person":[{"family":"Shekofteh","role":"aut","given":"S. Kazem","display":"Shekofteh, S. Kazem"},{"role":"aut","family":"Noori","given":"Hamid Reza","display":"Noori, Hamid Reza"},{"role":"aut","family":"Naghibzadeh","display":"Naghibzadeh, Mahmoud","given":"Mahmoud"},{"given":"Holger","display":"Fröning, Holger","family":"Fröning","role":"aut"},{"display":"Yazdi, Hadi Sadoghi","given":"Hadi Sadoghi","family":"Yazdi","role":"aut"}],"note":["Date of Publication: 30 September 2019","Gesehen am 21.04.2020"],"language":["eng"],"name":{"displayForm":["S.-Kazem Shekofteh, Hamid Noori, Mahmoud Naghibzadeh, Holger Fröning, Hadi Sadoghi Yazdi"]},"relHost":[{"titleAlt":[{"title":"Transactions on parallel and distributed systems"},{"title":"TPDS"}],"disp":"Institute of Electrical and Electronics EngineersIEEE transactions on parallel and distributed systems","type":{"bibl":"periodical","media":"Online-Ressource"},"corporate":[{"display":"Institute of Electrical and Electronics Engineers","role":"aut"}],"note":["Gesehen am 07.03.19"],"pubHistory":["1.1990 -"],"part":{"text":"31(2020), 4, Seite 766-778","issue":"4","extent":"13","pages":"766-778","volume":"31","year":"2020"},"language":["eng"],"origin":[{"publisherPlace":"New York, NY","dateIssuedKey":"1990","dateIssuedDisp":"1990-","publisher":"IEEE"}],"physDesc":[{"extent":"Online-Ressource"}],"id":{"zdb":["2027774-X"],"eki":["324490127"],"issn":["1558-2183"]},"title":[{"subtitle":"TPDS","title":"IEEE transactions on parallel and distributed systems","title_sort":"IEEE transactions on parallel and distributed systems"}],"recId":"324490127"}],"origin":[{"dateIssuedKey":"2020","dateIssuedDisp":"2020"}],"type":{"bibl":"article-journal","media":"Online-Ressource"},"title":[{"subtitle":"effective co-scheduling of concurrent kernels on GPUs","title":"cCUDA","title_sort":"cCUDA"}],"id":{"doi":["10.1109/TPDS.2019.2944602"],"eki":["169526844X"]},"physDesc":[{"extent":"13 S."}]}
SRT			\|a SHEKOFTEHSCCUDA2020
