Butterfly factorization for vision transformers on multi-IPU systems

Recent advances in machine learning have led to increasingly large and complex models, placing significant demands on computation and memory. Techniques such as Butterfly factorization have emerged to reduce model parameters and memory footprints while preserving accuracy. Specialized hardware accel...

Bibliographic details
Main authors: Shekofteh, S. Kazem (author), Bogacz, Daniel (author), Alles, Christian (author), Fröning, Holger (author)
Document type: Article (Journal)
Language: English
Published: March 2026
In: Parallel computing
Year: 2026, Volume: 127, Pages: 1-10
ISSN: 1872-7336
DOI: 10.1016/j.parco.2025.103165
Online access: Publisher, free of charge, full text: https://doi.org/10.1016/j.parco.2025.103165
Publisher, free of charge, full text: https://www.sciencedirect.com/science/article/pii/S0167819125000419
Statement of responsibility: S.-Kazem Shekofteh, Daniel Bogacz, Christian Alles, Holger Fröning

MARC

LEADER 00000naa a2200000 c 4500
001 1951202406
003 DE-627
005 20260205105628.0
007 cr uuu---uuuuu
008 260205s2026 xx |||||o 00| ||eng c
024 7 |a 10.1016/j.parco.2025.103165  |2 doi 
035 |a (DE-627)1951202406 
035 |a (DE-599)KXP1951202406 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 27  |2 sdnb 
100 1 |a Shekofteh, S. Kazem  |d 1986-  |e VerfasserIn  |0 (DE-588)1189154854  |0 (DE-627)1667883631  |4 aut 
245 1 0 |a Butterfly factorization for vision transformers on multi-IPU systems  |c S.-Kazem Shekofteh, Daniel Bogacz, Christian Alles, Holger Fröning 
264 1 |c March 2026 
300 |b Illustrationen 
300 |a 10 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Online verfügbar: 27. November 2025, Artikelversion: 10. Dezember 2025 
500 |a Gesehen am 05.02.2026 
520 |a Recent advances in machine learning have led to increasingly large and complex models, placing significant demands on computation and memory. Techniques such as Butterfly factorization have emerged to reduce model parameters and memory footprints while preserving accuracy. Specialized hardware accelerators, such as Graphcore’s Intelligence Processing Units (IPUs), are designed to address these challenges through massive parallelism and efficient on-chip memory utilization. In this paper, we extend our analysis of Butterfly structures for efficient utilization on single and multiple IPUs, comparing their performance with GPUs. These structures drastically reduce the number of parameters and memory footprint while preserving model accuracy. Experimental results on the Graphcore GC200 IPU chip, compared with an NVIDIA A30 GPU, demonstrate a 98.5% compression ratio, with speedups of 1.6× and 1.3× for Butterfly and Pixelated Butterfly structures, respectively. Extending our evaluation to Vision Transformer (ViT) models, we compare Multi-GPU and Multi-IPU systems on the M2000 machine: Multi-GPU reaches a maximum accuracy of 84.51% with a training time of 401.44 min, whereas Multi-IPU attains a higher maximum accuracy of 88.92% with a training time of 694.03 min. These results demonstrate that Butterfly factorization enables substantial compression of ViT layers (up to 97.17%) while improving model accuracy. The findings highlight the promise of IPU machines as a suitable platform for large-scale machine learning model training, especially when coupled with sparsification methods like Butterfly factorization, thanks to their efficient support for model parallelism. 
650 4 |a Butterfly factorization 
650 4 |a Intelligence processing units 
650 4 |a Machine learning 
650 4 |a Model compression 
650 4 |a Vision transformer 
700 1 |a Bogacz, Daniel  |e VerfasserIn  |4 aut 
700 1 |a Alles, Christian  |e VerfasserIn  |4 aut 
700 1 |a Fröning, Holger  |d 1976-  |e VerfasserIn  |0 (DE-588)133209466  |0 (DE-627)538678658  |0 (DE-576)299696189  |4 aut 
773 0 8 |i Enthalten in  |t Parallel computing  |d Amsterdam [u.a.] : North-Holland, Elsevier Science, 1984  |g 127(2026) vom: März, Artikel-ID 103165, Seite 1-10  |h Online-Ressource  |w (DE-627)265784115  |w (DE-600)1466340-5  |w (DE-576)074890999  |x 1872-7336  |7 nnas  |a Butterfly factorization for vision transformers on multi-IPU systems 
773 1 8 |g volume:127  |g year:2026  |g month:03  |g elocationid:103165  |g pages:1-10  |g extent:10  |a Butterfly factorization for vision transformers on multi-IPU systems 
856 4 0 |u https://doi.org/10.1016/j.parco.2025.103165  |x Verlag  |x Resolving-System  |z kostenfrei  |3 Volltext  |7 0 
856 4 0 |u https://www.sciencedirect.com/science/article/pii/S0167819125000419  |x Verlag  |z kostenfrei  |3 Volltext  |7 0 
951 |a AR 
992 |a 20260205 
993 |a Article 
994 |a 2026 
998 |g 133209466  |a Fröning, Holger  |m 133209466:Fröning, Holger  |d 700000  |d 720000  |e 700000PF133209466  |e 720000PF133209466  |k 0/700000/  |k 1/700000/720000/  |p 4  |y j 
998 |g 1189154854  |a Shekofteh, S. Kazem  |m 1189154854:Shekofteh, S. Kazem  |d 700000  |d 720000  |e 700000PS1189154854  |e 720000PS1189154854  |k 0/700000/  |k 1/700000/720000/  |p 1  |x j 
999 |a KXP-PPN1951202406  |e 4877019286 
BIB |a Y 
SER |a journal 
JSO |a {"origin":[{"dateIssuedKey":"2026","dateIssuedDisp":"March 2026"}],"relHost":[{"pubHistory":["1.1984 - 40.2014; Vol. 41.2015 -"],"note":["Gesehen am 31.05.23"],"language":["eng"],"part":{"volume":"127","year":"2026","pages":"1-10","extent":"10","text":"127(2026) vom: März, Artikel-ID 103165, Seite 1-10"},"disp":"Butterfly factorization for vision transformers on multi-IPU systemsParallel computing","type":{"bibl":"periodical","media":"Online-Ressource"},"recId":"265784115","origin":[{"publisher":"North-Holland, Elsevier Science ; North-Holland","dateIssuedDisp":"1984-","dateIssuedKey":"1984","publisherPlace":"Amsterdam [u.a.] ; Amsterdam [u.a.]"}],"title":[{"title":"Parallel computing","title_sort":"Parallel computing"}],"id":{"issn":["1872-7336"],"eki":["265784115"],"zdb":["1466340-5"]},"physDesc":[{"extent":"Online-Ressource"}]}],"id":{"doi":["10.1016/j.parco.2025.103165"],"eki":["1951202406"]},"physDesc":[{"extent":"10 S.","noteIll":"Illustrationen"}],"type":{"media":"Online-Ressource","bibl":"article-journal"},"title":[{"title":"Butterfly factorization for vision transformers on multi-IPU systems","title_sort":"Butterfly factorization for vision transformers on multi-IPU systems"}],"note":["Online verfügbar: 27. November 2025, Artikelversion: 10. Dezember 2025","Gesehen am 05.02.2026"],"recId":"1951202406","person":[{"given":"S. Kazem","display":"Shekofteh, S. Kazem","family":"Shekofteh","role":"aut"},{"display":"Bogacz, Daniel","given":"Daniel","role":"aut","family":"Bogacz"},{"given":"Christian","display":"Alles, Christian","family":"Alles","role":"aut"},{"display":"Fröning, Holger","given":"Holger","family":"Fröning","role":"aut"}],"name":{"displayForm":["S.-Kazem Shekofteh, Daniel Bogacz, Christian Alles, Holger Fröning"]},"language":["eng"]} 
SRT |a SHEKOFTEHSBUTTERFLYF2026