Scalable communication architecture for network-attached accelerators

Bibliographic Details
Main authors: Neuwirth, Sarah (author), Frey, Dirk (author), Brüning, Ulrich (author)
Document type: Chapter/Article, Conference paper
Language: English
Published: 09 March 2015
In: 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA 2015)
Year: 2015, Pages: 627-638
DOI:10.1109/HPCA.2015.7056068
Online access: Resolving system, full text: http://dx.doi.org/10.1109/HPCA.2015.7056068
Publisher, full text: https://ieeexplore.ieee.org/document/7056068/
Statement of responsibility: Sarah Neuwirth, Dirk Frey, Mondrian Nuessle and Ulrich Bruening
Description
Abstract: On the road to Exascale computing, novel communication architectures are required to overcome the limitations of host-centric accelerators. Typically, accelerator devices require a local host CPU to configure and operate them. This limits the number of accelerators per host system. Network-attached accelerators are a new architectural approach for scaling the number of accelerators and host CPUs independently. In this paper, the communication architecture for network-attached accelerators is described which enables remote initialization and control of the accelerator devices. Furthermore, an operative prototype implementation is presented. The prototype accelerator node consists of an Intel Xeon Phi coprocessor and an EXTOLL NIC. The EXTOLL interconnect provides new features to enable accelerator-to-accelerator direct communication without a local host. Workloads can be dynamically assigned to CPUs and accelerators at run-time in an N to M ratio. The latency, bandwidth, and performance of the low-level implementation and MPI communication layer are presented. The LAMMPS molecular dynamics simulator is used to evaluate the communication architecture. The internode communication time is improved by up to 47%.
Note: Viewed on 23 May 2018
Format: Online resource
ISBN: 9781479989317