Communication models for distributed Intel Xeon Phi coprocessors

Bibliographic Details
Main Authors: Neuwirth, Sarah (Author), Frey, Dirk (Author), Brüning, Ulrich (Author)
Format: Conference Paper
Language: English
Published: 2015
In: 2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS)
Year: 2015, Pages: 499-506
DOI:10.1109/ICPADS.2015.69
Online Access: Resolving system, full text: http://dx.doi.org/10.1109/ICPADS.2015.69
Publisher, full text: https://ieeexplore.ieee.org/document/7384332/
Author Notes: Sarah Neuwirth, Dirk Frey and Ulrich Bruening
Description
Summary: The emergence of accelerator technology in current supercomputing systems is changing the landscape of supercomputing architectures. Accelerators such as GPGPUs and coprocessors are optimized for parallel computation while being more energy efficient; their computational power per watt plays a crucial role in developing exaflop systems. Today's accelerators, however, come with some limitations. They require a local host to configure and operate them, and the number of host CPUs and accelerators does not scale independently. Another problem is the unbalanced communication between distributed accelerators. New communication frameworks are being developed to optimize internode communication. In this paper, four communication models using the Intel Xeon Phi coprocessor are compared. The Intel Xeon Phi coprocessor is based on the Intel Many Integrated Core (MIC) architecture. It is an attractive accelerator due to its embedded Linux operating system, up to 1 TFLOPS of performance on a single chip, and its x86_64 compatibility. DCFA-MPI, MVAPICH2-MIC, and HAM-Offload are compared against the communication architecture for network-attached accelerators (NAA). Each communication model optimizes a different layer of the MIC communication architecture. The NAA approach makes the accelerator device independent of a local host system and enables the accelerator to source and sink network traffic. Workloads can be dynamically assigned at runtime in an N-to-M ratio between CPUs and accelerators. The latency, bandwidth, and performance of the MPI communication layer of a prototype implementation are evaluated.
Item Description: Published online: 18 January 2016; viewed on 23 May 2018
Physical Description: Online resource
ISBN:9780769557854