Using balanced data placement to address I/O contention in production environments

Bibliographic Details
Main Authors: Neuwirth, Sarah (Author) , Brüning, Ulrich (Author)
Format: Chapter/Article Conference Paper
Language: English
Published: 2016
In: 2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
Year: 2016, Pages: 9-17
DOI:10.1109/SBAC-PAD.2016.10
Online Access: Resolving system, full text: http://dx.doi.org/10.1109/SBAC-PAD.2016.10
Publisher, full text: https://ieeexplore.ieee.org/document/7789318/
Author Notes:Sarah Neuwirth, Feiyi Wang, Sarp Oral, Sudharshan Vazhkudai, James H. Rogers and Ulrich Bruening
Description
Summary: Designed for capacity and capability, HPC I/O systems are inherently complex and shared among multiple, concurrent jobs competing for resources. Lack of centralized coordination and control often renders the end-to-end I/O paths vulnerable to load imbalance and contention. With the emergence of data-intensive HPC applications, storage systems are under further contention for performance and scalability. This paper proposes to unify two key approaches to tackle the imbalanced use of I/O resources and to achieve an end-to-end I/O performance improvement in the most transparent way. First, it utilizes a topology-aware, Balanced Placement I/O method (BPIO) for mitigating resource contention. Second, it takes advantage of the platform-neutral ADIOS middleware, which provides a flexible I/O mechanism for scientific applications. By integrating BPIO with ADIOS, referred to as Aequilibro, we obtain an end-to-end and per-job I/O performance improvement for ADIOS-enabled HPC applications without requiring any code changes. Aequilibro can be applied to almost any HPC platform and is most suitable for systems that lack a centralized file system resource manager. We demonstrate the effectiveness of our integration on the Titan system at the Oak Ridge National Laboratory. Our experiments with a synthetic benchmark and a real-world HPC workload show that, even in a noisy production environment, Aequilibro can improve large-scale application performance significantly.
Item Description: Viewed on 23.05.2018
Physical Description:Online Resource
ISBN:9781509061082