Using balanced data placement to address I/O contention in production environments
| Main Authors: | Sarah Neuwirth, Feiyi Wang, Sarp Oral, Sudharshan Vazhkudai, James H. Rogers, Ulrich Bruening |
|---|---|
| Format: | Conference Paper (Chapter/Article) |
| Language: | English |
| Published: | 2016 |
| In: | 2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2016, Pages: 9-17 |
| DOI: | 10.1109/SBAC-PAD.2016.10 |
| Subjects: | |
| Online Access: | Resolving system, full text: http://dx.doi.org/10.1109/SBAC-PAD.2016.10 ; Publisher, full text: https://ieeexplore.ieee.org/document/7789318/ |
| Author Notes: | Sarah Neuwirth, Feiyi Wang, Sarp Oral, Sudharshan Vazhkudai, James H. Rogers and Ulrich Bruening |
| Summary: | Designed for capacity and capability, HPC I/O systems are inherently complex and shared among multiple, concurrent jobs competing for resources. The lack of centralized coordination and control often renders the end-to-end I/O paths vulnerable to load imbalance and contention. With the emergence of data-intensive HPC applications, storage systems face further contention for performance and scalability. This paper proposes to unify two key approaches to tackle the imbalanced use of I/O resources and to achieve an end-to-end I/O performance improvement in the most transparent way. First, it utilizes a topology-aware Balanced Placement I/O method (BPIO) for mitigating resource contention. Second, it takes advantage of the platform-neutral ADIOS middleware, which provides a flexible I/O mechanism for scientific applications. By integrating BPIO with ADIOS, referred to as Aequilibro, we obtain an end-to-end, per-job I/O performance improvement for ADIOS-enabled HPC applications without requiring any code changes. Aequilibro can be applied to almost any HPC platform and is most suitable for systems that lack a centralized file system resource manager. We demonstrate the effectiveness of our integration on the Titan system at the Oak Ridge National Laboratory. Our experiments with a synthetic benchmark and a real-world HPC workload show that, even in a noisy production environment, Aequilibro can improve large-scale application performance significantly. (An illustrative sketch of the balanced-placement idea follows this record.) |
| Item Description: | Viewed on 23.05.2018 |
| Physical Description: | Online Resource |
| ISBN: | 9781509061082 |
| DOI: | 10.1109/SBAC-PAD.2016.10 |
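
The summary above describes Aequilibro's core mechanism: BPIO tracks how loaded each shared end-to-end I/O resource is and steers every new write stream toward the least-contended one, while ADIOS hides that routing from the application. The C sketch below is not the authors' code; it is a minimal illustration of the least-loaded-target selection step only, where the per-target load array and the function name `pick_least_loaded` are hypothetical.

```c
/* Illustrative sketch only; not from the paper.
 * Hypothetical helper mimicking the balanced-placement idea:
 * given per-target load counters (e.g., one per storage target),
 * route the next write stream to the least-loaded target. */
#include <stddef.h>
#include <stdio.h>

static size_t pick_least_loaded(const unsigned long *load, size_t n_targets)
{
    size_t best = 0;
    for (size_t i = 1; i < n_targets; i++) {
        if (load[i] < load[best])
            best = i;
    }
    return best;
}

int main(void)
{
    /* Hypothetical snapshot of outstanding I/O per storage target. */
    unsigned long load[] = { 12, 3, 7, 3, 25 };
    size_t n = sizeof(load) / sizeof(load[0]);

    size_t target = pick_least_loaded(load, n);
    printf("route next write to target %zu\n", target);
    return 0;
}
```

In the paper's setting, the targets would correspond to shared storage resources along Titan's I/O paths, and the load information would presumably be refreshed from live, topology-aware system metrics rather than a static array as shown here.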