Automatic and transparent resource contention mitigation for improving large-scale parallel file system performance

Bibliographic Details
Main authors: Neuwirth, Sarah (author), Brüning, Ulrich (author)
Document type: Chapter/Article, Conference paper
Language: English
Published: 31 May 2018
In: 2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS 2017)
Year: 2018, Pages: 604-613
DOI:10.1109/ICPADS.2017.00084
Online access: Publisher, full text: http://dx.doi.org/10.1109/ICPADS.2017.00084
Publisher, full text: https://ieeexplore.ieee.org/document/8368413/
Statement of responsibility: Sarah Neuwirth, Feiyi Wang, Sarp Oral and Ulrich Bruening
Description
Abstract: In proportion to the growing scale of HPC systems, many scientific applications are becoming increasingly data intensive, and parallel I/O has become one of the dominant factors affecting the performance of large-scale HPC applications. On a typical large-scale HPC system, we have observed that the lack of global workload coordination, coupled with the shared nature of storage systems, causes load imbalance and resource contention along the end-to-end I/O paths, resulting in severe performance degradation. I/O load imbalance on HPC systems is generally a self-inflicted wound and mostly occurs between the I/O paths and resources consumed by each individual job.
Description: Online resource
ISBN: 9781538621295