What's wrong with my benchmark results?: Studying bad practices in JMH benchmarks

Bibliographic Details
Main Authors: Damasceno Costa, Diego Elias (Author), Bezemer, Cor-Paul (Author), Leitner, Philipp (Author), Andrzejak, Artur (Author)
Format: Article (Journal)
Language: English
Published: 16 July 2021
In: IEEE transactions on software engineering
Year: 2021, Volume: 47, Issue: 7, Pages: 1452-1467
ISSN:1939-3520
DOI:10.1109/TSE.2019.2925345
Online Access: Publisher, license required, full text: https://doi.org/10.1109/TSE.2019.2925345
Publisher, license required, full text: https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=DOISource&SrcApp=WOS&KeyAID=10.1109%2FTSE.2019.2925345&DestApp=DOI&SrcAppSID=F4YMMc9jPB5f9dX7ixC&SrcJTitle=IEEE+TRANSACTIONS+ON+SOFTWARE+ENGINEERING&DestDOIRegistrantName=Institute+of+Electrical+and+Electronics+Engineers
Author Notes: Diego Costa, Cor-Paul Bezemer, Philipp Leitner, and Artur Andrzejak
Description
Summary: Microbenchmarking frameworks, such as Java's Microbenchmark Harness (JMH), allow developers to write fine-grained performance test suites at the method or statement level. However, due to the complexities of the Java Virtual Machine, developers often struggle with writing expressive JMH benchmarks which accurately represent the performance of such methods or statements. In this paper, we empirically study bad practices of JMH benchmarks. We present a tool that leverages static analysis to identify 5 bad JMH practices. Our empirical study of 123 open source Java-based systems shows that each of these 5 bad practices is prevalent in open source software. Further, we conduct several experiments to quantify the impact of each bad practice in multiple case studies, and find that bad practices often significantly impact the benchmark results. To validate our experimental results, we constructed seven patches that fix the identified bad practices for six of the studied open source projects, of which six were merged into the main branch of the project. In this paper, we show that developers struggle with accurate Java microbenchmarking, and provide several recommendations to developers of microbenchmarking frameworks on how to improve future versions of their framework.
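For illustration, one well-known pitfall of the kind the paper studies is discarding a benchmark's computed value, which lets the JIT compiler dead-code-eliminate the measured work. A minimal JMH sketch of the pitfall and two standard fixes (class and field names here are illustrative, not taken from the studied projects):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
public class LogBenchmark {

    private double x = Math.PI;

    // Pitfall: the result of Math.log(x) is never used, so the JIT may
    // eliminate the call and the benchmark times an empty method body.
    @Benchmark
    public void logIgnored() {
        Math.log(x);
    }

    // Fix 1: returning the value makes JMH consume it, which prevents
    // dead-code elimination.
    @Benchmark
    public double logReturned() {
        return Math.log(x);
    }

    // Fix 2: sinking the value into a Blackhole has the same effect and
    // also works when a benchmark produces several values.
    @Benchmark
    public void logConsumed(Blackhole bh) {
        bh.consume(Math.log(x));
    }
}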
Item Description: Viewed on 08.09.2021
Physical Description: Online Resource