Semi-automatic rule-based domain terminology and software feature-relevant information extraction from natural language user manuals: an approach and evaluation at Roche Diagnostics GmbH

Bibliographic details
Main authors: Quirchmayr, Thomas (Author), Paech, Barbara (Author)
Document type: Article (Journal)
Language: English
Published: 19 February 2018
In: Empirical Software Engineering
Year: 2018, Volume: 23, Issue: 6, Pages: 3630-3683
ISSN:1573-7616
DOI:10.1007/s10664-018-9597-6
Online access: Publisher, full text: http://dx.doi.org/10.1007/s10664-018-9597-6
Publisher, full text: https://link.springer.com/article/10.1007/s10664-018-9597-6
Statement of responsibility: Thomas Quirchmayr, Barbara Paech, Roland Kohl, Hannes Karey, Gunar Kasdepke
Description
Abstract: Mature software systems comprise a vast number of heterogeneous system capabilities, which are usually requested by different groups of stakeholders and evolve over time. Software features describe and logically bundle low-level capabilities at an abstract level and thus provide a structured and comprehensive overview of all capabilities of a software system. However, software features are often not explicitly managed. Instead, feature-relevant information is often spread across several software engineering artifacts (e.g., user manuals, issue tracking systems), and identifying and extracting it from these artifacts to make feature knowledge explicit requires substantial manual effort. In this paper we present a two-step approach to extract feature-relevant information from a user manual: first, we semi-automatically extract a domain terminology from a natural language user manual based on linguistic patterns; then, we apply natural language processing techniques based on the extracted domain terminology and structural sentence information. Our approach extracts atomic feature-relevant information with an F1-score of at least 92.00%. We describe the implementation of the approach as well as evaluations based on example sections of a user manual taken from industry.
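The first step of the approach (pattern-based terminology extraction) and the reported evaluation metric can be illustrated with a minimal sketch. It assumes, hypothetically, that the manual text has already been POS-tagged with Penn Treebank tags, and it uses a single illustrative pattern, (JJ|NN)* NN, as a stand-in for the linguistic patterns of the paper, which are not reproduced in this record. It also shows how an F1-score (the harmonic mean of precision and recall) is computed for extracted items against a manually built gold standard. The function names, tag sets, and example sentence are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical tag sets (Penn Treebank style); NOT the paper's actual patterns.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ADJ_TAGS = {"JJ", "JJR", "JJS"}


def extract_candidate_terms(tagged: List[Tuple[str, str]]) -> List[str]:
    """Collect maximal token runs matching the illustrative pattern (JJ|NN)* NN."""
    terms: List[str] = []
    run: List[Tuple[str, str]] = []

    def flush() -> None:
        # Drop trailing adjectives so every candidate term ends in a noun.
        while run and run[-1][1] not in NOUN_TAGS:
            run.pop()
        if run:
            terms.append(" ".join(word.lower() for word, _ in run))
        run.clear()

    for word, tag in tagged:
        if tag in NOUN_TAGS or tag in ADJ_TAGS:
            run.append((word, tag))
        else:
            flush()  # any other tag ends the current run
    flush()  # emit a run that reaches the end of the sentence
    return terms


def f1_score(predicted: Set[str], gold: Set[str]) -> float:
    """Harmonic mean of precision and recall of predicted items against a gold set."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical pre-tagged sentence from a user manual.
    sentence = [("Load", "VB"), ("the", "DT"), ("reagent", "NN"),
                ("cassette", "NN"), ("into", "IN"), ("the", "DT"),
                ("analyzer", "NN"), (".", ".")]
    terms = extract_candidate_terms(sentence)
    print(terms)  # ['reagent cassette', 'analyzer']
    gold = {"reagent cassette", "analyzer", "sample rack"}  # hypothetical gold standard
    print(round(f1_score(set(terms), gold), 2))  # 0.8
```

In a semi-automatic setting, a candidate list like this would presumably be reviewed by a domain expert before serving as the domain terminology for the second step; the sketch covers only the automatic candidate generation.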
Description: Published online: 19 February 2018
Viewed on 15 April 2019
Description: Online resource