Adapting Bidirectional Encoder Representations from Transformers (BERT) to assess clinical semantic textual similarity: algorithm development and validation study

Background: Natural Language Understanding enables automatic extraction of relevant information from clinical text data, which are acquired every day in hospitals. In 2018, the language model Bidirectional Encoder Representations from Transformers (BERT) was introduced, generating new state-of-the-a...

Detailed description

Bibliographic details
Main authors: Kades, Klaus (Author), Sellner, Jan (Author), Köhler, Gregor (Author), Full, Peter M. (Author), Lai, T. Y. Emmy (Author), Kleesiek, Jens Philipp (Author), Maier-Hein, Klaus H. (Author)
Document type: Article (Journal)
Language: English
Published: 3 February 2021
In: JMIR medical informatics
Year: 2021, Volume: 9, Issue: 2, Pages: 1-13
ISSN: 2291-9694
DOI: 10.2196/22795
Online access: Publisher, license required, full text: https://doi.org/10.2196/22795
Publisher, license required, full text: https://medinform.jmir.org/2021/2/e22795
Statement of responsibility: Klaus Kades, MSc; Jan Sellner, MSc; Gregor Koehler, MSc; Peter M Full, BSc; TY Emmy Lai, MSc; Jens Kleesiek, MD, PhD; Klaus H Maier-Hein, PhD

MARC

LEADER 00000caa a2200000 c 4500
001 1754996647
003 DE-627
005 20240723082609.0
007 cr uuu---uuuuu
008 210415s2021 xx |||||o 00| ||eng c
024 7 |a 10.2196/22795  |2 doi 
035 |a (DE-627)1754996647 
035 |a (DE-599)KXP1754996647 
035 |a (OCoLC)1341404699 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 33  |2 sdnb 
100 1 |a Kades, Klaus  |d 1991-  |e VerfasserIn  |0 (DE-588)1237380650  |0 (DE-627)1764082338  |4 aut 
245 1 0 |a Adapting Bidirectional Encoder Representations from Transformers (BERT) to assess clinical semantic textual similarity  |b algorithm development and validation study  |c Klaus Kades, MSc; Jan Sellner, MSc; Gregor Koehler, MSc; Peter M Full, BSc; TY Emmy Lai, MSc; Jens Kleesiek, MD, PhD; Klaus H Maier-Hein, PhD 
264 1 |c 3.2.2021 
300 |a 13 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Gesehen am 21.07.2021 
520 |a Background: Natural Language Understanding enables automatic extraction of relevant information from clinical text data, which are acquired every day in hospitals. In 2018, the language model Bidirectional Encoder Representations from Transformers (BERT) was introduced, generating new state-of-the-art results on several downstream tasks. The National NLP Clinical Challenges (n2c2) is an initiative that strives to tackle such downstream tasks on domain-specific clinical data. In this paper, we present the results of our participation in the 2019 n2c2 and related work completed thereafter. - Objective: The objective of this study was to optimally leverage BERT for the task of assessing the semantic textual similarity of clinical text data. - Methods: We used BERT as an initial baseline and analyzed the results, which we used as a starting point to develop 3 different approaches where we (1) added additional, handcrafted sentence similarity features to the classifier token of BERT and combined the results with more features in multiple regression estimators, (2) incorporated a built-in ensembling method, M-Heads, into BERT by duplicating the regression head and applying an adapted training strategy to facilitate the focus of the heads on different input patterns of the medical sentences, and (3) developed a graph-based similarity approach for medications, which allows extrapolating similarities across known entities from the training set. The approaches were evaluated with the Pearson correlation coefficient between the predicted scores and ground truth of the official training and test dataset. - Results: We improved the performance of BERT on the test dataset from a Pearson correlation coefficient of 0.859 to 0.883 using a combination of the M-Heads method and the graph-based similarity approach. We also show differences between the test and training dataset and how the two datasets influenced the results. - Conclusions: We found that using a graph-based similarity approach has the potential to extrapolate domain specific knowledge to unseen sentences. We observed that it is easily possible to obtain deceptive results from the test dataset, especially when the distribution of the data samples is different between training and test datasets. 
700 1 |a Sellner, Jan  |d 1990-  |e VerfasserIn  |0 (DE-588)1333964773  |0 (DE-627)1892148137  |4 aut 
700 1 |a Köhler, Gregor  |e VerfasserIn  |0 (DE-588)1237383773  |0 (DE-627)1764084845  |4 aut 
700 1 |a Full, Peter M.  |e VerfasserIn  |0 (DE-588)1219695777  |0 (DE-627)1735698784  |4 aut 
700 1 |a Lai, T. Y. Emmy  |e VerfasserIn  |4 aut 
700 1 |a Kleesiek, Jens Philipp  |d 1977-  |e VerfasserIn  |0 (DE-588)132998076  |0 (DE-627)530080745  |0 (DE-576)299554465  |4 aut 
700 1 |a Maier-Hein, Klaus H.  |d 1980-  |e VerfasserIn  |0 (DE-588)1100551875  |0 (DE-627)85946461X  |0 (DE-576)333771222  |4 aut 
773 0 8 |i Enthalten in  |t JMIR medical informatics  |d Toronto : [Verlag nicht ermittelbar], 2013  |g 9(2021), 2, Artikel-ID e22795, Seite 1-13  |h Online-Ressource  |w (DE-627)802534929  |w (DE-600)2798261-0  |w (DE-576)416957234  |x 2291-9694  |7 nnas  |a Adapting Bidirectional Encoder Representations from Transformers (BERT) to assess clinical semantic textual similarity algorithm development and validation study 
773 1 8 |g volume:9  |g year:2021  |g number:2  |g elocationid:e22795  |g pages:1-13  |g extent:13  |a Adapting Bidirectional Encoder Representations from Transformers (BERT) to assess clinical semantic textual similarity algorithm development and validation study 
856 4 0 |u https://doi.org/10.2196/22795  |x Verlag  |x Resolving-System  |z lizenzpflichtig  |3 Volltext 
856 4 0 |u https://medinform.jmir.org/2021/2/e22795  |x Verlag  |z lizenzpflichtig  |3 Volltext 
951 |a AR 
992 |a 20210415 
993 |a Article 
994 |a 2021 
998 |g 1100551875  |a Maier-Hein, Klaus H.  |m 1100551875:Maier-Hein, Klaus H.  |d 50000  |d 910000  |d 911400  |e 50000PM1100551875  |e 910000PM1100551875  |e 911400PM1100551875  |k 0/50000/  |k 0/910000/  |k 1/910000/911400/  |p 7  |y j 
998 |g 132998076  |a Kleesiek, Jens Philipp  |m 132998076:Kleesiek, Jens Philipp  |d 50000  |e 50000PK132998076  |k 0/50000/  |p 6 
998 |g 1219695777  |a Full, Peter M.  |m 1219695777:Full, Peter M.  |d 50000  |e 50000PF1219695777  |k 0/50000/  |p 4 
998 |g 1237383773  |a Köhler, Gregor  |m 1237383773:Köhler, Gregor  |d 50000  |e 50000PK1237383773  |k 0/50000/  |p 3 
998 |g 1333964773  |a Sellner, Jan  |m 1333964773:Sellner, Jan  |d 110000  |e 110000PS1333964773  |k 0/110000/  |p 2 
998 |g 1237380650  |a Kades, Klaus  |m 1237380650:Kades, Klaus  |d 110000  |e 110000PK1237380650  |k 0/110000/  |p 1  |x j 
999 |a KXP-PPN1754996647  |e 3910019307 
BIB |a Y 
SER |a journal 
JSO |a {"relHost":[{"origin":[{"publisher":"[Verlag nicht ermittelbar]","dateIssuedDisp":"2013-","dateIssuedKey":"2013","publisherPlace":"Toronto"}],"title":[{"title":"JMIR medical informatics","subtitle":"clinical informatics, decision support for health professionals, electronic health records, and ehealth infrastructures","title_sort":"JMIR medical informatics"}],"id":{"issn":["2291-9694"],"zdb":["2798261-0"],"eki":["802534929"]},"part":{"issue":"2","extent":"13","text":"9(2021), 2, Artikel-ID e22795, Seite 1-13","volume":"9","pages":"1-13","year":"2021"},"physDesc":[{"extent":"Online-Ressource"}],"recId":"802534929","disp":"Adapting Bidirectional Encoder Representations from Transformers (BERT) to assess clinical semantic textual similarity algorithm development and validation studyJMIR medical informatics","titleAlt":[{"title":"JMIR Med Inform"},{"title":"JMI"}],"language":["eng"],"pubHistory":["1.2013 -"],"type":{"bibl":"periodical","media":"Online-Ressource"}}],"note":["Gesehen am 21.07.2021"],"type":{"bibl":"article-journal","media":"Online-Ressource"},"id":{"eki":["1754996647"],"doi":["10.2196/22795"]},"physDesc":[{"extent":"13 S."}],"language":["eng"],"title":[{"title":"Adapting Bidirectional Encoder Representations from Transformers (BERT) to assess clinical semantic textual similarity","title_sort":"Adapting Bidirectional Encoder Representations from Transformers (BERT) to assess clinical semantic textual similarity","subtitle":"algorithm development and validation study"}],"recId":"1754996647","name":{"displayForm":["Klaus Kades, MSc; Jan Sellner, MSc; Gregor Koehler, MSc; Peter M Full, BSc; TY Emmy Lai, MSc; Jens Kleesiek, MD, PhD; Klaus H Maier-Hein, PhD"]},"origin":[{"dateIssuedKey":"2021","dateIssuedDisp":"3.2.2021"}],"person":[{"display":"Kades, Klaus","given":"Klaus","family":"Kades","role":"aut"},{"role":"aut","family":"Sellner","display":"Sellner, Jan","given":"Jan"},{"role":"aut","family":"Köhler","given":"Gregor","display":"Köhler, Gregor"},{"role":"aut","display":"Full, Peter M.","given":"Peter M.","family":"Full"},{"role":"aut","given":"T. Y. Emmy","display":"Lai, T. Y. Emmy","family":"Lai"},{"display":"Kleesiek, Jens Philipp","given":"Jens Philipp","family":"Kleesiek","role":"aut"},{"family":"Maier-Hein","given":"Klaus H.","display":"Maier-Hein, Klaus H.","role":"aut"}]} 
SRT |a KADESKLAUSADAPTINGBI3220