Automated classification of selected data elements from free-text diagnostic reports for clinical research

Objectives: In the Multiple Myeloma clinical registry at Heidelberg University Hospital, most data are extracted from discharge letters. Our aim was to analyze whether it is possible to make the manual documentation process more efficient by using methods of natural language processing for multiclass cla...


Bibliographic details
Main authors: Löpprich, Martin (Author), Krauss, Felix (Author), Ganzinger, Matthias (Author), Senghas, Karsten (Author), Riezler, Stefan (Author), Knaup-Gregori, Petra (Author)
Document type: Article (Journal)
Language: English
Published: 2016
In: Methods of information in medicine
Year: 2016, Volume: 55, Issue: 4, Pages: 373-380
ISSN: 2511-705X
DOI: 10.3414/ME15-02-0019
Online access: Publisher, license required, full text: https://doi.org/10.3414/ME15-02-0019
Publisher, license required, full text: http://www.thieme-connect.de/DOI/DOI?10.3414/ME15-02-0019
Statement of responsibility: Martin Löpprich, Felix Krauss, Matthias Ganzinger, Karsten Senghas, Stefan Riezler, Petra Knaup

MARC

LEADER 00000caa a2200000 c 4500
001 169846374X
003 DE-627
005 20240528111948.0
007 cr uuu---uuuuu
008 200518s2016 xx |||||o 00| ||eng c
024 7 |a 10.3414/ME15-02-0019  |2 doi 
035 |a (DE-627)169846374X 
035 |a (DE-599)KXP169846374X 
035 |a (OCoLC)1341325681 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 33  |2 sdnb 
100 1 |a Löpprich, Martin  |d 1985-  |e VerfasserIn  |0 (DE-588)1135283591  |0 (DE-627)890242518  |0 (DE-576)453561241  |4 aut 
245 1 0 |a Automated classification of selected data elements from free-text diagnostic reports for clinical research  |c Martin Löpprich, Felix Krauss, Matthias Ganzinger, Karsten Senghas, Stefan Riezler, Petra Knaup 
264 1 |c 2016 
300 |a 8 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Online veröffentlicht: 8. Januar 2018 
500 |a Gesehen am 18.05.2020 
520 |a Objectives: In the Multiple Myeloma clinical registry at Heidelberg University Hospital, most data are extracted from discharge letters. Our aim was to analyze whether it is possible to make the manual documentation process more efficient by using natural language processing (NLP) methods for multiclass classification of free-text diagnostic reports, so that the diagnosis and state of disease of myeloma patients can be documented automatically. The first objective was to create a corpus of free-text diagnosis paragraphs of patients with multiple myeloma, taken from German diagnostic reports, with the relevant data elements manually annotated by documentation specialists. The second objective was to construct and evaluate a framework using different NLP methods to enable automatic multiclass classification of relevant data elements from free-text diagnostic reports. - - Methods: The main diagnosis paragraph was extracted from the clinical reports of a randomly selected third of the patients in the multiple myeloma research database of Heidelberg University Hospital (737 patients in total). An electronic data capture (EDC) system was set up, and two data entry specialists independently documented at least nine specific data elements for multiple myeloma characterization by hand. Both data entries were compared and assessed by a third specialist, resulting in an annotated text corpus. A framework was constructed, consisting of a self-developed package to split multiple diagnosis sequences into several subsequences, four different preprocessing steps to normalize the input data, and two classifiers: a maximum entropy classifier (MEC) and a support vector machine (SVM). In total, 15 different pipelines were examined and assessed by ten-fold cross-validation, repeated 100 times. As quality indicators, the average error rate and the average F1-score were calculated. The approximate randomization test was used for significance testing. 
- - Results: The resulting annotated corpus consists of 737 different diagnosis paragraphs with a total of 865 coded diagnoses. The dataset is publicly available in the supplementary online files for training and testing further NLP methods. Both classifiers showed low average error rates (MEC: 1.05; SVM: 0.84) and high F1-scores (MEC: 0.89; SVM: 0.92). However, the results varied widely depending on the classified data element. Preprocessing methods amplified this effect and had a significant impact on the classification, both positive and negative. The automatic diagnosis splitter increased the average error rate significantly, even though the F1-score decreased only slightly. - - Conclusions: The low average error rates and high average F1-scores of each pipeline demonstrate the suitability of the investigated NLP methods. However, it was also shown that there is no best practice for the automatic classification of data elements from free-text diagnostic reports. 
650 4 |a classification 
650 4 |a Medical informatics 
650 4 |a medical writing 
650 4 |a multiple myeloma 
650 4 |a natural language processing 
700 1 |a Krauss, Felix  |e VerfasserIn  |0 (DE-588)1210441705  |0 (DE-627)1698466684  |4 aut 
700 1 |a Ganzinger, Matthias  |d 1974-  |e VerfasserIn  |0 (DE-588)102301890X  |0 (DE-627)717346471  |0 (DE-576)366268082  |4 aut 
700 1 |a Senghas, Karsten  |e VerfasserIn  |0 (DE-588)1210442388  |0 (DE-627)1698468938  |4 aut 
700 1 |a Riezler, Stefan  |e VerfasserIn  |0 (DE-588)1033925454  |0 (DE-627)743677528  |0 (DE-576)381607615  |4 aut 
700 1 |a Knaup-Gregori, Petra  |e VerfasserIn  |0 (DE-588)1032766328  |0 (DE-627)739278371  |0 (DE-576)380272814  |4 aut 
773 0 8 |i Enthalten in  |t Methods of information in medicine  |d Stuttgart : Thieme, 1962  |g 55(2016), 4, Seite 373-380  |h Online-Ressource  |w (DE-627)324822243  |w (DE-600)2030773-1  |w (DE-576)098546341  |x 2511-705X  |7 nnas  |a Automated classification of selected data elements from free-text diagnostic reports for clinical research 
773 1 8 |g volume:55  |g year:2016  |g number:4  |g pages:373-380  |g extent:8  |a Automated classification of selected data elements from free-text diagnostic reports for clinical research 
856 4 0 |u https://doi.org/10.3414/ME15-02-0019  |x Verlag  |x Resolving-System  |z lizenzpflichtig  |3 Volltext 
856 4 0 |u http://www.thieme-connect.de/DOI/DOI?10.3414/ME15-02-0019  |x Verlag  |z lizenzpflichtig  |3 Volltext 
951 |a AR 
992 |a 20200518 
993 |a Article 
994 |a 2016 
998 |g 1032766328  |a Knaup-Gregori, Petra  |m 1032766328:Knaup-Gregori, Petra  |d 910000  |d 999701  |d 50000  |e 910000PK1032766328  |e 999701PK1032766328  |e 50000PK1032766328  |k 0/910000/  |k 1/910000/999701/  |k 0/50000/  |p 6  |y j 
998 |g 1033925454  |a Riezler, Stefan  |m 1033925454:Riezler, Stefan  |d 90000  |d 90500  |e 90000PR1033925454  |e 90500PR1033925454  |k 0/90000/  |k 1/90000/90500/  |p 5 
998 |g 1210442388  |a Senghas, Karsten  |m 1210442388:Senghas, Karsten  |p 4 
998 |g 102301890X  |a Ganzinger, Matthias  |m 102301890X:Ganzinger, Matthias  |d 910000  |d 999701  |e 910000PG102301890X  |e 999701PG102301890X  |k 0/910000/  |k 1/910000/999701/  |p 3 
998 |g 1210441705  |a Krauss, Felix  |m 1210441705:Krauss, Felix  |p 2 
998 |g 1135283591  |a Löpprich, Martin  |m 1135283591:Löpprich, Martin  |d 910000  |d 910100  |e 910000PL1135283591  |e 910100PL1135283591  |k 0/910000/  |k 1/910000/910100/  |p 1  |x j 
999 |a KXP-PPN169846374X  |e 3668216657 
BIB |a Y 
SER |a journal 
JSO |a {"relHost":[{"part":{"text":"55(2016), 4, Seite 373-380","extent":"8","volume":"55","pages":"373-380","year":"2016","issue":"4"},"origin":[{"publisher":"Thieme ; Nacke ; Schattauer","dateIssuedDisp":"[1962]-","publisherPlace":"Stuttgart ; Bielefeld ; Stuttgart"}],"id":{"zdb":["2030773-1"],"issn":["2511-705X"],"eki":["324822243"],"doi":["10.1055/s-00035037"]},"disp":"Automated classification of selected data elements from free-text diagnostic reports for clinical researchMethods of information in medicine","recId":"324822243","title":[{"title_sort":"Methods of information in medicine","title":"Methods of information in medicine"}],"language":["eng"],"pubHistory":["Vol. 1, issue 1 (1962)-"],"physDesc":[{"extent":"Online-Ressource"}],"note":["Gesehen am 23.06.2018"],"type":{"bibl":"periodical","media":"Online-Ressource"}}],"note":["Online veröffentlicht: 8. Januar 2018","Gesehen am 18.05.2020"],"recId":"169846374X","name":{"displayForm":["Martin Löpprich, Felix Krauss, Matthias Ganzinger, Karsten Senghas, Stefan Riezler, Petra Knaup"]},"origin":[{"dateIssuedKey":"2016","dateIssuedDisp":"2016"}],"id":{"eki":["169846374X"],"doi":["10.3414/ME15-02-0019"]},"physDesc":[{"extent":"8 S."}],"language":["eng"],"type":{"bibl":"article-journal","media":"Online-Ressource"},"title":[{"title_sort":"Automated classification of selected data elements from free-text diagnostic reports for clinical research","title":"Automated classification of selected data elements from free-text diagnostic reports for clinical research"}],"person":[{"display":"Löpprich, Martin","family":"Löpprich","role":"aut","given":"Martin"},{"family":"Krauss","given":"Felix","role":"aut","display":"Krauss, Felix"},{"given":"Matthias","role":"aut","family":"Ganzinger","display":"Ganzinger, Matthias"},{"display":"Senghas, Karsten","family":"Senghas","given":"Karsten","role":"aut"},{"role":"aut","given":"Stefan","family":"Riezler","display":"Riezler, Stefan"},{"display":"Knaup-Gregori, Petra","role":"aut","given":"Petra","family":"Knaup-Gregori"}]} 
SRT |a LOEPPRICHMAUTOMATEDC2016
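
The abstract reports average error rates and F1-scores and names the approximate randomization test for significance testing. The following is a minimal Python sketch of how such a comparison between two classification pipelines could look, assuming both pipelines have predicted a label for the same items; it is not the authors' implementation. The helper names (`error_rate`, `macro_f1`, `approximate_randomization`), the use of macro averaging for F1, and the default number of trials are illustrative assumptions, since the record does not specify them.

```python
import random
from typing import Callable, Hashable, Sequence

Labels = Sequence[Hashable]
Metric = Callable[[Labels, Labels], float]


def error_rate(gold: Labels, pred: Labels) -> float:
    """Fraction of items whose predicted label differs from the gold label."""
    return sum(g != p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: Labels, pred: Labels) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN).

    Macro averaging is an assumption; the record does not say how F1 was averaged.
    """
    labels = set(gold) | set(pred)
    scores = []
    for c in labels:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def approximate_randomization(
    gold: Labels,
    pred_a: Labels,
    pred_b: Labels,
    metric: Metric,
    trials: int = 10_000,  # hypothetical default, not taken from the paper
    seed: int = 0,
) -> float:
    """Two-sided p-value for the difference in `metric` between two systems.

    Under the null hypothesis the two systems are exchangeable, so in every
    trial each item's pair of predictions is swapped with probability 0.5.
    """
    rng = random.Random(seed)
    observed = abs(metric(gold, pred_a) - metric(gold, pred_b))
    at_least_as_extreme = 0
    for _ in range(trials):
        shuffled_a, shuffled_b = [], []
        for a, b in zip(pred_a, pred_b):
            if rng.random() < 0.5:
                a, b = b, a
            shuffled_a.append(a)
            shuffled_b.append(b)
        if abs(metric(gold, shuffled_a) - metric(gold, shuffled_b)) >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimate conservative and never exactly zero.
    return (at_least_as_extreme + 1) / (trials + 1)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the annotated corpus.
    gold = ["A", "A", "B", "B", "C", "C", "A", "B"]
    pred_x = ["A", "A", "B", "B", "C", "A", "A", "B"]  # hypothetical pipeline X
    pred_y = ["A", "B", "B", "A", "C", "A", "A", "C"]  # hypothetical pipeline Y
    for name, metric in (("error rate", error_rate), ("macro F1", macro_f1)):
        p = approximate_randomization(gold, pred_x, pred_y, metric, trials=2_000)
        print(f"{name}: p = {p:.4f}")
```

Swapping predictions item by item, rather than permuting all labels at once, keeps each item's pairing intact, which is what makes the test suitable for comparing two systems evaluated on the same documents. In a cross-validation setup such as the one described, the paired predictions could be collected from the held-out folds.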